SMARTbiomed summer school: “Causal inference, statistical genetics, and machine learning in common disease epidemiology and biology”

Module: R Package Development

Module Summary

R is a free, open-source statistical computing environment and a language of choice for correct and reproducible analysis. R packages are the fundamental unit of reproducible R code for analysis and reporting. A good R package can have a big impact on scientific research, whether it implements a novel statistical method or provides an interface to existing analytic approaches. Researchers in molecular, genetic, and clinical epidemiology would benefit from more and better implementations of statistical and epidemiological methods in R.

In addition to the mechanics of packaging R code and the basic principles of software development, this module will focus on principles for developing high-quality packages and maximizing their impact. Through a series of examples from existing R packages, participants will learn about different strategies for designing and implementing interfaces to statistical and epidemiological methods. We will then summarize the steps one can take to maximize the impact of an R package and to obtain academic credit for one’s efforts.

Prerequisites

Basic to intermediate skills in R programming: using and writing functions, loops, and conditional execution.
Familiarity with RStudio and R management: installing packages, managing workspaces and directories.
It is beneficial, but not necessary, for participants to have at least a general idea for an R package that they wish to create, or a method that they think should be implemented or improved.

Module Content

The mechanics of packaging R code using devtools.
Modularity and the DRY ("don't repeat yourself") principle, testing and documentation, version control.
Interfaces, covering functions, operators, S3 classes and operator overloading, the pipe operator, and other types of classes (S4, RC, R6).
Releasing R packages, covering GitHub, web pages, CRAN, Bioconductor, and publishing clinical or software papers describing the package.
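
To give a flavour of the interface topics listed above, here is a minimal sketch of an S3 class with a print method and an overloaded operator. The class name risk_estimate, its constructor, and the example numbers are hypothetical and chosen only for illustration; they are not taken from the course material or from any existing package. The sketch also assumes that the two estimates being subtracted are independent.

```r
# Constructor for a hypothetical S3 class "risk_estimate" (illustrative only).
new_risk_estimate <- function(estimate, se) {
  stopifnot(
    is.numeric(estimate), length(estimate) == 1,
    is.numeric(se), length(se) == 1, se >= 0
  )
  structure(list(estimate = estimate, se = se), class = "risk_estimate")
}

# format() method: returns a one-line character description of the object.
format.risk_estimate <- function(x, digits = 3, ...) {
  paste0(
    "Risk estimate: ", round(x$estimate, digits),
    " (SE ", round(x$se, digits), ")"
  )
}

# print() method: called automatically when the object is shown at the console.
print.risk_estimate <- function(x, ...) {
  cat(format(x, ...), "\n")
  invisible(x)
}

# Overload binary "-" so that subtracting two estimates returns another
# risk_estimate. The standard error assumes the two estimates are independent.
`-.risk_estimate` <- function(e1, e2) {
  if (missing(e2)) {
    stop("unary minus is not defined for risk_estimate objects")
  }
  new_risk_estimate(
    estimate = e1$estimate - e2$estimate,
    se = sqrt(e1$se^2 + e2$se^2)
  )
}

# Usage: made-up numbers, combined with the native pipe operator.
treated <- new_risk_estimate(0.30, 0.04)
control <- new_risk_estimate(0.20, 0.03)
risk_difference <- treated - control
risk_difference |> print()
```

The design choice illustrated here is that users interact with familiar generics (print, format, the minus operator) rather than with new function names, which is one of several interface strategies a package author can consider.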

Required Software

R version >= 4.4.0
R package build tools. Windows users will need to install Rtools (https://cran.r-project.org/bin/windows/Rtools/)
R packages: devtools, usethis, roxygen2, testthat
RStudio
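
As a convenience, the requirements above can be checked from within R with a short script along the following lines. This is only a sketch: the variable names are illustrative, and it reports what is missing without installing anything.

```r
# Packages listed under "Required Software".
required <- c("devtools", "usethis", "roxygen2", "testthat")

# TRUE if the running R version meets the stated minimum.
r_version_ok <- getRversion() >= "4.4.0"

# Names of required packages that are not currently installed.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!is_installed]

if (!r_version_ok) {
  message("R ", getRversion(), " is older than the required 4.4.0.")
}
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # They can then be installed with, for example:
  # install.packages(not_installed)
} else {
  message("All required packages are installed.")
}
```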

Teachers

Michael Sachs, Associate Professor, Section of Biostatistics, University of Copenhagen

Michael Sachs is an Associate Professor at the Section of Biostatistics at the University of Copenhagen, and is affiliated with the Karolinska Institute. He has a PhD degree in biostatistics from the University of Washington, Seattle, WA. He has worked as an applied statistician in a variety of medical areas, including cancer treatment and diagnosis, inflammatory diseases, Alzheimer’s disease, and nephrology. He is an avid R user and developer, with a passion for open science, data visualization, and reproducible research. He is the author and maintainer of the R packages causaloptim, plotROC (a ggplot2 extension), eventglm, stdReg2, and more. His personal research interests are the development and evaluation of risk prediction models and biomarkers, assay development and validation, statistical computing, causal inference in observational studies, and tools for reproducible research.

Revised 07.02.2025
