SMARTbiomed summer school: “Causal inference, statistical genetics, and machine learning in common disease epidemiology and biology”

Module: Introduction to statistical genetics I

Module summary

This module serves as an introduction to the foundational concepts and methods in statistical genetics, providing participants with practical experience in handling genetic data and performing key analyses. Over the course of two days, participants will explore genetic data formats, learn preprocessing techniques, and gain an understanding of how to analyze population structure using principal component analysis (PCA). Additionally, they will conduct a genome-wide association study (GWAS) to identify genetic variants associated with phenotypes. By the end of this module, participants will have a working knowledge of statistical genetics methods and hands-on experience with R-based tools, including the R package bigsnpr.

Prerequisites

Basic knowledge of R.

Module content

Become familiar with different genetic data formats and basic preprocessing steps.
Use principal component analysis (PCA) to capture population structure.
Perform a genome-wide association study (GWAS).

Required Software

R + RStudio + CRAN package bigsnpr (to be downloaded and installed prior to the start of the course).

Module: Introduction to statistical genetics II: Statistical genetics methods using GWAS summary statistics

Module summary

This module builds upon the foundational knowledge from "Introduction to statistical genetics I" and focuses on advanced statistical genetics methods that use GWAS summary statistics. Over two days, participants will explore key methodologies such as polygenic score (PGS) construction, fine-mapping approaches to identify causal variants, and the derivation of linkage disequilibrium (LD) matrices. The module will also introduce methods for inferring ancestry from allele frequency data. Participants will gain hands-on experience with R-based tools, including the R packages bigsnpr and susieR, to perform these analyses on genetic datasets.

Prerequisites

Basic knowledge of R and the learning outcomes of the module “Introduction to statistical genetics I”, especially what a GWAS is and what the resulting GWAS summary statistics contain.

Module content

Understand and implement polygenic score (PGS) methods.
Apply fine-mapping techniques to identify causal genetic variants.
Derive LD matrices and use them for downstream analyses.
Infer population ancestry from allele frequency data.

Required Software

R + RStudio + CRAN packages bigsnpr and susieR (to be downloaded and installed prior to the start of the course).

Teacher

Florian Privé, Senior Researcher at Aarhus University, Denmark

Dr. Florian Privé specializes in the development and application of statistical learning methods to advance precision medicine. With an interdisciplinary background in Statistics and Computer Science, he focuses on designing tools for the analysis of large-scale datasets and developing statistical approaches for constructing predictive models from extensive genetic data. He is particularly recognized for his contributions to polygenic scores (PGS), notably through the creation of LDpred2, one of the most widely adopted methods for PGS construction. Dr. Privé's expertise also extends to principal component analysis (PCA) and genetic ancestry inference, where his work has had a significant impact. In addition to his methodological advancements, Dr. Privé has implemented highly efficient R(cpp) packages, such as bigstatsr and bigsnpr, which facilitate scalable analyses of large omic datasets and are widely used in the field.

Revised 07.03.2025

SMARTbiomed