SMARTbiomed summer school 2026

Statistical Genetics

Overview

This module introduces the statistical foundations and applied methods used to study the genetics of quantitative and complex traits in human populations. The emphasis is on building a deep conceptual understanding of the main modelling frameworks, inference procedures, and computational tools of modern statistical genetics. Students will work through both theoretical material and hands-on computational exercises, with a particular focus on estimation, hypothesis testing, and prediction using genetic markers.

Prerequisites 

Basic proficiency in R and bash scripting.

Familiarity with introductory statistics (regression, probability distributions) is helpful but not strictly required.

Learning Objectives

  • Gain an overview of heritability and generative genetic models, considering family data, linkage disequilibrium, indirect genetic effects, and assortative mating.
  • Understand regression theory and covariance, and their application to pleiotropy and genetic correlation.
  • Understand the theory of genetic association testing, statistical power, and fine-mapping of genetic effects. The modelling approaches covered include AI-REML (average-information restricted maximum likelihood) and high-dimensional regression in both Bayesian and frequentist frameworks.
  • Develop an understanding of statistical models for disease traits, including liability threshold models for binary outcomes, survival models for time-to-event data, the role of truncated normal theory in case–control sampling, and the principles of association testing for disease phenotypes.
  • Understand the principles and pitfalls of prediction analyses using genetic markers within and across populations.

Learning Outcomes

By the end of the course, students will be able to:

  • Explain the statistical principles underlying modern genetic analyses.
  • Translate conceptual models into algorithms for inference.
  • Critically evaluate and apply commonly used software for heritability, association testing, fine-mapping, and genetic prediction.
  • Apply methods for both common and rare variant association.
  • Implement basic analysis pipelines in R, Python, or C++ (with most demonstrations provided in R).

General Skills across the course

Students should be able to read and execute code in R, Python, or C++. Most demonstrations will be kept simple and written in R. Each learning objective is accompanied by a practical in which students work with simplified demonstration versions of existing approaches.

Required Software

R, a Unix-based computing environment for running command-line tools, and Docker. Containers will be provided to ensure reproducibility and ease of installation.

Teachers

Professor Matthew Robinson, Institute of Science and Technology Austria (ISTA)

Matthew Robinson’s group focuses on medical genomics and large-scale modelling of human health data. His research aims to understand how genetic and lifestyle factors combine to influence the risk, onset, and progression of common complex diseases. Matt completed his PhD at the University of Edinburgh and held research positions in Australia and Switzerland before joining ISTA in 2020. His group develops statistical and computational methods for analysing biobank-scale datasets, characterising genetic architecture, and predicting health outcomes across the life course.

Dr Duncan Palmer, Department of Statistics, University of Oxford

Duncan Palmer is a SMARTbiomed Senior Research Fellow at the Big Data Institute and the Department of Statistics. He co-leads the Biobank Rare-Variant Consortium (BRaVa), an international collaboration analysing large-scale sequencing data to uncover the genetic basis of complex traits. Duncan’s research focuses on developing statistical and computational methods that integrate genetic, phenotypic, and functional data to refine association signals and understand disease mechanisms.