Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) doi:10.1093/bioinformatics/bty185.
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides, for instance, matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more. Privé et al. (2018) doi:10.1093/bioinformatics/bty185.
When causal quantities are not identifiable from the observed data, it may still be possible to bound them using the observed data. We outline a class of problems for which the derivation of tight bounds is always a linear programming problem and can therefore, at least theoretically, be solved using a symbolic linear optimizer. We extend and generalize the approach of Balke and Pearl (1994) doi:10.1016/B978-1-55860-332-5.50011-0 and provide a user-friendly graphical interface for setting up such problems via directed acyclic graphs (DAGs), which only allows problems within this class to be depicted. The user can then define linear constraints to refine the assumptions for their specific problem, and specify a causal query using a text interface. The program takes the user-defined DAG, query, and constraints and returns tight bounds. The bounds can be converted to R functions to evaluate them for specific datasets, and to LaTeX code for publication. The methods and the proofs of tightness and validity of the bounds are described in Sachs, Jonzon, Gabriel, and Sjölander (2022) doi:10.1080/10618600.2022.2071905.
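To make the idea of bounding a non-identifiable causal quantity concrete, here is a minimal sketch of a well-known special case: the assumption-free bounds on the average causal effect of a binary exposure X on a binary outcome Y when unmeasured confounding is allowed. It is written in Python for illustration only; the function name `ace_bounds` and its arguments are hypothetical, and this is not the output or the interface of the program described above.

```python
def ace_bounds(p00, p01, p10, p11):
    """Assumption-free bounds on P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)).

    p_xy is the observed joint probability P(X=x, Y=y) for binary X and Y.
    Illustrative helper (hypothetical name), not part of any package.
    """
    probs = (p00, p01, p10, p11)
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("p00, p01, p10, p11 must be a probability distribution")

    # P(Y=1 | do(X=1)) lies in [p11, p11 + P(X=0)] and
    # P(Y=1 | do(X=0)) lies in [p01, p01 + P(X=1)].
    # Combining the extremes gives the bounds on the difference.
    lower = -(p01 + p10)
    upper = p00 + p11
    return lower, upper


if __name__ == "__main__":
    lower, upper = ace_bounds(p00=0.4, p01=0.1, p10=0.1, p11=0.4)
    print(f"ACE is bounded by [{lower:.2f}, {upper:.2f}]")
```

Without further assumptions the interval always has width 1, which is why additional structure in the DAG (for example an instrument) or user-supplied constraints are needed to obtain informative bounds.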
Various tools for inferring causal models from observational data. The package includes an implementation of the temporal Peter-Clark (TPC) algorithm; see Petersen, Osler and Ekstrøm (2021) doi:10.1093/aje/kwab087. It also includes general tools for evaluating differences between adjacency matrices, which can be used to evaluate the performance of causal discovery procedures.
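As an illustration of what evaluating differences between adjacency matrices can look like, the sketch below compares an estimated adjacency matrix with a reference one, entry by entry, and reports precision and recall for directed edges. It is written in Python with hypothetical helper names (`adjacency_confusion`, `precision_recall`) and assumes that entry [i][j] = 1 encodes an edge from node i to node j; it does not reproduce the package's own functions or conventions.

```python
def adjacency_confusion(est, true):
    """Count edge agreements between two square 0/1 adjacency matrices.

    Assumption: entry [i][j] == 1 means a directed edge i -> j.
    Diagonal entries are ignored.
    """
    n = len(true)
    if len(est) != n or any(len(row) != n for row in est + true):
        raise ValueError("both matrices must be square and of equal size")

    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            e, t = bool(est[i][j]), bool(true[i][j])
            if e and t:
                counts["tp"] += 1  # edge found and present in the reference
            elif e and not t:
                counts["fp"] += 1  # edge found but absent in the reference
            elif t:
                counts["fn"] += 1  # edge missed
            else:
                counts["tn"] += 1  # correctly absent
    return counts


def precision_recall(counts):
    """Precision and recall from the counts; NaN when undefined."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


if __name__ == "__main__":
    true_graph = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1, 1 -> 2
    est_graph = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]   # 0 -> 1, 0 -> 2
    print(precision_recall(adjacency_confusion(est_graph, true_graph)))
```

To compare skeletons rather than directed edges, one option is to symmetrize both matrices before counting.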
gact provides a flexible infrastructure for managing and analyzing large-scale genomic association data. It supports building association databases, processing biological resources, and handling GWAS summary statistics. gact links genetic markers to genes, proteins, metabolites, and pathways, and integrates seamlessly with qgg to enable reproducible, extensible workflows for integrative genomics.
GCTB is a versatile command-line software suite for complex trait analysis using genome-wide SNP data. It implements a family of Bayesian mixture models that jointly fit all SNP effects and supports both individual-level genotype and phenotype data as well as GWAS summary statistics. The suite includes:
GCTB provides a comprehensive framework for understanding genetic architecture and improving prediction accuracy in complex trait studies.
Manc-COJO is a C++ software tool for multi-ancestry conditional and joint analysis (COJO) of GWAS summary statistics.
GWAS tests for association between a trait and SNPs one at a time, giving marginal SNP effect estimates. However, associations detected in GWAS are often not independent because of linkage disequilibrium (LD) between SNPs. To address this challenge, COJO has been proposed and widely used for single-ancestry analyses, where it identifies independent association signals through iterative conditioning on significant SNPs while jointly modelling their effects to account for LD. Building upon COJO, our multi-ancestry extension exploits population-specific LD differences to improve the detection of independent association signals and reduce false positives compared to single-ancestry COJO (and ad hoc adaptations for multi-ancestry use).
Note that Manc-COJO can also perform single-ancestry COJO and reproduce the results of GCTA-COJO, while running substantially faster.
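The relationship between marginal and joint SNP effects can be illustrated with a deliberately simplified two-SNP sketch. It assumes standardized genotypes and phenotype, a single ancestry, and marginal estimates obtained from the same large sample, so that the marginal effects equal the LD correlation matrix times the joint effects. The Python function `joint_effects_two_snps` is hypothetical; it is neither the Manc-COJO implementation nor the full COJO estimator, which additionally accounts for allele frequencies and sample sizes.

```python
def joint_effects_two_snps(b1, b2, r):
    """Approximate joint effects of two SNPs from their marginal effects.

    Simplifying assumptions (not the full COJO model): standardized
    genotypes and phenotype, so that marginal = R @ joint with
    R = [[1, r], [r, 1]], where r is the LD correlation of the two SNPs.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")

    # Invert the 2x2 LD matrix analytically: R^{-1} = [[1, -r], [-r, 1]] / (1 - r^2)
    denom = 1.0 - r * r
    joint1 = (b1 - r * b2) / denom
    joint2 = (b2 - r * b1) / denom
    return joint1, joint2


if __name__ == "__main__":
    # SNP 1 is causal with effect 0.5; SNP 2 has no effect of its own but is
    # in LD with SNP 1 (r = 0.8), so its marginal effect is about 0.5 * 0.8.
    j1, j2 = joint_effects_two_snps(b1=0.5, b2=0.4, r=0.8)
    print(f"joint effects: SNP1 = {j1:.3f}, SNP2 = {j2:.3f}")
```

In this toy setting the second SNP shows a marginal association purely through LD, and its joint effect shrinks to approximately zero once the first SNP is accounted for.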
Theme: Risk prediction, statistical genetics
Language: C++, command line
qgg provides advanced tools for statistical modeling and analysis of large-scale genomic data. It implements Bayesian Linear Regression (BLR) models for fine-mapping, polygenic scoring, and gene set enrichment analysis, combining efficient algorithms with high-performance computing for scalable quantitative genomics.
SMARTbiomed researchers will develop methods and software for analysing and interpreting health data, including causal inference (led by Erin Gabriel, University of Copenhagen), risk prediction (led by Bjarni Vilhjálmsson, Aarhus University), and machine learning (led by Chris Holmes, University of Oxford). SMARTbiomed will advance medical research, focusing on common complex diseases and disorders, especially cardiometabolic (diabetes and cardiovascular diseases), brain (psychiatric and neurological), and reproductive (endometriosis, involuntary infertility) conditions.