Statistical Approaches for the Integrative Analysis of Multi-omics Data

Hukku, Abhay

2021

View/Open

abhukku_1.pdf (5.6MB PDF)

Abstract

In recent years, large-scale studies have been conducted to investigate the genetic architecture underlying molecular and complex traits. As a result, integrative genetic association analysis has emerged as a tool to link functional genomic units (e.g., proteins, metabolites, RNA) with complex diseases. In this dissertation, we develop integrative analysis methodologies for multi-omics datasets and establish statistical connections between different types of integrative genetic analysis methods.

In the first project, we develop a novel approach to gene set enrichment analysis, which has been shown to be effective in identifying relevant biological pathways underlying complex diseases. We propose a computational method, Bayesian Analysis of Gene Set Enrichment (BAGSE). Through simulation studies, we illustrate that BAGSE yields accurate enrichment quantification while achieving power similar to that of state-of-the-art methods. Further simulation studies show that BAGSE can effectively use the enrichment information to improve power in gene discovery. Finally, we demonstrate the application of BAGSE to real data from a differential expression experiment and a transcriptome-wide association study (TWAS). Our results indicate that the proposed statistical framework is effective in aiding the discovery of potentially causal pathways and gene networks.

In the second project, we conduct an in-depth investigation of the promise and limitations of available colocalization approaches. Colocalization analysis has emerged as a tool to uncover overlapping genetic variants that contribute simultaneously to both molecular and complex disease phenotypes. We examine the impact of various controllable analytical factors and uncontrollable practical factors on the outcomes of colocalization analysis through realistic simulations and real data examples. Based on our investigations, we recommend the following strategies as best practice for colocalization analysis: i) estimating the prior enrichment level from the observed data; and ii) separating fine-mapping and colocalization analysis. Our analysis of real data suggests that colocalizations of molecular QTLs and complex trait associations are widespread but often go undetected due to a lack of power. Our findings set a benchmark for current and future integrative genetic association analysis applications.

In the third project, we establish a unified framework for widely used integrative genetic association analysis techniques. Colocalization analysis and TWAS are both popular approaches for linking molecular and complex traits. Although both methods have been used to implicate potential causal genes for complex phenotypes, their inference results differ substantially in practice, even when applied to identical input datasets. We start by identifying the biological and statistical factors behind these discrepancies. To reconcile the two types of approaches, we introduce locus-level colocalization, which aims to identify genomic regions, marked by high linkage disequilibrium (LD), that contain causal genetic variants for both investigated traits. We use simulations to show that locus-level colocalization substantially increases the number of discoveries in comparison to SNP-level colocalization. Furthermore, we provide a framework for using locus-level colocalization to filter TWAS results down to the most biologically relevant genes. Based on our results, locus-level colocalization has the potential to be integrated with TWAS to identify the underlying causal genes for complex traits more precisely and accurately.

Deep Blue DOI

https://dx.doi.org/10.7302/3910

Subjects

Statistical genetics

Types

Thesis

Handle

https://hdl.handle.net/2027.42/171398

Collections

Dissertations and Theses (Ph.D. and Master's)
