Statistical Approaches for the Integrative Analysis of Multi-omics Data

Hukku, Abhay

dc.contributor.author	Hukku, Abhay
dc.date.accessioned	2022-01-19T15:26:17Z
dc.date.available	2022-01-19T15:26:17Z
dc.date.issued	2021
dc.date.submitted	2021
dc.identifier.uri	https://hdl.handle.net/2027.42/171398
dc.description.abstract	In recent years, large-scale studies have been conducted to investigate the genetic architecture underlying molecular and complex traits. As a result, integrative genetic association analysis has emerged as a tool to link functional genomic units (e.g., proteins, metabolites, RNA) with complex diseases. In this dissertation, we develop integrative analysis methodologies for multi-omics datasets, and we additionally draw statistical connections between different types of integrative genetic analysis methods. In the first project, we develop a novel approach to gene set enrichment analysis, which has been shown to be effective in identifying relevant biological pathways underlying complex diseases. We propose a computational method, Bayesian Analysis of Gene Set Enrichment (BAGSE), for this task. Through simulation studies, we illustrate that BAGSE yields accurate enrichment quantification while achieving power similar to that of state-of-the-art methods. Further simulation studies show that BAGSE can effectively use the enrichment information to improve power in gene discovery. Finally, we demonstrate the application of BAGSE to real data from a differential expression experiment and a Transcriptome-wide Association Study (TWAS). Our results indicate that the proposed statistical framework is effective in aiding the discovery of potentially causal pathways and gene networks. In the second project, we conduct an in-depth investigation of the promise and limitations of available colocalization approaches. Colocalization analysis has emerged as a tool to uncover overlapping genetic variants that contribute simultaneously to both molecular and complex disease phenotypes. We examine the impact of various controllable analytical factors and uncontrollable practical factors on the outcomes of colocalization analysis through realistic simulations and real data examples. Based on our investigations, we recommend the following strategies as best practice for colocalization analysis: i) estimating the prior enrichment level from the observed data; and ii) separating fine-mapping from colocalization analysis. Our analysis of real data suggests that colocalizations of molecular QTLs and complex trait associations are widespread but often go undetected due to a lack of power. Our findings set a benchmark for current and future applications of integrative genetic association analysis. In the third project, we establish a unified framework for widely used integrative genetic association analysis techniques. Colocalization analysis and TWAS are both popular approaches for linking molecular and complex traits. Although both methods have been used to implicate potential causal genes for complex phenotypes, their inference results differ substantially in practice, even when applied to identical input datasets. We start by identifying the biological and statistical factors behind these discrepancies. To reconcile the two types of approaches, we introduce locus-level colocalization, which aims to identify genomic regions, marked by high LD, that contain causal genetic variants for both investigated traits. We use simulations to show that locus-level colocalization substantially increases the number of discoveries compared with SNP-level colocalization. Furthermore, we provide a framework for using locus-level colocalization to filter TWAS results down to the most biologically relevant genes. Based on our results, locus-level colocalization has the potential to be integrated with TWAS to identify underlying causal genes for complex traits more precisely and accurately.
dc.language.iso	en_US
dc.subject	Statistical genetics
dc.title	Statistical Approaches for the Integrative Analysis of Multi-omics Data
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Biostatistics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Wen, Xiaoquan William
dc.contributor.committeemember	Sartor, Maureen
dc.contributor.committeemember	Morrison, Jean Victoria
dc.contributor.committeemember	Mukherjee, Bhramar
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/171398/1/abhukku_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/3910
dc.identifier.orcid	0000-0002-3375-6299
dc.working.doi	10.7302/3910	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: abhukku_1.pdf
Size: 5.610MB
Format: PDF
