Statistical Methods and Computational Tools for Genetics and Genomics Data
Shang, Lulu
2023
Abstract
Recent advances in array-based and sequencing-based technologies have enabled genome-wide profiling of gene expression and various epigenetic markers. Extracting valuable biological information from these different omics data types requires new computational and statistical methods. My dissertation centers on developing statistical methods for, and analyzing, diverse omics data types. In this dissertation, we propose several effective and efficient statistical and computational methods to address critical biological problems in distinct genomics fields, including spatial transcriptomics, single-cell RNA-seq, and bulk RNA-seq studies. In addition, we have conducted two large-scale, comprehensive quantitative trait loci (QTL) mapping studies in African Americans from the GENOA study to carefully examine how inherited genetic variation affects local gene expression and DNA methylation in a population that is under-represented in genetic studies. In Chapter II, we focus on data collected from various spatial transcriptomics technologies and develop a method called SpatialPCA for spatially aware dimension reduction. SpatialPCA builds on the probabilistic version of principal component analysis (PCA), incorporates the spatial location of each measurement as additional input, and explicitly models the spatial correlation structure across tissue locations using a kernel matrix. SpatialPCA extracts a low-dimensional representation of spatial transcriptomics data that is enriched in biological signal and preserves the spatial correlation structure of the tissue. We demonstrate the advantages of SpatialPCA in spatial transcriptomics visualization, spatial domain detection, spatial trajectory inference on the tissue, and high-resolution spatial map reconstruction.
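The SpatialPCA idea of adding a spatial kernel to probabilistic PCA can be illustrated with a minimal sketch. It is not the SpatialPCA software: it assumes no covariates, loadings W with orthonormal columns, a Gaussian kernel with a user-chosen bandwidth, and a fixed variance ratio tau, whereas the actual method estimates its parameters from the data. Under these assumptions, integrating out the spatial factors makes the maximum-likelihood loadings the top eigenvectors of Y (I - (I + tau K)^-1) Y^T, and the posterior mean of the factors is a kernel (Gaussian-process) smoothing of the projected data. The function names gaussian_kernel and spatial_pcs are hypothetical.

```python
import numpy as np


def gaussian_kernel(coords: np.ndarray, bandwidth: float) -> np.ndarray:
    """Kernel K[a, b] = exp(-||s_a - s_b||^2 / bandwidth) between tissue locations.

    The Gaussian form and the bandwidth value are illustrative assumptions.
    """
    sq_norms = np.sum(coords ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * coords @ coords.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / bandwidth)


def spatial_pcs(Y: np.ndarray, coords: np.ndarray, d: int,
                tau: float = 1.0, bandwidth: float = 1.0):
    """Spatially aware PCs of a genes-by-locations matrix Y (hypothetical helper).

    Simplified model: Y = W Z + E, W (genes x d) with orthonormal columns,
    each row of Z ~ N(0, sigma0^2 K), E iid N(0, sigma^2), tau = sigma0^2 / sigma^2.
    tau and bandwidth are fixed here rather than estimated.
    """
    n = Y.shape[1]
    Yc = Y - Y.mean(axis=1, keepdims=True)  # center each gene across locations
    K = gaussian_kernel(coords, bandwidth)
    # S = tau*K (tau*K + I)^-1 = I - (I + tau*K)^-1, a Gaussian-process smoother
    S = np.eye(n) - np.linalg.inv(np.eye(n) + tau * K)
    S = (S + S.T) / 2.0  # symmetrize away round-off
    # Marginal-likelihood loadings: top-d eigenvectors of Yc S Yc^T
    _, eigvecs = np.linalg.eigh(Yc @ S @ Yc.T)  # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :d]
    # Posterior mean of the spatial factors given W
    Z = W.T @ Yc @ S
    return W, Z
```

Because S shares eigenvectors with K and shrinks each one by tau*lambda / (tau*lambda + 1), spatially smooth patterns (large kernel eigenvalues lambda) are retained while location-to-location noise is damped; this is how the kernel matrix injects spatial correlation into an otherwise standard PCA.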
In Chapter III, we connect genome-wide association studies (GWAS) with single-cell and bulk RNA-seq data and develop a method called CoCoNet (COmposite likelihood-based COvariance regression NETwork model). CoCoNet integrates GWAS and gene expression studies, leveraging tissue-specific gene co-expression networks to infer trait-relevant tissues. It uses a covariance regression network model to express the gene-level effect sizes for a given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. Our findings also provide empirical support for one of the hypotheses of the omnigenic model, namely that trait-relevant gene co-expression networks may underlie disease etiology. In Chapter IV, we conducted a large-scale cis-eQTL mapping study in 1,032 African Americans and 801 European Americans to link genetic variants with gene expression. Our results suggest that distinct genetic architectures underlie expression variation in the two populations. The large sample sizes in GENOA allow us to construct accurate expression prediction models in both populations, facilitating powerful transcriptome-wide association studies (TWAS). The availability of samples from both populations also allows us to conduct a comparative study of the genetic regulation of gene expression across populations. Our results represent an important step toward revealing the genetic architecture underlying expression variation in African Americans. In Chapter V, we conducted a large-scale cis-meQTL mapping study in 961 African Americans to link genetic variants with DNA methylation. We identified a large number of cis-meQTLs and found that a substantial fraction of meCpGs (CpG sites with at least one cis-meQTL) harbor multiple independent meQTLs, suggesting a potentially polygenic genetic architecture underlying methylation variation. A large percentage of the cis-meQTLs also colocalize with cis-eQTLs in the same population.
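Both the cis-eQTL scan (Chapter IV) and the cis-meQTL scan (Chapter V) rest on marginal association tests between a molecular phenotype and nearby variants, and the TWAS step combines expression prediction weights with GWAS summary statistics. The sketch below illustrates both under stated assumptions: covariates (for example ancestry principal components or hidden factors) are assumed to be regressed out beforehand, the ±1 Mb cis window is an assumed default rather than the GENOA setting, and the TWAS statistic uses the standard summary-statistic form w^T z / sqrt(w^T R w) with R an LD correlation matrix. The function names cis_qtl_scan and twas_z are hypothetical.

```python
import numpy as np


def cis_qtl_scan(pheno, genotypes, snp_pos, feature_pos, window=1_000_000):
    """Marginal OLS test of one molecular phenotype against each cis SNP.

    pheno: (n,) expression or methylation values for one gene/CpG, assumed
        already adjusted for covariates (hypothetical preprocessing step).
    genotypes: (n, m) allele dosages; snp_pos: (m,) base-pair positions.
    feature_pos: position of the gene TSS or CpG site.
    window: cis window half-width in bp (an assumed default).
    Returns the indices of tested SNPs, slopes, and t statistics.
    """
    n = pheno.shape[0]
    idx = np.flatnonzero(np.abs(snp_pos - feature_pos) <= window)
    G = genotypes[:, idx] - genotypes[:, idx].mean(axis=0)
    gtg = np.sum(G ** 2, axis=0)
    keep = gtg > 0  # drop monomorphic SNPs, which have no slope
    idx, G, gtg = idx[keep], G[:, keep], gtg[keep]
    y = pheno - pheno.mean()
    beta = G.T @ y / gtg                    # simple-regression slopes
    rss = np.sum(y ** 2) - beta ** 2 * gtg  # residual sum of squares per SNP
    se = np.sqrt(rss / (n - 2) / gtg)
    return idx, beta, beta / se


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w^T z / sqrt(w^T R w)."""
    w = np.asarray(weights, dtype=float)
    return float(w @ gwas_z / np.sqrt(w @ ld @ w))
```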
Importantly, the identified cis-meQTLs explain a substantial proportion of methylation variation, and the cis-meQTL-associated CpG sites mediate a substantial proportion of the SNP effects on gene expression. Overall, our results represent an important step toward revealing the co-regulation of methylation and gene expression, facilitating the functional interpretation of epigenetic and gene regulation underlying common diseases in African Americans.
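The mediation claim can be illustrated with the classical difference method, which compares the SNP's total effect on expression with its direct effect after adjusting for CpG methylation. This estimator is chosen here only for illustration: the abstract does not state which mediation analysis the dissertation uses, the sketch assumes a single mediator and no other covariates, and the helper name proportion_mediated is hypothetical.

```python
import numpy as np


def proportion_mediated(snp, cpg, expr):
    """Difference-method share of a SNP's effect on expression that runs
    through one CpG site: (c - c') / c (hypothetical helper).

    c  : total effect, slope of snp in  expr ~ 1 + snp
    c' : direct effect, slope of snp in expr ~ 1 + snp + cpg
    Assumes a single mediator and no other covariates.
    """
    ones = np.ones_like(snp, dtype=float)
    X_total = np.column_stack([ones, snp])
    c = np.linalg.lstsq(X_total, expr, rcond=None)[0][1]
    X_direct = np.column_stack([ones, snp, cpg])
    c_direct = np.linalg.lstsq(X_direct, expr, rcond=None)[0][1]
    return (c - c_direct) / c
```

In practice such a point estimate is usually paired with a test of the indirect effect (for example a bootstrap), which this sketch omits.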
Subjects
Statistical Methods; Computational Tools; Genetics and genomics
Types
Thesis