Imputation and Fine-Mapping in Genetic Association Studies
Yu, Ketian
2023
Abstract
The increasing availability of large whole-genome sequencing and genomics datasets has brought both opportunities and challenges to genetic research. Genotype imputation is an integral tool in genome-wide association studies, where it facilitates meta-analysis, increases power, and enables fine-mapping. With a multitude of reference panels available for genotype imputation, investigators have started to explore ways of combining information from different panels to improve accuracy. The successive increases in sample size and genotype density in sequencing projects also enable high-resolution fine-mapping, which improves the understanding of the mechanisms underlying complex diseases. However, there is a reasonable chance that the lead variants from fine-mapping are not causal but are detected simply because they are in linkage disequilibrium (LD) with the true causal variants, so caution is required when interpreting association signals. In this dissertation, we present improved methods for genotype imputation and gene expression imputation, explore challenges in fine-mapping that result from complex LD structure, and provide potential remedies. In Chapter 2, we describe an efficient meta-imputation framework that enables researchers to merge imputed data generated from multiple reference panels without access to individual-level genotype data for the underlying reference samples. We first impute against each reference panel separately using our minimac4 imputation software, which has a new built-in leave-one-out (LOO) imputation feature, and then combine the imputed results into a consensus dataset using weights that are tailored to each individual and genome segment. The weights are estimated dynamically through a hidden Markov model that uses individual-specific LOO results. In the scenarios we examined, meta-imputation consistently outperforms imputation using a single reference panel and achieves accuracy comparable to imputation using a combined reference panel.
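The weighting step described for Chapter 2 can be illustrated with a small sketch. This is not the minimac4 implementation: the function names `meta_weights` and `meta_dosage`, the emission model (the probability each panel's LOO dosage assigns to the observed allele), and the fixed `switch` probability between panels are all assumptions chosen for illustration. The sketch treats the choice of panel as the hidden state of an HMM along one haplotype, runs forward-backward over the genotyped markers, and uses the posterior state probabilities as per-marker weights.

```python
def meta_weights(observed, loo, switch=0.01, eps=1e-3):
    """Posterior panel weights at each genotyped marker (forward-backward).

    observed: list of 0/1 alleles on one haplotype at genotyped markers.
    loo:      loo[k][m] is panel k's leave-one-out dosage at marker m.
    switch:   assumed probability of changing panel between adjacent markers.
    """
    n_panels, n_markers = len(loo), len(observed)

    def emit(k, m):
        # Probability that panel k's LOO dosage assigns to the observed allele.
        p = loo[k][m] if observed[m] == 1 else 1.0 - loo[k][m]
        return min(max(p, eps), 1.0 - eps)

    def trans(i, j):
        if n_panels == 1:
            return 1.0
        return 1.0 - switch if i == j else switch / (n_panels - 1)

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd = [normalize([emit(k, 0) / n_panels for k in range(n_panels)])]
    for m in range(1, n_markers):
        prev = fwd[-1]
        fwd.append(normalize([
            emit(k, m) * sum(prev[j] * trans(j, k) for j in range(n_panels))
            for k in range(n_panels)
        ]))

    # Backward pass, rescaled the same way.
    bwd = [[1.0] * n_panels for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        bwd[m] = normalize([
            sum(trans(k, j) * emit(j, m + 1) * bwd[m + 1][j]
                for j in range(n_panels))
            for k in range(n_panels)
        ])

    # Posterior probability of each panel at each marker.
    return [normalize([fwd[m][k] * bwd[m][k] for k in range(n_panels)])
            for m in range(n_markers)]


def meta_dosage(weights, dosages):
    """Weighted average of per-panel imputed dosages at each marker."""
    return [sum(w[k] * dosages[k][m] for k in range(len(dosages)))
            for m, w in enumerate(weights)]
```

In this toy setup, a panel whose LOO dosages match the observed alleles over a stretch of markers receives most of the weight there, and the weights can shift to another panel further along the haplotype. Weights at markers that were not genotyped would have to be derived from the flanking genotyped markers, which the sketch leaves out.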
In Chapter 3, we present a comprehensive exploration of the trade-offs associated with statistical fine-mapping strategies. We focus in particular on the impact of the choice of data type (summary statistics versus individual-level data) and of the algorithmic approach (greedy versus multiple starting-point strategy). Our evaluations show that using summary statistics typically results in decreased power and coverage in fine-mapping. We also highlight the issue of non-identifiability in the presence of complex LD structures, a scenario in which a greedy search strategy might overlook alternative model configurations, leading to false discoveries. To address this, we propose a multiple starting-point strategy that improves the calibration of posterior probabilities, albeit at an increased computational cost. In Chapter 4, we systematically compare models for gene expression imputation based on TOPMed RNAseq data and find that imputation accuracy is positively correlated with both reference sample size and the degree of ancestry matching between reference and target samples. The study demonstrates that a large, diverse reference panel can achieve accuracy comparable to that of a smaller, ancestry-specific panel. This finding obviates the need to classify target samples into ancestry groups and carry out imputation using the corresponding ancestry-matched subpanels, thereby enhancing processing efficiency. Moreover, we have built gene expression imputation models based on DAP-G, leveraging TOPMed RNAseq data, to support transcriptome-wide association studies. This feature will soon be integrated into the TOPMed imputation server, creating a unified platform where users can access both imputed gene expression and imputed genotypes.
Subjects
genome-wide association study; genotype imputation; fine-mapping; gene expression imputation
Types
Thesis