Integrative Analyses of Genetic and Genomic Sequence Data to Improve GWAS Interpretation

dc.contributor.author: Hanks, Sarah
dc.date.accessioned: 2024-02-13T21:20:46Z
dc.date.available: 2024-02-13T21:20:46Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.identifier.uri: https://hdl.handle.net/2027.42/192438
dc.description.abstract: Genome-wide association studies (GWAS) to date have identified hundreds of thousands of genetic variants associated with tens of thousands of complex human diseases and traits. However, fully understanding the biological mechanisms of these associations remains challenging and requires more complete identification of causal genetic variants and their functional consequences in human cells and tissues. Here, we propose novel methods and approaches for integrative analyses of genetic and genomic sequence data to further this understanding. In the first project, we quantify the extent to which array genotyping and imputation can approximate deep whole genome sequencing (WGS) across a range of ancestries, reference panels, and genotyping arrays. Deep WGS, the gold standard technology for genetic variant identification and genotyping, remains very expensive for most large studies. In this chapter, we use WGS data from studies of individuals of African, Hispanic/Latino, and European ancestry in the US, and of Finnish ancestry in Finland (a population isolate) and perform genotype imputation using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. We find that using the Omni 2.5M array and the TOPMed panel, ≥90% of biallelic single nucleotide variants (SNVs) are well-imputed (r² > 0.8) down to minor allele frequencies (MAF) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. We find that individual-level imputation quality varies widely between and within the three US populations. Imputation quality also varies across genomic regions, producing regions where even common (MAF>5%) variants were not consistently well-imputed across ancestries. In the second project, we investigate the consequences of violating the independent-cohorts assumption of genetic colocalization methods. Colocalization analysis aims to identify genetic variants that are causal for multiple association signals at a single locus. Existing colocalization methods explicitly assume that the phenotypes are measured in independent, non-overlapping samples. In this chapter, we present simulation analyses that demonstrate the consequences of applying these methods in a single cohort. We show that Type I error is well-controlled when the ratio of shared to trait-specific error variance is low but becomes problematic with increased sharing. For scenarios with well-controlled Type I error, we show that the one-sample design is more powerful than the two-sample design due to better linkage disequilibrium matching. Power can be further improved in the one-sample design when shared non-genetic factors are measured and controlled for in the marginal association analyses. In the third project, we examine sex differences in gene expression and regulation in human skeletal muscle at the single nucleus resolution. We identify thousands of sex-biased genes across Type 1, 2A, and 2X muscle fibers and other, less abundant cell types. We find that sex-biased expression is highly concordant across the muscle fiber types and bulk muscle tissue and is enriched for genes in mitochondrial activity (males) and muscle regeneration (females) pathways. We also find that lncRNAs and miRNAs, two classes of genes with regulatory functions, show extensive sex-biased expression in the fiber-type and bulk data, respectively. We find widespread sex-biased chromatin accessibility enriched in regulatory chromatin states. Together, these results highlight nuclear and cytoplasmic mechanisms for sex-differential gene regulation in skeletal muscle.
dc.language.iso: en_US
dc.subject: statistical genetics
dc.title: Integrative Analyses of Genetic and Genomic Sequence Data to Improve GWAS Interpretation
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Biostatistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Boehnke, Michael Lee
dc.contributor.committeemember: Scott, Laura
dc.contributor.committeemember: Parker, Stephen CJ
dc.contributor.committeemember: Fuchsberger, Christian
dc.contributor.committeemember: Kang, Hyun Min
dc.contributor.committeemember: Wen, Xiaoquan William
dc.subject.hlbsecondlevel: Genetics
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/192438/1/schanks_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/22347
dc.identifier.orcid: 0000-0003-2978-5289
dc.identifier.name-orcid: Hanks, Sarah; 0000-0003-2978-5289
dc.working.doi: 10.7302/22347
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)