Genotype Imputation in Diverse Populations: Empirical and Theoretical Approaches.
dc.contributor.author | Huang, Juichi (Lucy) | en_US |
dc.date.accessioned | 2012-01-26T20:06:45Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2012-01-26T20:06:45Z | |
dc.date.issued | 2011 | en_US |
dc.date.submitted | 2011 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/89807 | |
dc.description.abstract | Genome-wide association (GWA) studies, in which dense genotypes in a large sample of individuals are tested for disease associations, represent a powerful approach for uncovering disease-susceptibility genes. Genotype imputation is a statistical procedure that enables evaluation of disease associations at markers beyond those experimentally measured, by using chromosomal stretches shared between study and reference individuals to infer unmeasured genotypes in GWA samples. Crucial to the success of imputation procedures is the representation of GWA samples in reference datasets that contain “template” sequences from which the unmeasured genotypes are inferred. In this dissertation, I study the design of reference datasets for use in genetic studies in diverse human populations. First, I devise a mixture approach for selecting panels of reference data. Using genotype data from 29 worldwide populations, I show that the mixture approach reduces imputation error in nearly all populations. Focusing on African populations whose genotypes are particularly difficult to impute, I investigate haplotype variation and imputation in Africa. Using various statistics on haplotype variation to explain variation in imputation accuracy, I find that simple statistics, such as Fst, which measure genetic distance between study and reference populations, are useful metrics for guiding the selection of reference panels. Next, I quantify the increase in the minimal sample size, due to imperfect imputation, that would be required to provide the same level of statistical evidence of disease predisposition for genetic variants that are imputed rather than experimentally measured. Finally, I develop a coalescent model for evaluating imputation accuracy.
Under this model, use of reference sequences selected based on observed genetic similarity to a study sequence targeted for imputation produces higher imputation accuracy than use of reference sequences selected based on population of origin. This result suggests a reference-selection strategy that chooses template sequences from multiple populations, including the target population itself. Together, results from this dissertation can inform study design for future GWA studies. In particular, they can facilitate the design of reference datasets for use in imputation-based studies, thereby improving the search for genetic determinants that affect human health in populations worldwide. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Genotype Imputation | en_US |
dc.subject | Genome-wide Association Study | en_US |
dc.title | Genotype Imputation in Diverse Populations: Empirical and Theoretical Approaches. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Bioinformatics | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Rosenberg, Noah A. | en_US |
dc.contributor.committeemember | Zoellner, Sebastian K. | en_US |
dc.contributor.committeemember | Boehnke, Michael Lee | en_US |
dc.contributor.committeemember | Gruber, Stephen B. | en_US |
dc.contributor.committeemember | Li, Jun | en_US |
dc.subject.hlbsecondlevel | Genetics | en_US |
dc.subject.hlbtoplevel | Health Sciences | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/89807/1/hlucy_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |