Genotype Imputation in Diverse Populations: Empirical and Theoretical Approaches.

Huang, Juichi (Lucy)

dc.contributor.author	Huang, Juichi (Lucy)	en_US
dc.date.accessioned	2012-01-26T20:06:45Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2012-01-26T20:06:45Z
dc.date.issued	2011	en_US
dc.date.submitted	2011	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/89807
dc.description.abstract	Genome-wide association (GWA) studies, in which dense genotypes in a large sample of individuals are tested for disease associations, represent a powerful approach for uncovering disease-susceptibility genes. Genotype imputation is a statistical procedure that enables evaluation of disease associations at markers beyond those experimentally measured, by using chromosomal stretches shared between study and reference individuals to infer unmeasured genotypes in GWA samples. Crucial to the success of imputation procedures is the representation of GWA samples in reference datasets that contain “template” sequences from which the unmeasured genotypes are inferred. In this dissertation, I study the design of reference datasets for use in genetic studies in diverse human populations. First, I devise a mixture approach for selecting panels of reference data. Using genotype data from 29 worldwide populations, I show that the mixture approach reduces imputation error in nearly all populations. Focusing on African populations, whose genotypes are particularly difficult to impute, I investigate haplotype variation and imputation in Africa. Using various statistics on haplotype variation to explain variation in imputation accuracy, I find that simple statistics, such as Fst, which measure genetic distance between study and reference populations, are useful metrics for guiding the selection of reference panels. Next, I quantify the increase in the minimal sample size, due to imperfect imputation, that would be required to provide the same level of statistical evidence of disease predisposition for genetic variants that are imputed rather than experimentally measured. Finally, I develop a coalescent model for evaluating imputation accuracy.
Under this model, use of reference sequences selected based on observed genetic similarity to a study sequence targeted for imputation produces higher imputation accuracy than use of reference sequences selected based on population of origin. This result suggests a reference-selection strategy that chooses template sequences from multiple populations, including the target population itself. Together, results from this dissertation can inform study design for future GWA studies. In particular, they can facilitate the design of reference datasets for use in imputation-based studies, thereby improving the search for genetic determinants that affect human health in populations worldwide.	en_US
dc.language.iso	en_US	en_US
dc.subject	Genotype Imputation	en_US
dc.subject	Genome-wide Association Study	en_US
dc.title	Genotype Imputation in Diverse Populations: Empirical and Theoretical Approaches.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Bioinformatics	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Rosenberg, Noah A.	en_US
dc.contributor.committeemember	Zoellner, Sebastian K.	en_US
dc.contributor.committeemember	Boehnke, Michael Lee	en_US
dc.contributor.committeemember	Gruber, Stephen B.	en_US
dc.contributor.committeemember	Li, Jun	en_US
dc.subject.hlbsecondlevel	Genetics	en_US
dc.subject.hlbtoplevel	Health Sciences	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/89807/1/hlucy_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: hlucy_1.pdf
Size: 1.349MB
Format: PDF
