The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies.
dc.contributor.author | Zhang, Peng | en_US |
dc.date.accessioned | 2013-09-24T16:01:14Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2013-09-24T16:01:14Z | |
dc.date.issued | 2013 | en_US |
dc.date.submitted | 2013 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/99798 | |
dc.description.abstract | Technological advances now allow investigators to use sequencing data to identify genetic risk variants for complex diseases. However, sequencing a large sample of individuals is still expensive. While genotype imputation can augment sequencing studies, challenges remain, such as imputation in the presence of population or family structure and imputation of rare variants. This dissertation aims to tackle these two challenges. The first project considers imputation with family structures, extending an existing imputation program that assumes the individuals in a sample are unrelated. I propose a strategy for imputing data with family structures and apply it to a family-based association study of bipolar disorder. The results suggest the involvement of ion channelopathy in bipolar pathogenesis. The second and third projects provide sampling strategies for next-generation sequencing. The goal is to select a subset of a study sample that captures the maximal number of variants when sequenced, that achieves maximal imputation accuracy when the sequences of the rest of the study sample are imputed from the sequenced subset, or both. In the second project, I propose the “most diverse panel” by adapting the concept of phylogenetic diversity. This strategy assumes that the panel with the greatest total tree length in the phylogenetic tree represents the longest evolutionary time, allowing the maximal number of mutation events to occur. Sequencing such a panel can thus identify the maximal number of variants. In the third project, I propose the “most representative panel” by considering both the selected and unselected haplotypes. The goal is to identify at least one optimal selected reference haplotype for each unselected haplotype.
Because an exhaustive search is computationally infeasible for large samples, I develop a hill-climbing algorithm that updates a randomly selected panel for a predefined number of iterations or until it converges. Using simulated sequence data and real sequence data from the 1000 Genomes Project, I compare the two proposed panels to randomly selected panels and provide suggestions on which algorithm to use when planning sequencing studies with specific study samples. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Genotype Imputation | en_US |
dc.subject | Statistical Genetics | en_US |
dc.subject | Bioinformatics | en_US |
dc.subject | Next-generation Sequencing | en_US |
dc.subject | Phylogenetic Diversity | en_US |
dc.subject | Study Design | en_US |
dc.title | The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Bioinformatics | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Zoellner, Sebastian K. | en_US |
dc.contributor.committeemember | Li, Jun | en_US |
dc.contributor.committeemember | Rosenberg, Noah A. | en_US |
dc.contributor.committeemember | Boehnke, Michael Lee | en_US |
dc.contributor.committeemember | Burmeister, Margit | en_US |
dc.subject.hlbsecondlevel | Genetics | en_US |
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | en_US |
dc.subject.hlbsecondlevel | Neurosciences | en_US |
dc.subject.hlbsecondlevel | Psychiatry | en_US |
dc.subject.hlbsecondlevel | Public Health | en_US |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | en_US |
dc.subject.hlbtoplevel | Health Sciences | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/99798/1/penzhang_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |
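
The hill-climbing panel search described in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the cost function (the sum, over unselected haplotypes, of the Hamming distance to the closest selected haplotype), the single-swap update rule, and the names `hamming`, `panel_cost`, and `hill_climb_panel` are all assumptions made for illustration. The sketch starts from a randomly selected panel and accepts any swap that lowers the cost, stopping after a predefined number of sweeps or when no swap helps.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length haplotype strings differ."""
    return sum(x != y for x, y in zip(a, b))


def panel_cost(haplotypes, panel):
    """Assumed cost: for each unselected haplotype, the distance to its
    closest selected haplotype, summed over all unselected haplotypes."""
    selected = set(panel)
    return sum(
        min(hamming(haplotypes[i], haplotypes[j]) for j in selected)
        for i in range(len(haplotypes))
        if i not in selected
    )


def hill_climb_panel(haplotypes, k, max_iter=1000, seed=0):
    """Start from a random panel of k >= 1 haplotype indices and accept any
    single swap that lowers the cost, for at most max_iter sweeps or until
    a full sweep finds no improving swap (convergence)."""
    rng = random.Random(seed)
    n = len(haplotypes)
    panel = rng.sample(range(n), k)
    best = panel_cost(haplotypes, panel)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in panel:
                    continue
                # Try replacing the panel member at `pos` with `cand`.
                trial = panel[:pos] + [cand] + panel[pos + 1:]
                cost = panel_cost(haplotypes, trial)
                if cost < best:
                    panel, best, improved = trial, cost, True
        if not improved:
            break
    return sorted(panel), best


if __name__ == "__main__":
    # Toy data (hypothetical): two clusters of three haplotypes each.
    toy = ["0000", "0001", "0010", "1111", "1110", "1101"]
    print(hill_climb_panel(toy, k=2))
```

Hill climbing only guarantees a local optimum, so in practice one would likely rerun it from several random starting panels and keep the lowest-cost result.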