The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies.

dc.contributor.author Zhang, Peng en_US
dc.date.accessioned 2013-09-24T16:01:14Z
dc.date.available NO_RESTRICTION en_US
dc.date.available 2013-09-24T16:01:14Z
dc.date.issued 2013 en_US
dc.date.submitted 2013 en_US
dc.identifier.uri https://hdl.handle.net/2027.42/99798
dc.description.abstract Technological advances now allow investigators to use sequencing data to identify genetic risk variants for complex diseases. However, sequencing a large sample of individuals remains expensive. While genotype imputation can augment sequencing studies, challenges remain, such as imputation in the presence of population or family structure and imputation of rare variants. This dissertation aims to tackle these two challenges. The first project considers imputation with family structure, extending an existing imputation program that assumes the individuals in a sample are unrelated. I propose a strategy for imputing data with family structure and apply it to a family-based association study of bipolar disorder. The results suggest the involvement of ion channelopathy in bipolar pathogenesis. The second and third projects provide sampling strategies for next-generation sequencing. The goal is to select a subset of a study sample that captures the maximal number of variants when sequenced, that achieves maximal imputation accuracy when the sequenced subset is used to impute the sequences of the rest of the study sample, or both. In the second project, I propose the “most diverse panel” by adapting the concept of phylogenetic diversity. This strategy assumes that the panel with the greatest total tree length in the phylogenetic tree represents the longest evolutionary time, allowing the maximal number of mutation events to occur. Sequencing such a panel can thus identify the maximal number of variants. In the third project, I propose the “most representative panel” by considering both the selected and unselected haplotypes. The goal is to identify at least one optimal selected reference haplotype for each unselected haplotype. Because an exhaustive search is computationally infeasible for large samples, I develop a hill-climbing algorithm that updates a randomly selected panel for a predefined number of iterations or until it converges.
Using simulated sequence data and real sequence data from the 1000 Genomes Project, I compare the two proposed panels to randomly selected panels and provide suggestions on which algorithm to use when planning sequencing studies with specific study samples. en_US
dc.language.iso en_US en_US
dc.subject Genotype Imputation en_US
dc.subject Statistical Genetics en_US
dc.subject Bioinformatics en_US
dc.subject Next-generation Sequencing en_US
dc.subject Phylogenetic Diversity en_US
dc.subject Study Design en_US
dc.title The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies. en_US
dc.type Thesis en_US
dc.description.thesisdegreename PhD en_US
dc.description.thesisdegreediscipline Bioinformatics en_US
dc.description.thesisdegreegrantor University of Michigan, Horace H. Rackham School of Graduate Studies en_US
dc.contributor.committeemember Zoellner, Sebastian K. en_US
dc.contributor.committeemember Li, Jun en_US
dc.contributor.committeemember Rosenberg, Noah A. en_US
dc.contributor.committeemember Boehnke, Michael Lee en_US
dc.contributor.committeemember Burmeister, Margit en_US
dc.subject.hlbsecondlevel Genetics en_US
dc.subject.hlbsecondlevel Molecular, Cellular and Developmental Biology en_US
dc.subject.hlbsecondlevel Neurosciences en_US
dc.subject.hlbsecondlevel Psychiatry en_US
dc.subject.hlbsecondlevel Public Health en_US
dc.subject.hlbsecondlevel Statistics and Numeric Data en_US
dc.subject.hlbtoplevel Health Sciences en_US
dc.subject.hlbtoplevel Science en_US
dc.description.bitstreamurl http://deepblue.lib.umich.edu/bitstream/2027.42/99798/1/penzhang_1.pdf
dc.owningcollname Dissertations and Theses (Ph.D. and Master's)