Statistical Methods and Analysis in Next Generation Sequencing.

dc.contributor.author: Zhan, Xiaowei (en_US)
dc.date.accessioned: 2014-06-02T18:15:07Z
dc.date.available: NO_RESTRICTION (en_US)
dc.date.available: 2014-06-02T18:15:07Z
dc.date.issued: 2014 (en_US)
dc.date.submitted: 2014 (en_US)
dc.identifier.uri: https://hdl.handle.net/2027.42/107129
dc.description.abstract: Next generation sequencing (NGS) is a technology that advances our knowledge of human medical genetics with an unprecedented amount of data. This vast amount of data presents challenges to existing statistical methods. In this dissertation, I present three studies that demonstrate methods for efficiently analyzing NGS data using both simulated and real data. In the first study, I develop an ancestry inference method using small amounts of sequence data. In comparison to microarray experiments, sequencing data produce uneven coverage and genotypes with higher error rates than those traditionally used for principal components analysis (PCA) of genetic ancestry. I overcome some of these challenges using a novel statistical method that models sequence data directly without relying on intermediate genotype calls. My method achieves high accuracy in simulated data based on the Human Genome Diversity Panel as well as in a targeted sequencing study of age-related macular degeneration. In our age-related macular degeneration study, our approach helps discover a high-risk rare variant in the Complement 3 gene. In the second study, I develop a model-based ancestry inference method that improves upon the work described in the first study. It is based on a likelihood-based model of ancestral location, using sequencing data as input. It increases computational efficiency without losing accuracy. For each sample, a parallelizable optimization algorithm can infer ancestry using a fraction of the computational resources required for PCA-based methods. Evaluation using the Human Genome Diversity Panel and the age-related macular degeneration data set demonstrates its accuracy and efficiency. In the final study, I develop an improved genotype calling method for low-coverage sequencing data. As high-quality reference panels grow, it is helpful to incorporate these into genotype calling of new samples. Using a coalescent-based simulation and real data from the 1000 Genomes Project, I evaluate the utility of my method (which uses a panel of previously sequenced samples) to improve analyses of samples sequenced at various depths. The improvement in accuracy and computation time will be measured as a function of reference panel size. This work will be useful to investigators undertaking sequencing and analysis of new human samples. (en_US)
dc.language.iso: en_US (en_US)
dc.subject: Next Generation Sequencing (en_US)
dc.subject: Ancestral Inference (en_US)
dc.subject: Age-related Macular Degeneration (en_US)
dc.subject: Imputation (en_US)
dc.subject: Targeted Sequencing (en_US)
dc.subject: Genetic Association Studies (en_US)
dc.title: Statistical Methods and Analysis in Next Generation Sequencing. (en_US)
dc.type: Thesis (en_US)
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Biostatistics (en_US)
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies (en_US)
dc.contributor.committeemember: Abecasis, Goncalo (en_US)
dc.contributor.committeemember: Burmeister, Margit (en_US)
dc.contributor.committeemember: Boehnke, Michael Lee (en_US)
dc.contributor.committeemember: Kang, Hyun Min (en_US)
dc.subject.hlbsecondlevel: Public Health (en_US)
dc.subject.hlbtoplevel: Health Sciences (en_US)
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/107129/1/zhanxw_1.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

