Statistical Methods and Analysis in Next Generation Sequencing.
dc.contributor.author | Zhan, Xiaowei | en_US |
dc.date.accessioned | 2014-06-02T18:15:07Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2014-06-02T18:15:07Z | |
dc.date.issued | 2014 | en_US |
dc.date.submitted | 2014 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/107129 | |
dc.description.abstract | Next generation sequencing (NGS) is a technology that advances our knowledge of human medical genetics with an unprecedented amount of data. This vast amount of data presents challenges to existing statistical methods. In this dissertation, I present three studies that demonstrate methods for efficiently analyzing NGS data using both simulated and real data. In the first study, I develop an ancestry inference method that uses small amounts of sequence data. In comparison to microarray experiments, sequencing data produce uneven coverage and genotypes with higher error rates than those traditionally used for principal components analysis (PCA) of genetic ancestry. I overcome some of these challenges using a novel statistical method that models sequence data directly without relying on intermediate genotype calls. My method achieves high accuracy in simulated data based on the Human Genome Diversity Panel as well as in a targeted sequencing study of age-related macular degeneration. In our age-related macular degeneration study, our approach helps discover a high-risk rare variant in the Complement 3 gene. In the second study, I develop a model-based ancestry inference method that improves upon the work described in the first study. It is based on a likelihood model of ancestral location and takes sequencing data as input. It increases computational efficiency without losing accuracy. For each sample, a parallelizable optimization algorithm can infer ancestry using a fraction of the computational resources required for PCA-based methods. Evaluation using the Human Genome Diversity Panel and the age-related macular degeneration data set demonstrates its accuracy and efficiency. In the final study, I develop an improved genotype calling method for low-coverage sequencing data. As high-quality reference panels grow, it is helpful to incorporate them into genotype calling of new samples. Using a coalescent-based simulation and real data from the 1000 Genomes Project, I evaluate the utility of my method (which uses a panel of previously sequenced samples) to improve analyses of samples sequenced at various depths. The improvement in accuracy and computation time will be measured as a function of reference panel size. This work will be useful to investigators undertaking sequencing and analysis of new human samples. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Next Generation Sequencing | en_US |
dc.subject | Ancestral Inference | en_US |
dc.subject | Age-related Macular Degeneration | en_US |
dc.subject | Imputation | en_US |
dc.subject | Targeted Sequencing | en_US |
dc.subject | Genetic Association Studies | en_US |
dc.title | Statistical Methods and Analysis in Next Generation Sequencing. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Biostatistics | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Abecasis, Goncalo | en_US |
dc.contributor.committeemember | Burmeister, Margit | en_US |
dc.contributor.committeemember | Boehnke, Michael Lee | en_US |
dc.contributor.committeemember | Kang, Hyun Min | en_US |
dc.subject.hlbsecondlevel | Public Health | en_US |
dc.subject.hlbtoplevel | Health Sciences | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/107129/1/zhanxw_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |