Statistical Methods and Analysis in Next Generation Sequencing.

Zhan, Xiaowei

dc.contributor.author	Zhan, Xiaowei	en_US
dc.date.accessioned	2014-06-02T18:15:07Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2014-06-02T18:15:07Z
dc.date.issued	2014	en_US
dc.date.submitted	2014	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/107129
dc.description.abstract	Next generation sequencing (NGS) is a technology that advances our knowledge of human medical genetics with an unprecedented amount of data. This vast amount of data presents challenges to existing statistical methods. In this dissertation, I present three studies that demonstrate methods for efficiently analyzing NGS data, using both simulated and real data. In the first study, I develop an ancestry inference method that uses small amounts of sequence data. In comparison to microarray experiments, sequencing data produce uneven coverage and genotypes with higher error rates than those traditionally used for principal components analysis (PCA) of genetic ancestry. I overcome some of these challenges using a novel statistical method that models sequence data directly, without relying on intermediate genotype calls. My method achieves high accuracy in simulated data based on the Human Genome Diversity Panel as well as in a targeted sequencing study of age-related macular degeneration. In our age-related macular degeneration study, our approach helps discover a high-risk rare variant in the Complement 3 gene. In the second study, I develop a model-based ancestry inference method that improves upon the work described in the first study. It is based on a likelihood model of ancestral location and uses sequencing data as input. It increases computational efficiency without losing accuracy: for each sample, a parallelizable optimization algorithm can infer ancestry using a fraction of the computational resources required for PCA-based methods. Evaluation using the Human Genome Diversity Panel and the age-related macular degeneration data set demonstrates its accuracy and efficiency. In the final study, I develop an improved genotype calling method for low-coverage sequencing data. As high-quality reference panels grow, it is helpful to incorporate them into genotype calling of new samples. Using a coalescent-based simulation and real data from the 1000 Genomes Project, I evaluate the utility of my method (which uses a panel of previously sequenced samples) to improve analyses of samples sequenced at various depths. The improvement in accuracy and computation time will be measured as a function of reference panel size. This work will be useful to investigators undertaking sequencing and analysis of new human samples.	en_US
dc.language.iso	en_US	en_US
dc.subject	Next Generation Sequencing	en_US
dc.subject	Ancestral Inference	en_US
dc.subject	Age-related Macular Degeneration	en_US
dc.subject	Imputation	en_US
dc.subject	Targeted Sequencing	en_US
dc.subject	Genetic Association Studies	en_US
dc.title	Statistical Methods and Analysis in Next Generation Sequencing.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Biostatistics	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Abecasis, Goncalo	en_US
dc.contributor.committeemember	Burmeister, Margit	en_US
dc.contributor.committeemember	Boehnke, Michael Lee	en_US
dc.contributor.committeemember	Kang, Hyun Min	en_US
dc.subject.hlbsecondlevel	Public Health	en_US
dc.subject.hlbtoplevel	Health Sciences	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/107129/1/zhanxw_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: zhanxw_1.pdf
Size: 4.954MB
Format: PDF
