Statistical Methods in Population Genetics and Viral Phylodynamics
dc.contributor.author | Ki, Caleb | |
dc.date.accessioned | 2022-05-25T15:29:33Z | |
dc.date.available | 2022-05-25T15:29:33Z | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/172737 | |
dc.description.abstract | Genetic sequences carry a wealth of information. Scientists and statisticians have used genetic variation data to answer a wide range of questions in evolutionary biology and epidemiology. With the advent of high-throughput sequencing, the availability of genetic sequence data has exploded this century. While the unprecedented amount of genetic data presents an opportunity to gain a deeper understanding of viruses and humans, making use of large volumes of genetic data remains a challenging problem. In what follows, we present three methods that tackle various problems in analyzing genetic variation data. First, we introduce the framework known as the sequentially Markov coalescent (SMC), which enables likelihood-based inference using hidden Markov models (HMMs) in which the latent variables represent genealogies. While genealogies are continuous, HMMs are discrete, requiring SMC-based methods to discretize genealogies. This discretization often leads to biased and noisy estimates of the population size history. We introduce a method that avoids the need for discretization, leading to Bayesian and frequentist inference procedures that are faster and less biased than their predecessors. Additionally, while coalescent HMMs based on SMC can be decoded in linear time, there does not yet exist a linear-time EM algorithm for coalescent HMMs based on SMC', the more accurate approximation. We present a linear-time EM algorithm based on SMC'. Advantages of this method include improved accuracy, reduced computation time, uncertainty quantification, and the ability to incorporate regularization. Lastly, we present a new approach for estimating transmission and recovery rates of viruses using genetic sequence data. With the outbreak of SARS-CoV-2, millions of genomic sequences are available to analyze, but few methods exist to exploit the information contained in these sequences. By integrating recent advances in Bayesian inference and differentiable programming with phylodynamics, we provide a method capable of estimating the transmission, recovery, and sampling rates of pathogens using thousands of sequences. We apply our method to SARS-CoV-2 data and find that our estimates of the effective reproductive number closely match estimates from methods based on public health data. | |
dc.language.iso | en_US | |
dc.subject | population genetics | |
dc.subject | viral phylogenetics | |
dc.title | Statistical Methods in Population Genetics and Viral Phylodynamics | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Statistics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Terhorst, Jonathan | |
dc.contributor.committeemember | Zoellner, Sebastian K | |
dc.contributor.committeemember | Chen, Yang | |
dc.contributor.committeemember | Ionides, Edward | |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbtoplevel | Science | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/172737/1/calebki_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/4766 | |
dc.identifier.orcid | 0000-0001-9089-3735 | |
dc.identifier.name-orcid | Ki, Caleb; 0000-0001-9089-3735 | en_US |
dc.working.doi | 10.7302/4766 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) | |