Statistical Methods in Population Genetics and Viral Phylodynamics

Ki, Caleb

dc.contributor.author	Ki, Caleb
dc.date.accessioned	2022-05-25T15:29:33Z
dc.date.available	2022-05-25T15:29:33Z
dc.date.issued	2022
dc.date.submitted	2022
dc.identifier.uri	https://hdl.handle.net/2027.42/172737
dc.description.abstract	Genetic sequences carry a wealth of information. Scientists and statisticians have used genetic variation data to answer a wide range of questions in evolutionary biology and epidemiology. With the advent of high-throughput sequencing, the availability of genetic sequence data has exploded this century. While the unprecedented amount of genetic data available presents an opportunity to gain a deeper understanding of viruses and humans, making use of large volumes of genetic data remains a challenging problem. In what follows, we present three methods that tackle various problems in analyzing genetic variation data. First, we introduce the framework known as the sequentially Markov coalescent (SMC), which enables likelihood-based inference using hidden Markov models (HMMs) where the latent variables represent genealogies. While genealogies are continuous, HMMs are discrete, requiring SMC-based methods to discretize genealogies. This discretization often leads to biased and noisy estimates of the population size history. We introduce a method that avoids the need for discretization, leading to Bayesian and frequentist inference procedures that are faster and less biased than their predecessors. Second, while coalescent HMMs based on SMC can be decoded in linear time, there does not yet exist a linear-time EM algorithm for coalescent HMMs based on SMC', the more accurate approximation. We present a linear-time EM algorithm based on SMC'. Advantages of this method include improved accuracy and computation time, uncertainty quantification, and the ability to incorporate regularization. Lastly, we present a new approach for estimating transmission and recovery rates of viruses using genetic sequence data. With the outbreak of SARS-CoV-2, there are millions of genomic sequences available to analyze, but few methods to exploit the information contained in these sequences. By integrating recent advances in Bayesian inference and differentiable programming with phylodynamics, we provide a method capable of estimating transmission, recovery, and sampling rates of pathogens using thousands of sequences. We apply our method to SARS-CoV-2 data and find that our estimates of the effective reproductive number closely match other estimates from methods based on public health data.
dc.language.iso	en_US
dc.subject	population genetics
dc.subject	viral phylogenetics
dc.title	Statistical Methods in Population Genetics and Viral Phylodynamics
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Statistics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Terhorst, Jonathan
dc.contributor.committeemember	Zoellner, Sebastian K
dc.contributor.committeemember	Chen, Yang
dc.contributor.committeemember	Ionides, Edward
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/172737/1/calebki_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/4766
dc.identifier.orcid	0000-0001-9089-3735
dc.identifier.name-orcid	Ki, Caleb; 0000-0001-9089-3735	en_US
dc.working.doi	10.7302/4766	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: calebki_1.pdf
Size: 2.988 MB
Format: PDF
