Computational Methods for Resolving Heterogeneity in Biological Data

Zhang, Hongjiu

dc.contributor.author	Zhang, Hongjiu
dc.date.accessioned	2019-10-01T18:22:44Z
dc.date.available	NO_RESTRICTION
dc.date.available	2019-10-01T18:22:44Z
dc.date.issued	2019
dc.date.submitted	2019
dc.identifier.uri	https://hdl.handle.net/2027.42/151385
dc.description.abstract	The complexity of biological data reflects the heterogeneous nature of biological processes. Computational methods need to preserve as much information about the biological process of interest as possible. In this work, we explore three specific tasks in resolving biological heterogeneity. The first task is to infer heterogeneous phylogenetic relationships from molecular data. Common likelihood models for phylogenetic inference often make strong assumptions about the evolutionary process across different lineages and different mutation sites. We use a convolutional neural network to infer phylogenies instead, allowing the model to describe more heterogeneous evolutionary processes. The model outperforms commonly used algorithms on diverse simulation datasets. The second task is to infer the clonal composition and phylogeny from bulk DNA sequencing data of tumour samples. Estimating clonal information from bulk data often involves resolving mixture models. Unfortunately, simpler models are often unable to capture complex genetic alteration events in tumour cells, while more sophisticated models incur heavy computational burdens and struggle to converge. We address this challenge through density-hinted optimization with post hoc adjustment. The model makes conservative predictions but yields better accuracy in assessing co-clustering relationships among the somatic mutations. The third task is to estimate the abundance of splicing transcripts from full-length single-cell RNA sequencing data. Transcript inference from RNA sequencing data needs a large number of reads for accurate abundance estimation, yet single-cell sequencing yields far fewer reads than bulk sequencing. To recover transcripts from full-length single-cell RNA sequencing data, we pool reads from similar cells to help assign transcripts without disrupting the cluster structures. These methods describe complex biological processes with minimal runtime overhead.
Taking these methods as examples, we briefly discuss the rationale and some general principles behind their design.
dc.language.iso	en_US
dc.subject	heterogeneity
dc.subject	biological data
dc.subject	machine learning
dc.subject	algorithms
dc.title	Computational Methods for Resolving Heterogeneity in Biological Data
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Bioinformatics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Burmeister, Margit
dc.contributor.committeemember	Guan, Yuanfang
dc.contributor.committeemember	Li, Jun
dc.contributor.committeemember	Najarian, Kayvan
dc.contributor.committeemember	Omenn, Gilbert S
dc.contributor.committeemember	Parker, Stephen CJ
dc.subject.hlbtoplevel	Health Sciences
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/151385/1/zhanghj_1.pdf
dc.identifier.orcid	0000-0003-0545-5613
dc.identifier.name-orcid	Zhang, Hongjiu; 0000-0003-0545-5613	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: zhanghj_1.pdf
Size: 1.574MB
Format: PDF
