Computational Methods for Resolving Heterogeneity in Biological Data

dc.contributor.author: Zhang, Hongjiu
dc.date.accessioned: 2019-10-01T18:22:44Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2019-10-01T18:22:44Z
dc.date.issued: 2019
dc.date.submitted: 2019
dc.identifier.uri: https://hdl.handle.net/2027.42/151385
dc.description.abstract: The complexity of biological data reflects the heterogeneous nature of biological processes. Computational methods need to preserve as much information regarding the biological process of interest as possible. In this work, we explore three specific tasks in resolving biological heterogeneity. The first task is to infer heterogeneous phylogenetic relationships from molecular data. The common likelihood models for phylogenetic inference often make strong assumptions about the evolutionary process across different lineages and different mutation sites. We use a convolutional neural network to infer phylogenies instead, allowing the model to describe more heterogeneous evolutionary processes. The model outperforms commonly used algorithms on diverse simulation datasets. The second task is to infer the clonal composition and phylogeny from bulk DNA sequencing data of tumour samples. Estimating clonal information from bulk data often involves resolving mixture models. Unfortunately, simpler models are often unable to capture complex genetic alteration events in tumour cells, while more sophisticated models incur heavy computational burdens and converge poorly. We address this challenge through density-hinted optimization with post hoc adjustment. The model makes conservative predictions but yields better accuracy in assessing co-clustering relationships among the somatic mutations. The third task is to estimate the abundance of splicing transcripts from full-length single-cell RNA sequencing data. Transcript inference from RNA sequencing data needs a plethora of reads for accurate abundance estimation, yet single-cell sequencing yields far fewer reads than bulk sequencing. To recover transcripts from full-length single-cell RNA sequencing data, we pool reads from similar cells to help assign transcripts without disrupting the cluster structures. These methods describe complex biological processes with minimal runtime overhead.
Taking these methods as examples, we briefly discuss the rationale and some general principles behind their design.
dc.language.iso: en_US
dc.subject: heterogeneity
dc.subject: biological data
dc.subject: machine learning
dc.subject: algorithms
dc.title: Computational Methods for Resolving Heterogeneity in Biological Data
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Bioinformatics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Burmeister, Margit
dc.contributor.committeemember: Guan, Yuanfang
dc.contributor.committeemember: Li, Jun
dc.contributor.committeemember: Najarian, Kayvan
dc.contributor.committeemember: Omenn, Gilbert S
dc.contributor.committeemember: Parker, Stephen CJ
dc.subject.hlbtoplevel: Health Sciences
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/151385/1/zhanghj_1.pdf
dc.identifier.orcid: 0000-0003-0545-5613
dc.identifier.name-orcid: Zhang, Hongjiu; 0000-0003-0545-5613 (en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)


