Computational Methods for Resolving Heterogeneity in Biological Data
dc.contributor.author | Zhang, Hongjiu | |
dc.date.accessioned | 2019-10-01T18:22:44Z | |
dc.date.available | NO_RESTRICTION | |
dc.date.available | 2019-10-01T18:22:44Z | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/151385 | |
dc.description.abstract | The complexity of biological data reflects the heterogeneous nature of biological processes. Computational methods need to preserve as much information about the biological process of interest as possible. In this work, we explore three specific tasks in resolving biological heterogeneity. The first task is to infer heterogeneous phylogenetic relationships from molecular data. Common likelihood models for phylogenetic inference often make strong assumptions about the evolutionary process across different lineages and different mutation sites. We instead use a convolutional neural network to infer phylogenies, allowing the model to describe more heterogeneous evolutionary processes. The model outperforms commonly used algorithms on diverse simulation datasets. The second task is to infer the clonal composition and phylogeny from bulk DNA sequencing data of tumour samples. Estimating clonal information from bulk data often involves resolving mixture models. Unfortunately, simpler models are often unable to capture complex genetic alteration events in tumour cells, while more sophisticated models incur heavy computational burdens and struggle to converge. We address this challenge through density-hinted optimization with post hoc adjustment. The model makes conservative predictions but yields better accuracy in assessing co-clustering relationships among the somatic mutations. The third task is to estimate the abundance of splicing transcripts from full-length single-cell RNA sequencing data. Transcript inference from RNA sequencing data needs a plethora of reads for accurate abundance estimation, yet single-cell sequencing yields far fewer reads than bulk sequencing. To recover transcripts from full-length single-cell RNA sequencing data, we pool reads from similar cells to help assign transcripts without disrupting the cluster structures. These methods describe complex biological processes with minimal runtime overhead. 
Taking these methods as examples, we briefly discuss the rationale and some general principles behind their design. | |
dc.language.iso | en_US | |
dc.subject | heterogeneity | |
dc.subject | biological data | |
dc.subject | machine learning | |
dc.subject | algorithms | |
dc.title | Computational Methods for Resolving Heterogeneity in Biological Data | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Bioinformatics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Burmeister, Margit | |
dc.contributor.committeemember | Guan, Yuanfang | |
dc.contributor.committeemember | Li, Jun | |
dc.contributor.committeemember | Najarian, Kayvan | |
dc.contributor.committeemember | Omenn, Gilbert S | |
dc.contributor.committeemember | Parker, Stephen CJ | |
dc.subject.hlbtoplevel | Health Sciences | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/151385/1/zhanghj_1.pdf | |
dc.identifier.orcid | 0000-0003-0545-5613 | |
dc.identifier.name-orcid | Zhang, Hongjiu; 0000-0003-0545-5613 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |