Applied Machine Learning for Big Data Genomics
dc.contributor.author | Ho, Steve | |
dc.date.accessioned | 2025-05-12T17:38:47Z | |
dc.date.available | 2025-05-12T17:38:47Z | |
dc.date.issued | 2025 | |
dc.date.submitted | 2025 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/197225 | |
dc.description.abstract | This thesis explores the intersection between machine learning and genomics, addressing fundamental challenges in extracting meaningful biological knowledge from the exponentially growing volume of complex data. Through three complementary projects, I demonstrate how computational frameworks can transform multi-dimensional data into interpretable biological insight while establishing methodologies that bridge artificial intelligence and genomics. The work first investigates natural language processing as a means to overcome limitations in synthesizing scientific literature. By analyzing millions of genomics-related abstracts, I demonstrate that embedding models can predict important biological relationships before their formal documentation, indicating that significant undiscovered knowledge exists, distributed across the scientific literature. Next, I present a highly modular end-to-end pipeline for the automated construction of omics-based graph representations and training of graph neural networks. This framework addresses the critical challenge of developing novel modeling paradigms with limited precedent, enabling systematic architecture exploration and ensuring reproducibility in omics-based deep learning experiments. Building on this foundation, I introduce Omics Graph Learning (OGL), a novel graph neural network architecture that represents the genome as interconnected biological entities rather than sequence bins. Through systematic in-silico perturbation analyses, OGL reveals tissue-specific regulatory mechanisms and non-additive relationships between epigenetic marks, and identifies regulatory elements with disproportionate influence on gene expression. Collectively, this thesis demonstrates how precisely designed computational approaches can extract biological insights despite the inherent complexity of genomics. 
Rather than pursuing prediction accuracy alone, each project prioritizes discovery and interpretation, establishing frameworks that reveal meaningful patterns from high-dimensional information. The work provides tools to navigate the expansive landscape of genomic data and advances our understanding of the multifaceted processes shaping gene regulation. | |
dc.language.iso | en_US | |
dc.subject | Machine learning genomics | |
dc.title | Applied Machine Learning for Big Data Genomics | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | |
dc.description.thesisdegreediscipline | Genetics and Genomics PhD | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Mills, Ryan Edward | |
dc.contributor.committeemember | Rao, Arvind | |
dc.contributor.committeemember | Boyle, Alan P | |
dc.contributor.committeemember | Camper, Sally Ann | |
dc.contributor.committeemember | Mueller, Jacob L | |
dc.subject.hlbsecondlevel | Genetics | |
dc.subject.hlbtoplevel | Science | |
dc.contributor.affiliationumcampus | Ann Arbor | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/197225/1/stevesho_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/25651 | |
dc.identifier.orcid | 0000-0002-6685-774X | |
dc.identifier.name-orcid | Ho, Steve; 0000-0002-6685-774X | en_US |
dc.working.doi | 10.7302/25651 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |