
Applications of Machine Learning in Post-synthetic Regulation of Biomacromolecules

dc.contributor.author: Barnier, Catherine
dc.date.accessioned: 2025-01-06T18:17:20Z
dc.date.available: 2025-01-06T18:17:20Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/196049
dc.description.abstract: Post-synthetic regulation of biomacromolecules is a critical step in the maintenance of cellular processes. Dysregulation of mechanisms such as mRNA decay and protein-protein interactions (PPIs) can lead to a wide range of diseases, including cancers, in which mRNAs tied to proto-oncogenes and growth factors are misregulated, and neurodegenerative diseases such as Huntington’s and Alzheimer’s, in which protein aggregation is implicated. Meanwhile, rapid advances in machine learning have opened the door to applying new tools to long-standing biological questions about these post-synthetic timepoints. The work presented in this dissertation leverages regression analysis and deep-learning tools to address two primary challenges: the first related to post-transcriptional regulation and the second to post-translational regulation.
Modeling transcriptome-wide RNA-binding protein influence: In chapters 2 and 3, we propose a regression-driven approach to modeling the impact of the RNA-binding proteins (RBPs) Pum and Zfs1 in Drosophila melanogaster and Candida species, respectively. Key binding motifs are identified using mutual information, so as to make as few initial statistical assumptions as possible about the main regulatory players in these systems. Next, we extract sequence-context and secondary-structure features of regulatory regions in target mRNAs and use linear models to quantify the impact of these features on the changes in gene expression observed in RNA-sequencing data from Pum knockdown or Zfs1 knockout experiments. These fully interpretable linear models explain a fraction of the variance observed in the gene expression data, and our results further suggest a highly complex network of regulatory factors extending beyond what our models capture.
Application of deep learning to complement protein-protein interaction studies: In chapter 4, we investigate cutting-edge PPI mapping techniques on both the experimental and computational sides, applying deep-learning tools to the PPI targets identified by crosslinking-based mass spectrometry (XL-MS) in the clinically relevant uropathogen E. coli UTI89. Having found a stark discrepancy between the experimental confidence metrics and the predicted accuracy scores (ipTMs) for targets in this dataset, we further investigate the robustness of both approaches, interrogating the agreement of the XL-MS results with existing literature and the information density of the input to the deep-learning tool AlphaFold2. We find that each tool performed well within the bounds of its capabilities but that deep learning either under-scores or performs poorly on understudied targets with high-confidence experimental evidence. We therefore conclude chapter 4 with commentary on the need to prioritize experimental approaches over deep-learning approaches, particularly in cases of novel PPI detection. Lastly, we provide a framework in which to view ipTM scores and guidance on how to interpret this metric in the context of experimental PPI detection methods.
Collectively, this work provides further insight into mRNA decay mediated by Pum and Zfs1 as well as guidance on how to leverage rapidly advancing deep-learning strategies to complement experimental protein-protein interactome mapping techniques. These contributions move us toward a better understanding and more robust interpretation of results in future studies of these post-synthetic timepoints.
dc.language.iso: en_US
dc.subject: machine learning applications, deep-learning, and regression analysis or regression modeling
dc.subject: post-transcriptional regulation and post-translational regulation
dc.subject: Pum-mediated mRNA decay in Drosophila melanogaster
dc.subject: Zfs1-mediated mRNA decay in Candida guilliermondii and Candida albicans
dc.subject: AlphaFold2 multimeric protein structure prediction
dc.subject: crosslinking-based mass spectrometry with DC4 in biofilm uropathogen Escherichia coli UTI89
dc.title: Applications of Machine Learning in Post-synthetic Regulation of Biomacromolecules
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Bioinformatics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Freddolino, Lydia
dc.contributor.committeemember: Chapman, Matthew R
dc.contributor.committeemember: Andrews, Philip C
dc.contributor.committeemember: Lin, Nina
dc.contributor.committeemember: O'Meara, Matthew
dc.subject.hlbsecondlevel: Science (General)
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/196049/1/barnierc_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/24985
dc.identifier.orcid: 0000-0001-7074-4011
dc.identifier.name-orcid: Barnier, Catherine; 0000-0001-7074-4011 (language: en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)


