
Using Machine Learning to Better Predict the Structure of RNA and RNA Containing Complexes

dc.contributor.author | Chhabra, Sahil
dc.date.accessioned | 2020-05-08T14:34:27Z
dc.date.available | NO_RESTRICTION
dc.date.available | 2020-05-08T14:34:27Z
dc.date.issued | 2020
dc.date.submitted | 2020
dc.identifier.uri | https://hdl.handle.net/2027.42/155127
dc.description.abstract | Determining the structure of RNA in the presence of drug-like molecules is a crucial step in any drug development campaign. Standard experimental approaches are expensive and time-consuming, and current state-of-the-art computational methods are too inaccurate to be useful. In principle, computer docking can be used to predict the 3D structure of RNA-ligand complexes. However, when the scoring functions that accompany available docking programs are used to rank poses of RNA-ligand complexes, they misclassify native-like poses among sets of decoy poses. There is therefore a need for fast, easy-to-use, and precise methods for predicting the 3D structure of RNAs. In principle, chemical shifts derived from nuclear magnetic resonance (NMR) spectroscopy encode the local chemical environment at each site in a molecule and can therefore be a rich source of structural information. The goal of this work is to predict the structure of RNA-ligand complexes using NMR chemical shifts. To that end, we examine how different machine learning algorithms and ring-current models affect the accuracy of chemical shift prediction for standard RNA-ligand complexes. Extremely randomized trees (Extra-Trees) and the Pople ring-current model were the most accurate at predicting the chemical shifts of RNA-ligand complexes. Next, we explored using chemical shifts to guide 3D structure prediction of RNA-ligand complexes starting from the RNA sequence. We applied CS-Fold, an in-house method that uses chemical shifts to guide RNA secondary structure prediction. From the best secondary structures predicted by CS-Fold, we generated de novo 3D models of the RNAs using the Fragment Assembly of RNA with Full-Atom Refinement (FARFAR) approach. We used chemical shifts predicted by LarmorD to refine those 3D structures.
Combining CS-Fold (the CS-guided secondary structure prediction approach) with the Rosetta de novo protocol for 3D motif prediction significantly increased the recovery rate, to 50%, compared with 20% for the RNAstructure and Rosetta combination. Next, we used rDock to dock the ligand into the 10 best predicted 3D structures of the RNA and filtered the resulting poses based on their chemical shift errors. This study motivated us to build machine learning models, based on a molecular fingerprinting approach, that can recover native-like RNA-ligand structures from non-native ones in a decoy set. We then describe RNAPoser, a computational tool that estimates the relative "nativeness" of a set of RNA-ligand poses using machine learning pose classifiers. We trained our pose classifiers on molecular "fingerprints" formed by fusing atomic fingerprints; these fingerprints encode the local "RNA environment" around ligand atoms. Ranking poses by the RNAPoser classification scores recovered native-like poses significantly better than ranking them by the raw rDock docking scores alone. In leave-one-out validation across the 88 RNA-ligand complexes we explored, RNAPoser recovered ∼80% of the poses that were within 2.5 Å of the native poses. Likewise, on a validation set of 17 complexes, we recovered such poses in ∼70% of the complexes. RNAPosers can aid RNA-ligand pose prediction, and we therefore make it available to the academic community at https://github.com/atfrank/RNAPosers.
dc.language.iso | en_US
dc.subject | Machine Learning, RNA, pose prediction, NMR, fingerprinting
dc.title | Using Machine Learning to Better Predict the Structure of RNA and RNA Containing Complexes
dc.type | Thesis
dc.description.thesisdegreename | PhD | en_US
dc.description.thesisdegreediscipline | Chemistry
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember | Frank, Aaron Terrence
dc.contributor.committeemember | Tewari, Ambuj
dc.contributor.committeemember | Keane, Sarah
dc.contributor.committeemember | Ramamoorthy, Ayyalusamy
dc.subject.hlbsecondlevel | Chemistry
dc.subject.hlbtoplevel | Science
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/155127/1/itssahil_1.pdf
dc.identifier.orcid | 0000-0002-0602-3743
dc.identifier.name-orcid | Chhabra, Sahil; 0000-0002-0602-3743 | en_US
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's)
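The abstract reports that the Pople ring-current model gave the most accurate chemical shift predictions for RNA-ligand complexes. As a minimal sketch, assuming the textbook point-dipole form of that model, delta_rc = i * B * (1 - 3 cos^2 theta) / r^3, where r is the distance from the ring centre to the nucleus, theta is the angle between that vector and the ring normal, i is a ring intensity factor and B is a constant, the contribution of one planar ring to one nucleus could be computed as below. The function names and the default values of `intensity` and `b_const` are hypothetical placeholders, not the parameters fitted in the thesis.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec3, b: Vec3) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def ring_center_and_normal(ring_atoms: Sequence[Vec3]) -> Tuple[List[float], List[float]]:
    """Geometric centre of the ring and a unit normal built from its first three atoms.

    Assumes the ring is (close to) planar.
    """
    n = len(ring_atoms)
    center = [sum(atom[i] for atom in ring_atoms) / n for i in range(3)]
    normal = _cross(_sub(ring_atoms[1], ring_atoms[0]), _sub(ring_atoms[2], ring_atoms[0]))
    length = math.sqrt(_dot(normal, normal))
    return center, [c / length for c in normal]


def pople_ring_current(nucleus: Vec3, ring_atoms: Sequence[Vec3],
                       intensity: float = 1.0, b_const: float = 1.0) -> float:
    """Point-dipole ring-current term: intensity * B * (1 - 3 cos^2 theta) / r^3.

    `intensity` and `b_const` are hypothetical placeholder values.
    """
    center, normal = ring_center_and_normal(ring_atoms)
    r_vec = _sub(nucleus, center)
    r = math.sqrt(_dot(r_vec, r_vec))
    cos_theta = _dot(r_vec, normal) / r
    # Only cos^2(theta) enters, so the orientation of the normal does not matter.
    return intensity * b_const * (1.0 - 3.0 * cos_theta ** 2) / r ** 3


# Usage: an idealised hexagonal ring of radius 1 lying in the xy-plane.
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
above = pople_ring_current((0.0, 0.0, 2.0), hexagon)     # nucleus on the ring axis
in_plane = pople_ring_current((3.0, 0.0, 0.0), hexagon)  # nucleus in the ring plane
```

With B > 0, a nucleus directly above the ring (theta = 0) receives a negative (upfield, shielding) term and a nucleus in the ring plane a positive (downfield) term, which matches the usual shielding cone of aromatic rings.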
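The abstract describes filtering docked poses by their chemical shift errors. The sketch below illustrates that idea under stated assumptions: predicted shifts are already available for every pose (for instance from a predictor such as LarmorD), and the error is taken to be the mean absolute deviation from the measured shifts over the nuclei both sets share. The thesis may use a different error measure or weighting, and the function names, nucleus labels and values here are hypothetical.

```python
from typing import Dict, List, Tuple

Shifts = Dict[str, float]  # hypothetical nucleus label (e.g. "G5:H8") -> chemical shift in ppm


def shift_mae(predicted: Shifts, measured: Shifts) -> float:
    """Mean absolute error in ppm over the nuclei present in both shift sets."""
    common = predicted.keys() & measured.keys()
    if not common:
        raise ValueError("no nuclei in common between predicted and measured shifts")
    return sum(abs(predicted[k] - measured[k]) for k in common) / len(common)


def filter_poses_by_shift_error(pose_shifts: Dict[str, Shifts], measured: Shifts,
                                keep: int) -> List[Tuple[str, float]]:
    """Rank poses by chemical shift MAE (lowest first) and keep the best `keep` poses."""
    scored = [(pose_id, shift_mae(shifts, measured)) for pose_id, shifts in pose_shifts.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:keep]


# Usage with made-up shifts for two poses of one complex.
measured = {"G5:H8": 7.50, "G5:H1'": 5.00}
poses = {
    "pose_a": {"G5:H8": 7.40, "G5:H1'": 5.10},
    "pose_b": {"G5:H8": 8.00, "G5:H1'": 5.50},
}
best = filter_poses_by_shift_error(poses, measured, keep=1)
```

Restricting the error to shared nuclei keeps the comparison meaningful when a predictor does not return a shift for every measured site.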
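The pose-ranking comparison in the abstract (classifier scores versus raw rDock scores, judged by whether poses within 2.5 Å of the native pose are recovered) can be illustrated with a small top-N recovery check. This is a minimal sketch, not the RNAPosers code or API. The function names are hypothetical, pose RMSDs to the native structure are assumed to be precomputed, and "within 2.5 Å" is read as RMSD ≤ 2.5 Å. The `higher_is_better` flag lets the same routine handle classifier probabilities (higher means more native-like) and docking scores such as rDock's total score (lower is better).

```python
from typing import List, Sequence, Tuple


def top_n_recovered(scores: Sequence[float], rmsds: Sequence[float], n: int = 1,
                    cutoff: float = 2.5, higher_is_better: bool = True) -> bool:
    """True if any of the n best-scored poses has RMSD <= cutoff (angstroms) to the native pose."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    return any(rmsds[i] <= cutoff for i in order[:n])


def recovery_rate(complexes: List[Tuple[Sequence[float], Sequence[float]]], n: int = 1,
                  cutoff: float = 2.5, higher_is_better: bool = True) -> float:
    """Fraction of complexes for which top_n_recovered is True."""
    hits = sum(top_n_recovered(s, r, n, cutoff, higher_is_better) for s, r in complexes)
    return hits / len(complexes)


# Hypothetical classifier probabilities and RMSDs (angstroms) for two complexes.
classifier_run = [([0.9, 0.2, 0.5], [1.0, 5.0, 3.0]),
                  ([0.1, 0.8], [1.2, 6.0])]
rate = recovery_rate(classifier_run, n=1)

# Hypothetical docking scores for one complex, where more negative is better.
docking_scores, docking_rmsds = [-20.0, -35.0], [1.0, 4.0]
top1_docking = top_n_recovered(docking_scores, docking_rmsds, n=1, higher_is_better=False)
top2_docking = top_n_recovered(docking_scores, docking_rmsds, n=2, higher_is_better=False)
```

In a leave-one-out setting, `scores` for each complex would come from a classifier trained on all the other complexes, so the recovery rate reflects performance on unseen systems.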
