Advancement of Molecular Mechanics Based Drug Discovery Through the Use of Machine Learning
Jones, Murchtricia
2021
Abstract
Drug discovery is the leading motivation for the development of new chemical entities. Improving computational methodologies is an important scientific endeavor for facilitating the development and optimization of new therapeutic agents. In particular, this dissertation focuses on increasing the accuracy of molecular dynamics simulations that employ molecular mechanics force fields (MMFFs). MMFFs provide an atomistic representation of drug-target binding, which enables the elucidation of structural information necessary to evolve compounds into viable drug candidates. The accuracy and efficiency of such computational assays are highly dependent on the initial set of force field parameters required to begin the simulation. Through many years of training and refinement, the parameters for macromolecules have become well developed; however, the generation of force field parameters for novel chemical scaffolds can be challenging due to the vastness of small molecule chemical space. The work herein addresses this obstacle by employing machine learning models to develop a framework that facilitates small molecule parametrization across various MMFFs. The presented framework, Machine learning based Multipurpose AtomTyper for CHARMM (ML-MATCH), considers each molecule from an atom-centric viewpoint. This framework has two components, the first being the machine learning application. Using Random Forest, two key parameters can be predicted: atom types and partial charges. With the CHARMM General Force Field (CGenFF) as the training set, we found an average accuracy score of 96% for the classification of atom types, and a Pearson R-value of 0.974 and an RMSE of 0.028e for the assignment of partial charges. To validate the models, we compared ML-MATCH derived parameters to those of PARAMCHEM, the current gold standard for CGenFF-based parameterization, for molecules within the FreeSolve Database.
This resulted in an accuracy score of 90% for atom types and an RMSE of 0.049e for partial charges. The second component of this framework is the MATCHing algorithm, which serves to identify the closest MATCH between the bonded parameters of the query and those which exist in the force field’s training set. ML-MATCH derived bonded parameters were validated by conducting free energy of hydration calculations for benzene derivatives within FreeSolve, which were subsequently compared to both experimental free energies and hydration free energies computed using PARAMCHEM derived parameters. With the GBMV2 implicit solvent model, we found average Pearson R-values of 0.7223 for ML-MATCH and 0.4635 for PARAMCHEM when compared to experiment. Similarly, for the FACTS model, we found average Pearson R-values of 0.7505 and 0.5353, respectively. These findings show that ML-MATCH derived parameters are well-suited for reproducing experimental data in simulation. Application of ML-MATCH derived parameters in more complex simulations, together with retraining on various force fields, shows that this framework goes beyond the status quo of current atom parameterization software in its ability to identify the underlying rules and assumptions for a given force field without being explicitly programmed to do so. Therefore, the newly developed ML-MATCH platform for small molecule parametrization will be particularly useful for ligands in studies of computer-aided drug design and in the development of therapeutic agents.
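The three figures of merit quoted in the abstract (accuracy score for atom types, Pearson R and RMSE for partial charges) can be illustrated with a minimal Python sketch. This is not the dissertation's code: the function names are hypothetical, the inputs are invented toy values, and the formulas are the standard textbook definitions, which are assumed here to match what the author computed.

```python
import math


def accuracy(predicted_types, reference_types):
    """Fraction of atoms whose predicted atom type equals the reference type."""
    if len(predicted_types) != len(reference_types) or not reference_types:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted_types, reference_types))
    return hits / len(reference_types)


def rmse(predicted, reference):
    """Root-mean-square error, in the same unit as the inputs (here e)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


def pearson_r(x, y):
    """Pearson correlation coefficient (dimensionless)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data only: labels and charges below are invented for illustration.
    predicted_types = ["CG2R61", "CG2R61", "HGR61", "HGR61"]
    reference_types = ["CG2R61", "CG2R61", "HGR61", "HGR62"]
    predicted_charges = [-0.115, -0.110, 0.115, 0.120]
    reference_charges = [-0.115, -0.115, 0.115, 0.115]

    print("atom type accuracy:", accuracy(predicted_types, reference_types))
    print("charge RMSE (e):", rmse(predicted_charges, reference_charges))
    print("charge Pearson R:", pearson_r(predicted_charges, reference_charges))
```

Note that RMSE carries the unit of the partial charges (e), whereas Pearson R is dimensionless.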
Subjects
Advancement of Molecular Mechanics Based Drug Discovery Through the Use of Machine Learning
Types
Thesis