SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction.

dc.contributor.author: Lee, Adam C. [en_US]
dc.date.accessioned: 2010-01-07T16:23:06Z
dc.date.available: NO_RESTRICTION [en_US]
dc.date.available: 2010-01-07T16:23:06Z
dc.date.issued: 2009 [en_US]
dc.date.submitted: [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/64627
dc.description.abstract: The calculation of physicochemical and biological properties is essential to modern drug discovery. Chemical spaces dimensionalized by these descriptors have been used to scaffold-hop in order to discover new lead and drug-like molecules. Broadening the boundaries of structure-based drug design, these molecules are expected to share the same physiological target and have similar efficacy, as do known drug molecules sharing the same region in chemical property space. In the past few decades, physicochemical and ADMET (absorption, distribution, metabolism, elimination, and toxicity) property predictors have been the subject of increased focus in academia and the pharmaceutical industry. Given the ever-increasing attention paid to data mining and property prediction, we first discuss the sources of experimental pKa values and the current methodologies used for pKa prediction in proteins and small molecules. Of particular concern is an analysis of the scope, statistical validity, overall accuracy, and predictive power of these methods. The concerns expressed are not limited to pKa prediction, but apply to all empirical predictive methodologies. In a bottom-up approach, we explored the influence of freely generated SMARTS string representations of molecular fragments on chelation and cytotoxicity. Later investigations, involving the derivation of predictive models, use stepwise regression to determine the optimal pool of SMARTS strings having the greatest influence on the property of interest. By applying a unique scoring system to sets of highly generalized SMARTS strings, we have constructed well-balanced regression trees with predictive accuracy exceeding that of many published and commercially available models for cytotoxicity, pKa, and aqueous solubility. The methodology is robust, extremely adaptable, and can handle any molecular dataset with experimental data.
This story details our struggles with data gathering and curation, and the development of a machine learning methodology able to derive and validate highly accurate regression trees capable of extremely fast property predictions. Regression trees created by our method are well suited to calculating descriptors for large in silico molecular libraries, facilitating data mining of chemical spaces in search of new lead molecules in drug discovery. [en_US]
dc.format.extent: 5908007 bytes
dc.format.extent: 1373 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: text/plain
dc.language.iso: en_US [en_US]
dc.subject: Cheminformatics [en_US]
dc.subject: Chemoinformatics [en_US]
dc.subject: Chemical Data Mining [en_US]
dc.subject: Physicochemical Property Prediction [en_US]
dc.subject: Chemical Spaces [en_US]
dc.subject: Drug Discovery [en_US]
dc.title: SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction. [en_US]
dc.type: Thesis [en_US]
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Medicinal Chemistry [en_US]
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies [en_US]
dc.contributor.committeemember: Crippen, Gordon M. [en_US]
dc.contributor.committeemember: Rosania, Gustavo [en_US]
dc.contributor.committeemember: Shedden, Kerby A. [en_US]
dc.contributor.committeemember: Tsodikov, Oleg V. [en_US]
dc.subject.hlbsecondlevel: Computer Science [en_US]
dc.subject.hlbsecondlevel: Pharmacy and Pharmacology [en_US]
dc.subject.hlbsecondlevel: Chemistry [en_US]
dc.subject.hlbsecondlevel: Science (General) [en_US]
dc.subject.hlbtoplevel: Engineering [en_US]
dc.subject.hlbtoplevel: Health Sciences [en_US]
dc.subject.hlbtoplevel: Science [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/64627/1/adamclee_1.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

