SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction.
dc.contributor.author | Lee, Adam C. | en_US |
dc.date.accessioned | 2010-01-07T16:23:06Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2010-01-07T16:23:06Z | |
dc.date.issued | 2009 | en_US |
dc.date.submitted | en_US | |
dc.identifier.uri | https://hdl.handle.net/2027.42/64627 | |
dc.description.abstract | The calculation of physicochemical and biological properties is essential to facilitate modern drug discovery. Chemical spaces dimensionalized by these descriptors have been used to scaffold-hop in order to discover new lead and drug-like molecules. Broadening the boundaries of structure-based drug design, these molecules are expected to share the same physiological target and have similar efficacy, as do known drug molecules sharing the same region in chemical property space. In the past few decades, physicochemical and ADMET (absorption, distribution, metabolism, elimination, and toxicity) property predictors have been the subject of increased focus in academia and the pharmaceutical industry. Due to the ever-increasing attention given to data mining and property predictions, we first discuss the sources of experimental pKa values and current methodologies used for pKa prediction in proteins and small molecules. Of particular concern is an analysis of the scope, statistical validity, overall accuracy, and predictive power of these methods. The expressed concerns are not limited to predicting pKa, but apply to all empirical predictive methodologies. In a bottom-up approach, we explored the influence of freely generated SMARTS string representations of molecular fragments on chelation and cytotoxicity. Later investigations, involving the derivation of predictive models, use stepwise regression to determine the optimal pool of SMARTS strings having the greatest influence over the property of interest. By applying a unique scoring system to sets of highly generalized SMARTS strings, we have constructed well-balanced regression trees with predictive accuracy exceeding that of many published and commercially available models for cytotoxicity, pKa, and aqueous solubility. The methodology is robust, extremely adaptable, and can handle any molecular dataset with experimental data.
This story details our struggles with data gathering and curation, and the development of a machine learning methodology able to derive and validate highly accurate regression trees capable of extremely fast property predictions. Regression trees created by our method are well suited to calculating descriptors for large in silico molecular libraries, facilitating data mining of chemical spaces in search of new lead molecules in drug discovery. | en_US |
dc.format.extent | 5908007 bytes | |
dc.format.extent | 1373 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | text/plain | |
dc.language.iso | en_US | en_US |
dc.subject | Cheminformatics | en_US |
dc.subject | Chemoinformatics | en_US |
dc.subject | Chemical Data Mining | en_US |
dc.subject | Physicochemical Property Prediction | en_US |
dc.subject | Chemical Spaces | en_US |
dc.subject | Drug Discovery | en_US |
dc.title | SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Medicinal Chemistry | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Crippen, Gordon M. | en_US |
dc.contributor.committeemember | Rosania, Gustavo | en_US |
dc.contributor.committeemember | Shedden, Kerby A. | en_US |
dc.contributor.committeemember | Tsodikov, Oleg V. | en_US |
dc.subject.hlbsecondlevel | Computer Science | en_US |
dc.subject.hlbsecondlevel | Pharmacy and Pharmacology | en_US |
dc.subject.hlbsecondlevel | Chemistry | en_US |
dc.subject.hlbsecondlevel | Science (General) | en_US |
dc.subject.hlbtoplevel | Engineering | en_US |
dc.subject.hlbtoplevel | Health Sciences | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/64627/1/adamclee_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |