SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction.

Lee, Adam C.

dc.contributor.author	Lee, Adam C.	en_US
dc.date.accessioned	2010-01-07T16:23:06Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2010-01-07T16:23:06Z
dc.date.issued	2009	en_US
dc.date.submitted		en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/64627
dc.description.abstract	The calculation of physicochemical and biological properties is essential to modern drug discovery. Chemical spaces dimensionalized by these descriptors have been used for scaffold hopping to discover new lead and drug-like molecules. Broadening the boundaries of structure-based drug design, these molecules are expected to share the same physiological target and have similar efficacy, as do known drug molecules sharing the same region in chemical property space. In the past few decades, physicochemical and ADMET (absorption, distribution, metabolism, elimination, and toxicity) property predictors have been the subject of increased focus in academia and the pharmaceutical industry. Due to the ever-increasing attention given to data mining and property predictions, we first discuss the sources of experimental pKa values and current methodologies used for pKa prediction in proteins and small molecules. Of particular concern is an analysis of the scope, statistical validity, overall accuracy, and predictive power of these methods. The expressed concerns are not limited to predicting pKa, but apply to all empirical predictive methodologies. In a bottom-up approach, we explored the influence of freely generated SMARTS string representations of molecular fragments on chelation and cytotoxicity. Later investigations, involving the derivation of predictive models, use stepwise regression to determine the optimal pool of SMARTS strings having the greatest influence over the property of interest. By applying a unique scoring system to sets of highly generalized SMARTS strings, we have constructed well-balanced regression trees with predictive accuracy exceeding that of many published and commercially available models for cytotoxicity, pKa, and aqueous solubility. The methodology is robust, extremely adaptable, and can handle any molecular dataset with experimental data.
This story details our struggles with data gathering and curation, and the development of a machine learning methodology able to derive and validate highly accurate regression trees capable of extremely fast property predictions. Regression trees created by our method are well suited to calculate descriptors for large in silico molecular libraries, facilitating data mining of chemical spaces in search of new lead molecules in drug discovery.	en_US
dc.format.extent	5908007 bytes
dc.format.extent	1373 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	text/plain
dc.language.iso	en_US	en_US
dc.subject	Cheminformatics	en_US
dc.subject	Chemoinformatics	en_US
dc.subject	Chemical Data Mining	en_US
dc.subject	Physicochemical Property Prediction	en_US
dc.subject	Chemical Spaces	en_US
dc.subject	Drug Discovery	en_US
dc.title	SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Medicinal Chemistry	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Crippen, Gordon M.	en_US
dc.contributor.committeemember	Rosania, Gustavo	en_US
dc.contributor.committeemember	Shedden, Kerby A.	en_US
dc.contributor.committeemember	Tsodikov, Oleg V.	en_US
dc.subject.hlbsecondlevel	Computer Science	en_US
dc.subject.hlbsecondlevel	Pharmacy and Pharmacology	en_US
dc.subject.hlbsecondlevel	Chemistry	en_US
dc.subject.hlbsecondlevel	Science (General)	en_US
dc.subject.hlbtoplevel	Engineering	en_US
dc.subject.hlbtoplevel	Health Sciences	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/64627/1/adamclee_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: adamclee_1.pdf
Size: 5.634MB
Format: PDF
