Methodological Advances for Drug Discovery and Protein Engineering

Ding, Xinqiang

dc.contributor.author	Ding, Xinqiang
dc.date.accessioned	2019-02-07T17:55:30Z
dc.date.available	NO_RESTRICTION
dc.date.available	2019-02-07T17:55:30Z
dc.date.issued	2018
dc.date.submitted	2018
dc.identifier.uri	https://hdl.handle.net/2027.42/147634
dc.description.abstract	Designing and engineering molecules that have specified properties not only test our understanding of nature but also play an important role in improving both human health and industrial productivity. Two examples are drug discovery, which designs new molecules to treat or even cure diseases, and protein engineering, which develops useful proteins for medical purposes or for catalyzing industrial chemical reactions. However, drug discovery and protein engineering are time-consuming and expensive processes because they require multiple rounds of trial and error. For instance, developing a new drug costs on average one billion dollars and takes 10 years of effort. One effective way to reduce the cost and accelerate these processes is to develop computational methods that can rationalize the design and engineering processes. With both methodological development and an increasing amount of computational resources, computational methods for drug discovery and protein engineering are becoming more and more effective. In this dissertation, I present new advances in computational methods for drug discovery and protein engineering. Protein-ligand docking and free energy calculation are two widely used computational methods in drug discovery. I first developed an accelerated version of the protein-ligand docking method CDOCKER by introducing two new features, fast Fourier transform based docking and parallel simulated annealing, both of which utilize the parallel computing power of graphics processing units. The two new features not only accelerate CDOCKER by at least one order of magnitude but also provide an approach to calculating an upper bound on a scoring function’s docking accuracy, which will be useful for optimizing the scoring function used in CDOCKER. Then I introduced two new methods for protein-ligand binding free energy calculation: Gibbs sampler lambda-dynamics (GSLD) and Rao-Blackwell estimators (RBE).
Compared with the original lambda-dynamics, GSLD is more flexible and easier to implement, and it retains the capacity to calculate free energies for multiple ligands simultaneously in a single simulation. Compared with the empirical estimator used in lambda-dynamics, RBE is an unbiased estimator, does not depend on the ad hoc cutoff values used by the empirical estimator, and has smaller variance. Finally, for protein engineering, I investigated how variational auto-encoder models can be used to infer information about protein stability, evolution, and fitness landscapes from protein sequences. Variational auto-encoder models are probabilistic generative models that embed discrete information in a lower dimensional continuous latent space. Using a protein family's multiple sequence alignment as training data, variational auto-encoder models learn a probability distribution of sequences for the protein family. The probability distribution may then be employed to predict protein stability change upon mutation. The embedding of sequences in a low dimensional latent space not only provides a way to visualize a protein family's sequence space but also captures evolutionary relationships between sequences. Together with experimental fitness data for the protein sequences, the embedding enables the visualization and expression of the protein's fitness landscape in a low dimensional continuous space. With the amount of protein sequence data increasing rapidly due to advances in sequencing technology, these features of variational auto-encoder models are useful both for studying proteins and for guiding protein engineering efforts.
dc.language.iso	en_US
dc.subject	drug discovery
dc.subject	protein engineering
dc.subject	protein-ligand docking
dc.subject	free energy calculation
dc.subject	machine learning
dc.subject	statistics
dc.title	Methodological Advances for Drug Discovery and Protein Engineering
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Bioinformatics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Brooks III, Charles L
dc.contributor.committeemember	Frank, Aaron Terrence
dc.contributor.committeemember	Carlson, Heather A
dc.contributor.committeemember	Freddolino, Peter Louis
dc.contributor.committeemember	Lin, Nina
dc.subject.hlbsecondlevel	Biomedical Engineering
dc.subject.hlbsecondlevel	Chemical Engineering
dc.subject.hlbsecondlevel	Pharmacy and Pharmacology
dc.subject.hlbsecondlevel	Chemistry
dc.subject.hlbsecondlevel	Ecology and Evolutionary Biology
dc.subject.hlbsecondlevel	Physics
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Engineering
dc.subject.hlbtoplevel	Health Sciences
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/147634/1/xqding_1.pdf
dc.identifier.orcid	0000-0002-4598-8732
dc.identifier.name-orcid	Ding, Xinqiang; 0000-0002-4598-8732	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: xqding_1.pdf
Size: 3.865MB
Format: PDF
