Methodological Advances for Drug Discovery and Protein Engineering

dc.contributor.author: Ding, Xinqiang
dc.date.accessioned: 2019-02-07T17:55:30Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2019-02-07T17:55:30Z
dc.date.issued: 2018
dc.date.submitted: 2018
dc.identifier.uri: https://hdl.handle.net/2027.42/147634
dc.description.abstract: Designing and engineering molecules with specified properties not only tests our understanding of nature but also plays an important role in improving both human health and industrial productivity. Two such examples are drug discovery, which designs new molecules to treat or even cure diseases, and protein engineering, which develops useful proteins for medical purposes or for catalyzing industrial chemical reactions. However, drug discovery and protein engineering are time-consuming and expensive processes because they require multiple rounds of trial and error. For instance, developing a new drug costs on average one billion dollars and 10 years of effort. One effective way to reduce the cost and accelerate these processes is to develop computational methods that can rationalize design and engineering. With both methodological development and increasing computational resources, computational methods for drug discovery and protein engineering are becoming more and more effective. In this dissertation, I have made new advances in computational methods for drug discovery and protein engineering. Protein-ligand docking and free energy calculation are two widely used computational methods in drug discovery. I first developed an accelerated version of the protein-ligand docking method CDOCKER by introducing two new features: fast Fourier transform (FFT) based docking and parallel simulated annealing, both of which exploit the parallel computing power of graphics processing units (GPUs). The two new features not only accelerate CDOCKER by at least one order of magnitude but also provide a way to calculate an upper bound on a scoring function's docking accuracy, which will be useful for optimizing the scoring function used in CDOCKER. Then I introduced two new methods for protein-ligand binding free energy calculation: Gibbs sampler lambda-dynamics (GSLD) and Rao-Blackwell estimators (RBE).
Compared with the original lambda-dynamics, GSLD is more flexible and easier to implement, and it retains the capacity to calculate free energies for multiple ligands simultaneously in a single simulation. Compared with the empirical estimators used in lambda-dynamics, RBE has two advantages: it is an unbiased estimator that does not depend on the ad hoc cutoff values used in the empirical estimators, and it has a smaller variance than the empirical estimators. Finally, for protein engineering, I investigated how variational auto-encoder models can be used to infer information about protein stability, evolution, and fitness landscapes from protein sequences. Variational auto-encoder models are probabilistic generative models that embed discrete information in a lower-dimensional continuous latent space. Trained on a protein family's multiple sequence alignment, a variational auto-encoder model learns a probability distribution over the family's sequences. This probability distribution may then be employed to predict the change in protein stability upon mutation. Embedding sequences in a low-dimensional latent space not only provides a way to visualize a protein family's sequence space but also captures evolutionary relationships between sequences. Combined with experimental fitness data for the protein sequences, the embedding enables the visualization and expression of the protein's fitness landscape in a low-dimensional continuous space. As the amount of protein sequence data keeps increasing rapidly due to advances in sequencing technology, these features of variational auto-encoder models are useful both for studying proteins and for guiding protein engineering efforts.
dc.language.iso: en_US
dc.subject: drug discovery
dc.subject: protein engineering
dc.subject: protein-ligand docking
dc.subject: free energy calculation
dc.subject: machine learning
dc.subject: statistics
dc.title: Methodological Advances for Drug Discovery and Protein Engineering
dc.type: Thesis [en_US]
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Bioinformatics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Brooks III, Charles L
dc.contributor.committeemember: Frank, Aaron Terrence
dc.contributor.committeemember: Carlson, Heather A
dc.contributor.committeemember: Freddolino, Peter Louis
dc.contributor.committeemember: Lin, Nina
dc.subject.hlbsecondlevel: Biomedical Engineering
dc.subject.hlbsecondlevel: Chemical Engineering
dc.subject.hlbsecondlevel: Pharmacy and Pharmacology
dc.subject.hlbsecondlevel: Chemistry
dc.subject.hlbsecondlevel: Ecology and Evolutionary Biology
dc.subject.hlbsecondlevel: Physics
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Health Sciences
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/147634/1/xqding_1.pdf
dc.identifier.orcid: 0000-0002-4598-8732
dc.identifier.name-orcid: Ding, Xinqiang; 0000-0002-4598-8732 [en_US]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
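
The FFT-based docking feature named in the abstract rests on a standard observation: scoring one ligand orientation at every grid translation is a cross-correlation between a receptor grid and a ligand grid, which FFTs evaluate for all translations at once. The NumPy sketch below illustrates only that idea and is not CDOCKER's implementation. It assumes a single precomputed receptor potential grid, a single ligand property grid on the same lattice, periodic (circular) boundaries, and a fixed ligand orientation; the names `fft_translational_scores` and `best_translation` are hypothetical.

```python
import numpy as np


def fft_translational_scores(receptor_grid: np.ndarray, ligand_grid: np.ndarray) -> np.ndarray:
    """Score every circular translation t of the ligand grid at once.

    scores[t] = sum_r receptor_grid[(r + t) mod N] * ligand_grid[r]

    This is a cross-correlation, so it equals IFFT(FFT(receptor) * conj(FFT(ligand))).
    Hypothetical helper: real docking codes pad grids to avoid wrap-around,
    sum over several grid pairs (one per interaction term), and loop over rotations.
    """
    receptor_hat = np.fft.fftn(receptor_grid)
    ligand_hat = np.fft.fftn(ligand_grid)
    # Both inputs are real, so the correlation is real up to round-off.
    return np.real(np.fft.ifftn(receptor_hat * np.conj(ligand_hat)))


def best_translation(receptor_grid: np.ndarray, ligand_grid: np.ndarray):
    """Return the grid translation with the lowest (most favorable) score."""
    scores = fft_translational_scores(receptor_grid, ligand_grid)
    index = np.unravel_index(np.argmin(scores), scores.shape)
    return tuple(int(i) for i in index), float(scores[index])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    receptor = rng.normal(size=(8, 8, 8))  # toy receptor potential grid
    ligand = np.zeros((8, 8, 8))
    ligand[0:2, 0:2, 0:2] = 1.0            # toy ligand occupancy: a 2x2x2 block

    shift, score = best_translation(receptor, ligand)
    # Brute-force check of that single translation with np.roll.
    brute = np.sum(np.roll(receptor, shift=[-s for s in shift], axis=(0, 1, 2)) * ligand)
    print(shift, round(score, 6), round(float(brute), 6))
```

The FFT route costs O(N log N) for all N translations instead of O(N^2) for a direct scan, which is why it pairs well with GPU FFT libraries.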
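
Parallel simulated annealing, the second docking feature in the abstract, runs many independent annealing trajectories simultaneously, which maps naturally onto GPU threads. The sketch below imitates that with NumPy vectorization over replicas. The one-dimensional tilted double-well energy, the Metropolis step size, and the geometric cooling schedule are illustrative assumptions rather than CDOCKER's settings, and the function names are hypothetical.

```python
import numpy as np


def tilted_double_well(x: np.ndarray) -> np.ndarray:
    """Toy energy: global minimum near x = -1.03, local minimum near x = +0.97."""
    return (x**2 - 1.0) ** 2 + 0.25 * x


def parallel_simulated_annealing(energy, n_replicas=256, n_steps=2000, t_start=1.0,
                                 t_end=0.01, step_size=0.3, seed=0):
    """Anneal many independent replicas at once with vectorized Metropolis moves."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n_replicas)  # independent random starting points
    e = energy(x)
    # Geometric cooling schedule from t_start down to t_end.
    temperatures = t_start * (t_end / t_start) ** (np.arange(n_steps) / (n_steps - 1))
    for temperature in temperatures:
        proposal = x + rng.normal(scale=step_size, size=n_replicas)
        e_new = energy(proposal)
        # Metropolis criterion for every replica; min(0, .) avoids overflow in exp.
        accept_prob = np.exp(np.minimum(0.0, -(e_new - e) / temperature))
        accept = rng.random(n_replicas) < accept_prob
        x = np.where(accept, proposal, x)
        e = np.where(accept, e_new, e)
    best = int(np.argmin(e))
    return x, e, float(x[best]), float(e[best])


if __name__ == "__main__":
    x, e, x_best, e_best = parallel_simulated_annealing(tilted_double_well)
    print("best replica:", x_best, e_best)
    print("fraction of replicas in the global well:", float(np.mean(x < 0)))
```

Because the replicas never interact, the loop body is the same elementwise work for each one; on a GPU each replica could be a thread, and picking the lowest-energy replica at the end is a single reduction.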
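
The contrast the abstract draws between empirical estimators and Rao-Blackwell estimators can be illustrated with a toy Gibbs sampler over a coordinate x and a discrete state index lam, in the spirit of (but not identical to) GSLD: alternately sample x given lam and lam given x, then estimate each state's marginal probability either by counting visits (empirical) or by averaging the conditional probabilities p(lam = i | x) (Rao-Blackwell). The harmonic states U_i(x) = k_i x^2 / 2 with beta = 1, the biases b_i, and all function names are hypothetical assumptions chosen because the exact answer, F_1 - F_0 = (1/2) ln(k_1 / k_0), is known. Free energies follow from p(lam = i) proportional to exp(-F_i + b_i). Because lam is discrete here, the counting estimator needs no cutoff; the continuous-lambda empirical estimators mentioned in the abstract do.

```python
import math
import random


def gibbs_lambda_sampler(force_constants, biases, n_steps, seed=0):
    """Toy Gibbs sampler over (x, lam) with p(x, lam=i) ∝ exp(-k_i x^2 / 2 + b_i), beta = 1.

    Returns two estimates of the marginal p(lam = i):
      empirical: fraction of steps spent in state i (visit counting)
      rao_blackwell: average of the conditional p(lam = i | x) over sampled x
    """
    rng = random.Random(seed)
    n_states = len(force_constants)
    lam = 0
    counts = [0] * n_states
    rb_sums = [0.0] * n_states
    for _ in range(n_steps):
        # 1) Sample x | lam exactly: a Gaussian with variance 1 / k_lam.
        x = rng.gauss(0.0, 1.0 / math.sqrt(force_constants[lam]))
        # 2) Conditional p(lam = i | x) ∝ exp(-U_i(x) + b_i), computed stably.
        log_w = [-0.5 * k * x * x + b for k, b in zip(force_constants, biases)]
        shift = max(log_w)
        w = [math.exp(v - shift) for v in log_w]
        total = sum(w)
        cond = [wi / total for wi in w]
        # 3) Sample lam | x from that conditional.
        lam = rng.choices(range(n_states), weights=cond)[0]
        counts[lam] += 1
        for i, p in enumerate(cond):
            rb_sums[i] += p
    empirical = [c / n_steps for c in counts]
    rao_blackwell = [s / n_steps for s in rb_sums]
    return empirical, rao_blackwell


def relative_free_energies(marginals, biases):
    """F_i - F_0 from p(lam = i) ∝ exp(-F_i + b_i), i.e. -ln(p_i / p_0) + (b_i - b_0)."""
    return [-math.log(p / marginals[0]) + (b - biases[0]) for p, b in zip(marginals, biases)]


if __name__ == "__main__":
    ks, bs = [1.0, 4.0], [0.0, 0.0]
    emp, rb = gibbs_lambda_sampler(ks, bs, n_steps=50_000, seed=1)
    print("exact        ", 0.5 * math.log(ks[1] / ks[0]))
    print("empirical    ", relative_free_energies(emp, bs)[1])
    print("Rao-Blackwell", relative_free_energies(rb, bs)[1])
```

Both estimates use the same chain; the Rao-Blackwell version replaces each 0/1 visit indicator with its conditional expectation given x, which by the Rao-Blackwell theorem cannot increase the variance of the estimate of p(lam = i).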