Deep Learning-based Ab Initio Protein Structure Prediction and Structure-based Protein Function Annotation
dc.contributor.author | Zhang, Chengxin | |
dc.date.accessioned | 2021-02-04T16:38:02Z | |
dc.date.available | 2023-01-01 | |
dc.date.available | 2021-02-04T16:38:02Z | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/166121 | |
dc.description.abstract | Predicting a protein's structure from its sequence (especially in the absence of structure templates) and deducing its biological function from that structure remain significant unsolved problems. Much of the recent progress in ab initio (i.e. template-free) modeling of protein structure is due to the introduction of inter-residue contacts and, even more recently, inter-residue distances predicted by deep learning. We present D-QUARK, an ab initio protein folding algorithm guided by residue-residue distances and orientations predicted by deep learning. The D-QUARK pipeline is distinct from existing protein folding programs in the following aspects. Firstly, for a target sequence, it generates a high-quality multiple sequence alignment (MSA) of deep and diverse sequence homologs using the in-house DeepMSA algorithm. Secondly, to generate input features for deep learning prediction of distances and orientations from the MSA, raw coevolution features are extracted in the form of a covariance matrix and pseudo-likelihood maximization parameters, rather than traditional post-processed coevolutionary features. Thirdly, the distance and orientation potentials are incorporated into a comprehensive replica-exchange Monte Carlo (REMC) simulation with a uniquely designed flat-well potential for ab initio protein folding. The high-quality MSA, accurate deep learning prediction, and REMC simulation with carefully designed energy terms all contribute to the high performance of D-QUARK. In terms of the first-model TM-score, D-QUARK outperforms our previous ab initio protein folding algorithm, QUARK, by 108.8% and two state-of-the-art distance-based structure prediction programs, DMPfold and trRosetta, by 22.9% and 11.4%, respectively. In a post-CASP experiment, D-QUARK achieves an 8.1% higher first-model TM-score on CASP13 FM target proteins than AlphaFold. 
To annotate protein functions, including Gene Ontology (GO) terms, Enzyme Commission (EC) numbers, and ligand binding sites, from a predicted structure model, we developed COFACTOR. COFACTOR combines functional templates identified by structure alignment against the target structure model with sequence homologs and protein-protein interaction partners to derive consensus function annotations. COFACTOR was blindly tested in the community-wide CAFA3 function annotation challenge and was ranked among the top groups. The structure and function prediction pipeline developed in this thesis was applied to proteome-wide annotation projects for several model organisms, including human and the JCVI-syn3.0 minimal bacterial genome, where our pipeline reveals previously uncharacterized proteins with important functions. Overall, we show the impact of deep learning on protein structure and function prediction and demonstrate its utility for reliable and scalable modeling. | |
dc.language.iso | en_US | |
dc.subject | Protein Structure Prediction | |
dc.subject | Protein Function Annotation | |
dc.subject | Multiple Sequence Alignment (MSA) | |
dc.subject | Human Proteome | |
dc.subject | Deep Learning | |
dc.subject | JCVI-syn3.0 minimal bacterial genome | |
dc.title | Deep Learning-based Ab Initio Protein Structure Prediction and Structure-based Protein Function Annotation | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Bioinformatics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Omenn, Gilbert S | |
dc.contributor.committeemember | Ohi, Melanie D | |
dc.contributor.committeemember | Carlson, Heather A | |
dc.contributor.committeemember | Freddolino, Peter Louis | |
dc.contributor.committeemember | Guan, Yuanfang | |
dc.contributor.committeemember | Richardson, Rudy J | |
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | |
dc.subject.hlbtoplevel | Science | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/166121/1/zcx_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/44 | |
dc.identifier.orcid | 0000-0001-7290-1324 | |
dc.identifier.name-orcid | Zhang, Chengxin; 0000-0001-7290-1324 | en_US |
dc.working.doi | 10.7302/44 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |