Protein structure prediction using deep learning distance and hydrogen‐bonding restraints in CASP14

dc.contributor.author: Zheng, Wei
dc.contributor.author: Li, Yang
dc.contributor.author: Zhang, Chengxin
dc.contributor.author: Zhou, Xiaogen
dc.contributor.author: Pearce, Robin
dc.contributor.author: Bell, Eric W.
dc.contributor.author: Huang, Xiaoqiang
dc.contributor.author: Zhang, Yang
dc.date.accessioned: 2021-12-02T02:28:49Z
dc.date.available: 2023-01-01 21:28:45 (en)
dc.date.available: 2021-12-02T02:28:49Z
dc.date.issued: 2021-12
dc.identifier.citation: Zheng, Wei; Li, Yang; Zhang, Chengxin; Zhou, Xiaogen; Pearce, Robin; Bell, Eric W.; Huang, Xiaoqiang; Zhang, Yang (2021). "Protein structure prediction using deep learning distance and hydrogen‐bonding restraints in CASP14." Proteins: Structure, Function, and Bioinformatics 89(12): 1734-1751.
dc.identifier.issn: 0887-3585
dc.identifier.issn: 1097-0134
dc.identifier.uri: https://hdl.handle.net/2027.42/170963
dc.description.abstract: In this article, we report 3D structure prediction results by two of our best server groups (“Zhang‐Server” and “QUARK”) in CASP14. These two servers were built based on the D‐I‐TASSER and D‐QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I‐TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact‐based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network‐based method, DeepPotential, to predict multiple spatial restraints by co‐evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM‐scores of the first models produced by D‐I‐TASSER and D‐QUARK were 96% and 112% higher than those constructed by I‐TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well‐tuned force field that combines spatial restraints, threading templates, and generic knowledge‐based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi‐domain proteins due to low accuracy in inter‐domain distance prediction and modeling protein domains from oligomer complexes, as the co‐evolutionary analysis cannot distinguish inter‐chain and intra‐chain distances. Specifically tuning the deep learning‐based predictors for multi‐domain targets and protein complexes may be helpful to address these issues.
dc.publisher: John Wiley & Sons, Inc.
dc.subject.other: protein structure prediction
dc.subject.other: residue‐residue distance prediction
dc.subject.other: ab initio folding
dc.subject.other: CASP14
dc.subject.other: deep learning
dc.subject.other: domain partition
dc.subject.other: multiple sequence alignment
dc.title: Protein structure prediction using deep learning distance and hydrogen‐bonding restraints in CASP14
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Biological Chemistry
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170963/1/prot26193.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170963/2/prot26193_am.pdf
dc.identifier.doi: 10.1002/prot.26193
dc.identifier.source: Proteins: Structure, Function, and Bioinformatics
dc.working.doi: NO (en)
dc.owningcollname: Interdisciplinary and Peer-Reviewed