Deep‐learning contact‐map guided protein structure prediction in CASP13

dc.contributor.authorZheng, Wei
dc.contributor.authorLi, Yang
dc.contributor.authorZhang, Chengxin
dc.contributor.authorPearce, Robin
dc.contributor.authorMortuza, S. M.
dc.contributor.authorZhang, Yang
dc.date.accessioned2020-01-13T15:19:50Z
dc.date.availableWITHHELD_12_MONTHS
dc.date.available2020-01-13T15:19:50Z
dc.date.issued2019-12
dc.identifier.citationZheng, Wei; Li, Yang; Zhang, Chengxin; Pearce, Robin; Mortuza, S. M.; Zhang, Yang (2019). "Deep‐learning contact‐map guided protein structure prediction in CASP13." Proteins: Structure, Function, and Bioinformatics 87(12): 1149-1164.
dc.identifier.issn0887-3585
dc.identifier.issn1097-0134
dc.identifier.urihttps://hdl.handle.net/2027.42/153195
dc.description.abstractWe report the results of two fully automated structure prediction pipelines, “Zhang‐Server” and “QUARK”, in CASP13. The pipelines were built upon the C‐I‐TASSER and C‐QUARK programs, which in turn are based on I‐TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence‐profiles for contact prediction; (b) an improved meta‐method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact‐maps by coupling precision‐matrices with deep residual convolutional neural‐networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM‐scores of the first models produced by C‐I‐TASSER and C‐QUARK were 28% and 56% higher than those constructed by I‐TASSER and QUARK, respectively. For the first time, contact‐map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM‐scores of C‐I‐TASSER models were significantly higher than those of I‐TASSER models with a P‐value <.05. Detailed data analyses showed that the success of C‐I‐TASSER and C‐QUARK was mainly due to the increased accuracy of deep‐learning‐based contact‐maps, as well as the careful balance between sequence‐based contact restraints, threading templates, and generic knowledge‐based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi‐domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact‐based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
dc.publisherJohn Wiley & Sons, Inc.
dc.subject.otherdeep multiple sequence alignment
dc.subject.otherdeep convolutional neural networks
dc.subject.othercontact prediction
dc.subject.otherCASP13
dc.subject.otherab initio folding
dc.subject.otherprotein structure prediction
dc.titleDeep‐learning contact‐map guided protein structure prediction in CASP13
dc.typeArticle
dc.rights.robotsIndexNoFollow
dc.subject.hlbsecondlevelBiological Chemistry
dc.subject.hlbtoplevelScience
dc.description.peerreviewedPeer Reviewed
dc.description.bitstreamurlhttps://deepblue.lib.umich.edu/bitstream/2027.42/153195/1/prot25792.pdf
dc.description.bitstreamurlhttps://deepblue.lib.umich.edu/bitstream/2027.42/153195/2/prot25792-sup-0001-Supinfo.pdf
dc.description.bitstreamurlhttps://deepblue.lib.umich.edu/bitstream/2027.42/153195/3/prot25792_am.pdf
dc.identifier.doi10.1002/prot.25792
dc.identifier.sourceProteins: Structure, Function, and Bioinformatics
dc.identifier.citedreferenceZhang Y, Skolnick J. TM‐align: a protein structure alignment algorithm based on the TM‐score. Nucleic Acids Res. 2005; 33 ( 7 ): 2302 ‐ 2309.
dc.identifier.citedreferenceLiu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing evolutionary couplings with deep convolutional neural networks. Cell Syst. 2018; 6 ( 1 ): 65 ‐ 74. e63.
dc.identifier.citedreferenceAdhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two‐level deep convolutional neural networks. Bioinformatics. 2017; 34 ( 9 ): 1466 ‐ 1472.
dc.identifier.citedreferenceKandathil SM, Jones DT. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics. 2018; 34 ( 19 ): 3308 ‐ 3315.
dc.identifier.citedreferenceBuchan DWA, Jones DT. Improved protein contact predictions with the MetaPSICOV2 server in CASP12. Proteins. 2018; 86 ( S1 ): 78 ‐ 83.
dc.identifier.citedreferenceSöding J, Gruber M, Seemayer S. CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations. Bioinformatics. 2014; 30 ( 21 ): 3128 ‐ 3130.
dc.identifier.citedreferenceKamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution‐based residue–residue contact predictions in a sequence‐ and structure‐rich era. Proc Natl Acad Sci. 2013; 110 ( 39 ): 15674 ‐ 15679.
dc.identifier.citedreferenceKaján L, Hopf TA, Kalaš M, Marks DS, Rost B. FreeContact: fast and free software for protein contact prediction from residue co‐evolution. BMC Bioinf. 2014; 15 ( 1 ): 85.
dc.identifier.citedreferenceKingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
dc.identifier.citedreferencePaszke A, Gross S, Chintala S, et al. Automatic Differentiation in Pytorch. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA. 2017.
dc.identifier.citedreferenceZhang Y, Kihara D, Skolnick J. Local energy landscape flattening: parallel hyperbolic Monte Carlo sampling of protein folding. Proteins. 2002; 48: 192 ‐ 201.
dc.identifier.citedreferenceZhang Y, Skolnick J. SPICKER: a clustering approach to identify near‐native protein folds. J Comput Chem. 2004; 25 ( 6 ): 865 ‐ 871.
dc.identifier.citedreferenceZhang W, Yang J, He B, et al. Integration of QUARK and I‐TASSER for Ab initio protein structure prediction in CASP11. Proteins. 2016; 84 ( S1 ): 76 ‐ 86.
dc.identifier.citedreferenceZhang J, Zhang Y. A novel side‐chain orientation dependent potential derived from random‐walk reference state for protein fold selection and structure prediction. PLOS One. 2010; 5 ( 10 ): e15386.
dc.identifier.citedreferenceZhou H, Skolnick J. GOAP: a generalized orientation‐dependent, all‐atom statistical potential for protein structure prediction. Biophys J. 2011; 101 ( 8 ): 2043 ‐ 2052.
dc.identifier.citedreferenceShen M‐y, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006; 15 ( 11 ): 2507 ‐ 2524.
dc.identifier.citedreferencePark J, Saitou K. ROTAS: a rotamer‐dependent, atomic statistical potential for assessment and prediction of protein structures. BMC Bioinf. 2014; 15 ( 1 ): 307.
dc.identifier.citedreferenceYang J, Wang Y, Zhang Y. ResQ: an approach to unified estimation of B‐factor and residue‐specific error in protein structure prediction. J Mol Biol. 2016; 428 ( 4 ): 693 ‐ 701.
dc.identifier.citedreferenceXu J, Zhang Y. How significant is a protein structure similarity with TM‐score = 0.5? Bioinformatics. 2010; 26 ( 7 ): 889 ‐ 895.
dc.identifier.citedreferenceZhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004; 57 ( 4 ): 702 ‐ 710.
dc.identifier.citedreferenceAnishchenko I, Ovchinnikov S, Kamisetty H, Baker D. Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci. 2017; 114 ( 34 ): 9122 ‐ 9127.
dc.identifier.citedreferenceKinch LN, Li W, Monastyrskyy B, Kryshtafovych A, Grishin NV. Assessment of CASP11 contact‐assisted predictions. Proteins. 2016; 84 ( S1 ): 164 ‐ 180.
dc.identifier.citedreferenceJones DT. Protein secondary structure prediction based on position‐specific scoring matrices. J Mol Biol. 1999; 292 ( 2 ): 195 ‐ 202.
dc.identifier.citedreferenceYan R, Xu D, Yang J, Walker S, Zhang Y. A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci Rep. 2013; 3: 2619.
dc.identifier.citedreferenceTai C‐H, Lee W‐J, Vincent JJ, Lee B. Evaluation of domain prediction in CASP6. Proteins. 2005; 61 ( S7 ): 183 ‐ 192.
dc.identifier.citedreferenceTress M, Cheng J, Baldi P, et al. Assessment of predictions submitted for the CASP7 domain prediction category. Proteins. 2007; 69 ( S8 ): 137 ‐ 151.
dc.identifier.citedreferenceZhang Y, Skolnick J. The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci U S A. 2005; 102 ( 4 ): 1029 ‐ 1034.
dc.identifier.citedreferenceZhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol. 2008; 18 ( 3 ): 342 ‐ 348.
dc.identifier.citedreferenceKryshtafovych A, Monastyrskyy B, Fidelis K, Moult J, Schwede T, Tramontano A. Evaluation of the template‐based modeling in CASP12. Proteins. 2018; 86 ( Suppl 1 ): 321 ‐ 334.
dc.identifier.citedreferenceDunbrack R. Template‐based modeling assessment in CASP11. Paper presented at: 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction; April 2014; Riviera Maya, Mexico.
dc.identifier.citedreferenceBowie JU, Luthy R, Eisenberg D. A method to identify protein sequences that fold into a known three‐dimensional structure. Science. 1991; 253: 164 ‐ 170.
dc.identifier.citedreferenceSoding J. Protein homology detection by HMM‐HMM comparison. Bioinformatics. 2005; 21 ( 7 ): 951 ‐ 960.
dc.identifier.citedreferenceWu S, Zhang Y. MUSTER: improving protein sequence profile‐profile alignments by using multiple sources of structure information. Proteins. 2008; 72 ( 2 ): 547 ‐ 556.
dc.identifier.citedreferenceKinch LN, Li W, Monastyrskyy B, Kryshtafovych A, Grishin NV. Evaluation of free modeling targets in CASP11 and ROLL. Proteins. 2016; 84 ( Suppl 1 ): 51 ‐ 66.
dc.identifier.citedreferenceAbriata LA, Tamo GE, Monastyrskyy B, Kryshtafovych A, Dal Peraro M. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment‐based contact prediction methods. Proteins. 2018; 86 ( Suppl 1 ): 97 ‐ 112.
dc.identifier.citedreferenceZhang Y. Template‐based modeling and free modeling by I‐TASSER in CASP7. Proteins. 2007; 69 ( S8 ): 108 ‐ 117.
dc.identifier.citedreferenceRoy A, Kucukural A, Zhang Y. I‐TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010; 5: 725 ‐ 738.
dc.identifier.citedreferenceZhang Y. I‐TASSER server for protein 3D structure prediction. BMC Bioinf. 2008; 9 ( 1 ): 40.
dc.identifier.citedreferenceYang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. The I‐TASSER suite: protein structure and function prediction. Nat Methods. 2015; 12 ( 1 ): 7 ‐ 8.
dc.identifier.citedreferenceXu D, Zhang Y. Toward optimal fragment generations for ab initio protein structure assembly. Proteins. 2013; 81 ( 2 ): 229 ‐ 239.
dc.identifier.citedreferenceXu D, Zhang Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field. Proteins. 2012; 80 ( 7 ): 1715 ‐ 1735.
dc.identifier.citedreferenceXu D, Zhang J, Roy A, Zhang Y. Automated protein structure modeling in CASP9 by I‐TASSER pipeline combined with QUARK‐based ab initio folding and FG‐MD‐based structure refinement. Proteins. 2011; 79 ( Suppl 10 ): 147 ‐ 160.
dc.identifier.citedreferenceHe B, Mortuza SM, Wang Y, Shen H‐B, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics. 2017; 33 ( 15 ): 2296 ‐ 2306.
dc.identifier.citedreferenceLi Y, Hu J, Zhang C, Yu D‐J, Zhang Y. ResPRE: high‐accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics. 2019. https://doi.org/10.1093/bioinformatics/btz291
dc.identifier.citedreferenceHe K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Paper presented at: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 27‐30, 2016; Las Vegas, NV.
dc.identifier.citedreferenceXu D, Wang Y, Zhang Y, Xue Z. ThreaDom: extracting protein domain boundary information from multiple threading alignments. Bioinformatics. 2013; 29 ( 13 ): i247 ‐ i256.
dc.identifier.citedreferenceRemmert M, Biegert A, Hauser A, Söding J. HHblits: lightning‐fast iterative protein sequence searching by HMM‐HMM alignment. Nat Methods. 2011; 9: 173.
dc.identifier.citedreferenceJohnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinf. 2010; 11 ( 1 ): 431.
dc.identifier.citedreferenceWu S, Zhang Y. LOMETS: a local meta‐threading‐server for protein structure prediction. Nucleic Acids Res. 2007; 35 ( 10 ): 3375 ‐ 3382.
dc.identifier.citedreferenceZhang Y. Interplay of I‐TASSER and QUARK for template‐based and ab initio protein structure prediction in CASP10. Proteins. 2014; 82 ( S2 ): 175 ‐ 187.
dc.identifier.citedreferenceZhang J, Liang Y, Zhang Y. Atomic‐level protein structure refinement using fragment‐guided molecular dynamics conformation sampling. Structure. 2011; 19 ( 12 ): 1784 ‐ 1795.
dc.identifier.citedreferenceGaliez C, Mirdita M, Söding J, von den Driesch L, Steinegger M, Martin MJ. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 2016; 45 ( D1 ): D170 ‐ D176.
dc.identifier.citedreferenceSuzek BE, Wu CH, Huang H, McGarvey PB, Wang Y, UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2014; 31 ( 6 ): 926 ‐ 932.
dc.identifier.citedreferenceSteinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nat Commun. 2018; 9 ( 1 ): 2542.
dc.identifier.citedreferenceOvchinnikov S, Park H, Varghese N, et al. Protein structure determination using metagenome sequence data. Science. 2017; 355 ( 6322 ): 294 ‐ 298.
dc.owningcollnameInterdisciplinary and Peer-Reviewed

