Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14

dc.contributor.author: Li, Yang
dc.contributor.author: Zhang, Chengxin
dc.contributor.author: Zheng, Wei
dc.contributor.author: Zhou, Xiaogen
dc.contributor.author: Bell, Eric W.
dc.contributor.author: Yu, Dong‐Jun
dc.contributor.author: Zhang, Yang
dc.date.accessioned: 2021-12-02T02:28:19Z
dc.date.available: 2023-01-01 21:28:17 [en]
dc.date.available: 2021-12-02T02:28:19Z
dc.date.issued: 2021-12
dc.identifier.citation: Li, Yang; Zhang, Chengxin; Zheng, Wei; Zhou, Xiaogen; Bell, Eric W.; Yu, Dong‐Jun; Zhang, Yang (2021). "Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14." Proteins: Structure, Function, and Bioinformatics 89(12): 1911-1921.
dc.identifier.issn: 0887-3585
dc.identifier.issn: 1097-0134
dc.identifier.uri: https://hdl.handle.net/2027.42/170949
dc.description.abstract: This article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed, based on an ensemble of two complementary coevolution features coupled with deep residual networks. We also improved our multiple sequence alignment (MSA) generation protocol with wholesale metagenome sequence databases. On 22 CASP14 free modeling (FM) targets, the proposed model achieved a top-L/5 long-range precision of 63.8% and a mean distance bin error of 1.494. Based on the predicted distance potentials, 11 out of 22 FM targets and all 14 FM/template-based modeling (TBM) targets had correctly predicted folds (TM-score > 0.5), suggesting that our approach can provide reliable distance potentials for ab initio protein folding.
dc.publisher: John Wiley & Sons, Inc.
dc.subject.other: CASP
dc.subject.other: coevolution
dc.subject.other: contact-map prediction
dc.subject.other: deep learning
dc.subject.other: protein structure prediction
dc.title: Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Biological Chemistry
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170949/1/prot26211_am.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170949/2/prot26211-sup-0001-supinfo.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170949/3/prot26211.pdf
dc.identifier.doi: 10.1002/prot.26211
dc.identifier.source: Proteins: Structure, Function, and Bioinformatics
dc.identifier.citedreference: Rao R, Meier J, Sercu T, Ovchinnikov S, Rives A. Transformer protein language models are unsupervised structure learners. bioRxiv. 2020.
dc.identifier.citedreference: Senior AW, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020;577(7792):706-710.
dc.identifier.citedreference: Xu J. Distance-based protein folding powered by deep learning. Proc Natl Acad Sci. 2019;116(34):16856-16865.
dc.identifier.citedreference: Greener JG, Kandathil SM, Jones DT. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nat Commun. 2019;10(1):3977.
dc.identifier.citedreference: Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci. 2020;117(3):1496.
dc.identifier.citedreference: Yu F, Koltun V, Funkhouser T. Dilated residual networks. Paper presented at: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017.
dc.identifier.citedreference: Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235-242.
dc.identifier.citedreference: Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150-3152.
dc.identifier.citedreference: Steinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nat Commun. 2018;9(1):2542.
dc.identifier.citedreference: Steinegger M, Mirdita M, Söding J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat Methods. 2019;16(7):603-606.
dc.identifier.citedreference: Mitchell AL, Almeida A, Beracochea M, et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 2020;48(D1):D570-D578.
dc.identifier.citedreference: Chen IMA, Chu K, Palaniappan K, et al. The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities. Nucleic Acids Res. 2021;49(D1):D751-D763.
dc.identifier.citedreference: Zhang C, Zheng W, Mortuza SM, Li Y, Zhang Y. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics. 2020;36(7):2105-2112.
dc.identifier.citedreference: Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012;9(2):173.
dc.identifier.citedreference: Johnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics. 2010;11(1):431.
dc.identifier.citedreference: Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics. 2019;20(1):473.
dc.identifier.citedreference: Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. 2016.
dc.identifier.citedreference: Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. The I-TASSER suite: protein structure and function prediction. Nat Methods. 2015;12(1):7-8.
dc.identifier.citedreference: Thrun S. Is learning the n-th thing any easier than learning the first? Paper presented at: Advances in Neural Information Processing Systems; 1996.
dc.identifier.citedreference: Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26(5):689-691.
dc.identifier.citedreference: Xu J, Zhang Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics. 2010;26(7):889-895.
dc.identifier.citedreference: Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18(20):6097-6100.
dc.identifier.citedreference: Drobysheva AV, Panafidina SA, Kolesnik MV, et al. Structure and function of virion RNA polymerase of a crAss-like phage. Nature. 2021;589(7841):306-309.
dc.identifier.citedreference: Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021. https://doi.org/10.1038/s41586-021-03819-2
dc.identifier.citedreference: Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. arXiv preprint arXiv:1706.03762. 2017.
dc.identifier.citedreference: Elnaggar A, Heinzinger M, Dallago C, et al. ProtTrans: towards cracking the language of Life's code through self-supervised deep learning and high performance computing. bioRxiv. 2020.
dc.identifier.citedreference: Rao R, Liu J, Verkuil R, et al. MSA transformer. bioRxiv. 2021.
dc.identifier.citedreference: Li Y, Zhang C, Bell EW, Yu D-J, Zhang Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins. 2019;87(12):1082-1091.
dc.identifier.citedreference: Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol. 2008;18(3):342-348.
dc.identifier.citedreference: Abriata LA, Tamò GE, Dal Peraro M. A further leap of improvement in tertiary structure prediction in CASP13 prompts new routes for future assessments. Proteins. 2019;87(12):1100-1112.
dc.identifier.citedreference: Abriata LA, Tamò GE, Monastyrskyy B, Kryshtafovych A, Dal Peraro M. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment-based contact prediction methods. Proteins. 2018;86(S1):97-112.
dc.identifier.citedreference: Fitch WM, Markowitz E. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Genet. 1970;4(5):579-593.
dc.identifier.citedreference: Korber BT, Farber RM, Wolpert DH, Lapedes AS. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc Natl Acad Sci. 1993;90(15):7176.
dc.identifier.citedreference: Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008;24(3):333-340.
dc.identifier.citedreference: Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994;18(4):309-317.
dc.identifier.citedreference: Morcos F, Pagnani A, Lunt B, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci. 2011;108(49):E1293.
dc.identifier.citedreference: Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28(2):184-190.
dc.identifier.citedreference: Ma J, Wang S, Wang Z, Xu J. Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning. Bioinformatics. 2015;31(21):3506-3513.
dc.identifier.citedreference: Ekeberg M, Hartonen T, Aurell E. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J Comput Phys. 2014;276:341-356.
dc.identifier.citedreference: Jones DT, Singh T, Kosciolek T, Tetchner S. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics. 2015;31(7):999-1006.
dc.identifier.citedreference: He B, Mortuza SM, Wang Y, Shen H-B, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics. 2017;33(15):2296-2306.
dc.identifier.citedreference: Buchan DWA, Jones DT. Improved protein contact predictions with the MetaPSICOV2 server in CASP12. Proteins. 2018;86(S1):78-83.
dc.identifier.citedreference: Wang Z, Xu J. Predicting protein contact map using evolutionary and physical constraints by integer programming. Bioinformatics. 2013;29(13):i266-i273.
dc.identifier.citedreference: Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol. 2017;13(1):e1005324.
dc.identifier.citedreference: Adhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics. 2017;34(9):1466-1472.
dc.identifier.citedreference: Liu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing evolutionary couplings with deep convolutional neural networks. Cell Syst. 2018;6(1):65-74.e63.
dc.identifier.citedreference: Golkov V, Skwark MJ, Golkov A, et al. Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images. Paper presented at: NIPS; 2016.
dc.identifier.citedreference: Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics. 2018;34(19):3308-3315.
dc.identifier.citedreference: Li Y, Zhang C, Bell EW, et al. Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol. 2021;17(3):e1008865.
dc.identifier.citedreference: Li Y, Hu J, Zhang C, Yu D-J, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics. 2019;35(22):4647-4655.
dc.identifier.citedreference: He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Paper presented at: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
dc.working.doi: NO [en]
dc.owningcollname: Interdisciplinary and Peer-Reviewed
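
The abstract reports a top-L/5 long-range precision of 63.8% on the 22 FM targets. As a rough illustration of how this kind of metric is usually computed, the sketch below assumes the common CASP conventions rather than anything stated in this record: a residue pair (i, j) is a contact if its native Cβ-Cβ distance is below 8 Å, "long-range" means a sequence separation of at least 24 residues, and only the L/5 highest-probability long-range predictions are scored (L is the sequence length). The function name `top_l5_long_range_precision` and its arguments are hypothetical; this is not the authors' evaluation code.

```python
import numpy as np


def top_l5_long_range_precision(pred_prob, true_dist, min_sep=24, cutoff=8.0):
    """Hypothetical helper: precision of the top-L/5 long-range contact predictions.

    pred_prob: (L, L) array of predicted contact probabilities (upper triangle is read).
    true_dist: (L, L) array of native C-beta/C-beta distances in angstroms.
    min_sep:   assumed long-range threshold, j - i >= 24 (CASP convention).
    cutoff:    assumed contact threshold, distance < 8 angstroms (CASP convention).
    """
    pred_prob = np.asarray(pred_prob, dtype=float)
    true_dist = np.asarray(true_dist, dtype=float)
    L = pred_prob.shape[0]

    # All residue pairs (i, j) with j - i >= min_sep, i.e. the long-range band.
    i, j = np.triu_indices(L, k=min_sep)
    if i.size == 0:
        return 0.0  # sequence too short to contain any long-range pair

    # Rank long-range pairs by predicted probability, highest first.
    order = np.argsort(-pred_prob[i, j], kind="stable")
    top = order[: max(L // 5, 1)]

    # A prediction counts as correct if the native distance is below the cutoff.
    hits = true_dist[i[top], j[top]] < cutoff
    return float(hits.mean())
```

Top-L/2 or top-L variants of the same metric only change the `L // 5` slice; the short- and medium-range versions only change the sequence-separation band.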

