Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13

dc.contributor.author: Li, Yang
dc.contributor.author: Zhang, Chengxin
dc.contributor.author: Bell, Eric W.
dc.contributor.author: Yu, Dong‐jun
dc.contributor.author: Zhang, Yang
dc.date.accessioned: 2020-01-13T15:16:41Z
dc.date.available: WITHHELD_12_MONTHS
dc.date.available: 2020-01-13T15:16:41Z
dc.date.issued: 2019-12
dc.identifier.citation: Li, Yang; Zhang, Chengxin; Bell, Eric W.; Yu, Dong‐jun; Zhang, Yang (2019). "Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13." Proteins: Structure, Function, and Bioinformatics 87(12): 1082-1091.
dc.identifier.issn: 0887-3585
dc.identifier.issn: 1097-0134
dc.identifier.uri: https://hdl.handle.net/2027.42/153065
dc.description.abstract: We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
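The abstract names two raw MSA features (covariance and precision matrices) and one evaluation metric (top L/5 long-range accuracy). A minimal Python/NumPy sketch of those three pieces follows. It is not the TripletRes or ResTriplet code: sequence weighting, the exact regularization used for the precision matrix, the pseudolikelihood (Potts-model) feature, and the residual network itself are omitted. The helper names (`one_hot_msa`, `covariance_feature`, `precision_feature`, `to_pair_channels`, `top_l5_long_range_precision`) and the `shrink=0.1` ridge term are hypothetical. It assumes the common conventions of a 21-state alphabet (20 amino acids plus gap), a long-range sequence separation of |i - j| >= 24 as used in CASP, and a boolean `true_contacts` map that would in practice come from native Cβ-Cβ distances below 8 Å.

```python
import numpy as np

# 20 standard amino acids plus the gap symbol: 21 states per position.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
Q = len(ALPHABET)


def one_hot_msa(msa):
    """Encode an MSA (list of equal-length strings) as an N x L x 21 array."""
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    x = np.zeros((len(msa), len(msa[0]), Q))
    for s, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            # Letters outside the alphabet (X, B, Z, ...) are treated as gaps here.
            x[s, i, index.get(aa, Q - 1)] = 1.0
    return x


def covariance_feature(msa):
    """21L x 21L covariance f_ij(a,b) - f_i(a) f_j(b), without sequence weighting."""
    x = one_hot_msa(msa)
    n, length, _ = x.shape
    flat = x.reshape(n, length * Q)
    fi = flat.mean(axis=0)       # single-site frequencies f_i(a)
    fij = flat.T @ flat / n      # pair frequencies f_ij(a,b)
    return fij - np.outer(fi, fi)


def precision_feature(cov, shrink=0.1):
    """Inverse covariance after adding shrink * I (hypothetical ridge regularizer)."""
    return np.linalg.inv(cov + shrink * np.eye(cov.shape[0]))


def to_pair_channels(mat, length):
    """Rearrange a 21L x 21L matrix into an L x L x 441 tensor for a 2-D CNN."""
    return (mat.reshape(length, Q, length, Q)
               .transpose(0, 2, 1, 3)
               .reshape(length, length, Q * Q))


def top_l5_long_range_precision(prob, true_contacts, min_sep=24):
    """Fraction of the top L/5 pairs with |i - j| >= min_sep that are true contacts."""
    length = prob.shape[0]
    pairs = [(prob[i, j], i, j)
             for i in range(length) for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("sequence too short for long-range pairs")
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:max(1, length // 5)]
    hits = sum(1 for _, i, j in top if true_contacts[i, j])
    return hits / len(top)


if __name__ == "__main__":
    msa = ["AC", "AC", "DE", "DE"]   # toy alignment: positions 0 and 1 co-vary perfectly
    feat = to_pair_channels(covariance_feature(msa), length=2)
    print(feat.shape)                 # (2, 2, 441)
    print(feat[0, 1, 0 * Q + 4])      # cov(A at pos 0, C at pos 1) = 0.25
```

The ridge term is added because the raw one-hot covariance is singular (the 21 indicators at each position always sum to 1), so some regularization is needed before inversion; how the authors handled this is not stated in the record. The resulting L x L x 441 tensors are the kind of input a 2-D residual CNN consumes, and the end-to-end versus stacking combination of features is not reproduced here.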
dc.publisher: John Wiley & Sons, Inc.
dc.subject.other: CASP
dc.subject.other: deep learning
dc.subject.other: contact-map prediction
dc.subject.other: coevolution analysis
dc.subject.other: protein folding
dc.title: Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Biological Chemistry
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/153065/1/prot25798_am.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/153065/2/prot25798-sup-0001-Supinfo.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/153065/3/prot25798.pdf
dc.identifier.doi: 10.1002/prot.25798
dc.identifier.source: Proteins: Structure, Function, and Bioinformatics
dc.identifier.citedreference: Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol. 2011; 7(10): e1002195.
dc.identifier.citedreference: Adhikari B, Hou J, Cheng J. DNCON2: Improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics. 2017; 34(9): 1466–1472.
dc.identifier.citedreference: Liu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing evolutionary couplings with deep convolutional neural networks. Cell Syst. 2018; 6(1): 65–74.e63.
dc.identifier.citedreference: Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics. 2018; 34(19): 3308–3315.
dc.identifier.citedreference: Li Y, Hu J, Zhang C, Yu D-J, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics. 2019; https://doi.org/10.1093/bioinformatics/btz291.
dc.identifier.citedreference: Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994; 18(4): 309–317.
dc.identifier.citedreference: Shindyalov I, Kolchanov N, Sander C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng Des Sel. 1994; 7(3): 349–358.
dc.identifier.citedreference: He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016; pp. 770–778.
dc.identifier.citedreference: Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E. 2013; 87(1): 012707.
dc.identifier.citedreference: Ekeberg M, Hartonen T, Aurell E. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J Comput Phys. 2014; 276: 341–356.
dc.identifier.citedreference: Zhang C, Zheng W, Mortuza S, Li Y, Zhang Y. DeepMSA: Constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. 2019: under review.
dc.identifier.citedreference: Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012; 9(2): 173–175.
dc.identifier.citedreference: Mirdita M, von den Driesch L, Galiez C, Martin MJ, Söding J, Steinegger M. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 2016; 45(D1): D170–D176.
dc.identifier.citedreference: Steinegger M, Meier M, Mirdita M, Voehringer H, Haunsberger SJ, Soeding J. HH-suite3 for fast remote homology detection and deep protein annotation. bioRxiv. 2019: 560029.
dc.identifier.citedreference: Remmert M, Biegert A, Hauser A, Soding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2011; 9(2): 173–175.
dc.identifier.citedreference: Steinegger M, Soding J. Clustering huge protein sequence sets in linear time. Nat Commun. 2018; 9(1): 2542.
dc.identifier.citedreference: Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10). 2010; pp. 807–814.
dc.identifier.citedreference: McGuffin LJ, Bryson K, Jones DT. The PSIPRED protein structure prediction server. Bioinformatics. 2000; 16(4): 404–405.
dc.identifier.citedreference: Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv preprint. 2015; arXiv:1511.07122.
dc.identifier.citedreference: Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop. 2017.
dc.identifier.citedreference: Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint. 2014; arXiv:1412.6980.
dc.identifier.citedreference: Xue Z, Xu D, Wang Y, Zhang Y. ThreaDom: extracting protein domain boundary information from multiple threading alignments. Bioinformatics. 2013; 29(13): i247–i256.
dc.identifier.citedreference: Wu S, Zhang Y. LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res. 2007; 35(10): 3375–3382.
dc.identifier.citedreference: Towns J, Cockerill T, Dahan M, et al. XSEDE: accelerating scientific discovery. Comput Sci Eng. 2014; 16(5): 62–74.
dc.identifier.citedreference: Browne WJ, North AC, Phillips DC, Brew K, Vanaman TC, Hill RL. A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen’s egg-white lysozyme. J Mol Biol. 1969; 42(1): 65–86.
dc.identifier.citedreference: Levitt M, Warshel A. Computer simulation of protein folding. Nature. 1975; 253(5494): 694–698.
dc.identifier.citedreference: Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993; 234(3): 779–815.
dc.identifier.citedreference: Wu S, Szilagyi A, Zhang Y. Improving protein structure prediction using multiple sequence-based contact predictions. Structure. 2011; 19(8): 1182–1191.
dc.identifier.citedreference: Ovchinnikov S, Kim DE, Wang RY, Liu Y, DiMaio F, Baker D. Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins. 2016; 84(Suppl 1): 67–75.
dc.identifier.citedreference: Ovchinnikov S, Park H, Varghese N, et al. Protein structure determination using metagenome sequence data. Science. 2017; 355(6322): 294–298.
dc.identifier.citedreference: Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins. 2018; 86(Suppl 1): 136–151.
dc.identifier.citedreference: Kinch LN, Li W, Monastyrskyy B, Kryshtafovych A, Grishin NV. Evaluation of free modeling targets in CASP11 and ROLL. Proteins. 2016; 84(Suppl 1): 51–66.
dc.identifier.citedreference: Abriata LA, Tamo GE, Monastyrskyy B, Kryshtafovych A, Dal Peraro M. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment-based contact prediction methods. Proteins. 2018; 86(Suppl 1): 97–112.
dc.identifier.citedreference: Morcos F, Pagnani A, Lunt B, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci. 2011; 108(49): E1293–E1301.
dc.identifier.citedreference: Jones DT, Buchan DW, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2011; 28(2): 184–190.
dc.identifier.citedreference: Seemayer S, Gruber M, Söding J. CCMpred: fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics. 2014; 30(21): 3128–3130.
dc.identifier.citedreference: Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci. 2013; 110(39): 15674–15679.
dc.identifier.citedreference: Jones DT, Singh T, Kosciolek T, Tetchner S. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics. 2014; 31(7): 999–1006.
dc.identifier.citedreference: Buchan DW, Jones DT. Improved protein contact predictions with the MetaPSICOV2 server in CASP12. Proteins. 2018; 86: 78–83.
dc.identifier.citedreference: He B, Mortuza S, Wang Y, Shen H-B, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics. 2017; 33(15): 2296–2306.
dc.identifier.citedreference: Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol. 2017; 13(1): e1005324.
dc.owningcollname: Interdisciplinary and Peer-Reviewed
