Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13
dc.contributor.author | Li, Yang | |
dc.contributor.author | Zhang, Chengxin | |
dc.contributor.author | Bell, Eric W. | |
dc.contributor.author | Yu, Dong‐jun | |
dc.contributor.author | Zhang, Yang | |
dc.date.accessioned | 2020-01-13T15:16:41Z | |
dc.date.available | WITHHELD_12_MONTHS | |
dc.date.available | 2020-01-13T15:16:41Z | |
dc.date.issued | 2019-12 | |
dc.identifier.citation | Li, Yang; Zhang, Chengxin; Bell, Eric W.; Yu, Dong‐jun; Zhang, Yang (2019). "Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13." Proteins: Structure, Function, and Bioinformatics 87(12): 1082-1091. | |
dc.identifier.issn | 0887-3585 | |
dc.identifier.issn | 1097-0134 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/153065 | |
dc.description.abstract | We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems. | |
dc.publisher | John Wiley & Sons, Inc. | |
dc.subject.other | CASP | |
dc.subject.other | deep learning | |
dc.subject.other | contact-map prediction | |
dc.subject.other | coevolution analysis | |
dc.subject.other | protein folding | |
dc.title | Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13 | |
dc.type | Article | |
dc.rights.robots | IndexNoFollow | |
dc.subject.hlbsecondlevel | Biological Chemistry | |
dc.subject.hlbtoplevel | Science | |
dc.description.peerreviewed | Peer Reviewed | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153065/1/prot25798_am.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153065/2/prot25798-sup-0001-Supinfo.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153065/3/prot25798.pdf | |
dc.identifier.doi | 10.1002/prot.25798 | |
dc.identifier.source | Proteins: Structure, Function, and Bioinformatics | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |
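The abstract reports accuracy "for the top L/5 long-range predictions", where L is the length of the target sequence. The sketch below shows one way such a score could be computed. It is not taken from the paper: the function name `top_l_over_k_precision`, the dictionary-based input format, and the toy data are all hypothetical, and the sequence-separation cutoff of 24 residues for "long-range" is assumed from the usual CASP convention rather than stated in this record.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue index pair (i, j) with i < j


def top_l_over_k_precision(
    scores: Dict[Pair, float],
    native: Set[Pair],
    length: int,
    k: int = 5,
    min_sep: int = 24,  # assumed long-range cutoff (CASP convention)
) -> float:
    """Fraction of the top L/k long-range predicted pairs that are native contacts."""
    # Keep only pairs whose sequence separation qualifies as long-range.
    long_range = [(pair, s) for pair, s in scores.items() if pair[1] - pair[0] >= min_sep]
    # Rank by predicted contact score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    n_top = max(1, length // k)
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for pair, _ in top if pair in native)
    # Divide by the number of pairs actually evaluated (may be fewer than L/k).
    return hits / len(top)


# Toy example (made-up data): a 50-residue target, so L/5 = 10 pairs are scored.
length = 50
scores = {(i, i + 24): 1.0 - 0.05 * i for i in range(12)}  # 12 long-range pairs
scores[(0, 5)] = 2.0  # short-range pair, ignored despite its high score
native = {(0, 24), (1, 25), (2, 26), (3, 27), (4, 28), (11, 35), (0, 5)}

precision = top_l_over_k_precision(scores, native, length)
print(precision)  # 0.5
```

In the toy data, five of the ten highest-scoring long-range pairs are native contacts, so the score is 0.5. The native pair (11, 35) does not count because it ranks outside the top ten, and (0, 5) is excluded by the separation filter.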