Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13

Li, Yang; Zhang, Chengxin; Bell, Eric W.; Yu, Dong‐jun; Zhang, Yang

dc.contributor.author	Li, Yang
dc.contributor.author	Zhang, Chengxin
dc.contributor.author	Bell, Eric W.
dc.contributor.author	Yu, Dong‐jun
dc.contributor.author	Zhang, Yang
dc.date.accessioned	2020-01-13T15:16:41Z
dc.date.available	WITHHELD_12_MONTHS
dc.date.available	2020-01-13T15:16:41Z
dc.date.issued	2019-12
dc.identifier.citation	Li, Yang; Zhang, Chengxin; Bell, Eric W.; Yu, Dong‐jun; Zhang, Yang (2019). "Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13." Proteins: Structure, Function, and Bioinformatics 87(12): 1082-1091.
dc.identifier.issn	0887-3585
dc.identifier.issn	1097-0134
dc.identifier.uri	https://hdl.handle.net/2027.42/153065
dc.description.abstract	We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
dc.publisher	John Wiley & Sons, Inc.
dc.subject.other	CASP
dc.subject.other	deep learning
dc.subject.other	contact-map prediction
dc.subject.other	coevolution analysis
dc.subject.other	protein folding
dc.title	Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13
dc.type	Article
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Biological Chemistry
dc.subject.hlbtoplevel	Science
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153065/1/prot25798_am.pdf
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153065/2/prot25798-sup-0001-Supinfo.pdf
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153065/3/prot25798.pdf
dc.identifier.doi	10.1002/prot.25798
dc.identifier.source	Proteins: Structure, Function, and Bioinformatics
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name: prot25798_am.pdf
Size: 1.631MB
Format: PDF

Name: prot25798-sup-0001-Supinfo.pdf
Size: 920.6KB
Format: PDF

Name: prot25798.pdf
Size: 2.228MB
Format: PDF
