Improving accuracy of protein contact prediction using balanced network deconvolution

Sun, Hai‐ping; Huang, Yan; Wang, Xiao‐fan; Zhang, Yang; Shen, Hong‐bin

dc.contributor.author	Sun, Hai‐ping	en_US
dc.contributor.author	Huang, Yan	en_US
dc.contributor.author	Wang, Xiao‐fan	en_US
dc.contributor.author	Zhang, Yang	en_US
dc.contributor.author	Shen, Hong‐bin	en_US
dc.date.accessioned	2015-03-05T18:24:25Z
dc.date.available	2016-05-10T20:26:27Z	en
dc.date.issued	2015-03	en_US
dc.identifier.citation	Sun, Hai‐ping; Huang, Yan; Wang, Xiao‐fan; Zhang, Yang; Shen, Hong‐bin (2015). "Improving accuracy of protein contact prediction using balanced network deconvolution." Proteins: Structure, Function, and Bioinformatics 83(3): 485–496.	en_US
dc.identifier.issn	0887-3585	en_US
dc.identifier.issn	1097-0134	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/110720
dc.description.abstract	A residue contact map is essential for determining the three‐dimensional structure of a protein. However, most current contact prediction methods based on residue co‐evolution suffer from a high false‐positive rate caused by indirect and transitive contacts (i.e., residues A–B and B–C are in contact, but A–C are not). Building on the work of Feizi et al. (Nat Biotechnol 2013; 31:726–733), which demonstrated a general network model to distinguish direct dependencies by network deconvolution, this study presents a new balanced network deconvolution (BND) algorithm that identifies an optimized dependency matrix without a limit on the eigenvalue range in the applied network systems. The algorithm was used to filter the contact predictions of five widely used co‐evolution methods. On test proteins from three benchmark datasets, namely the 9th critical assessment of protein structure prediction (CASP9), CASP10, and the PSICOV (precise structural contact prediction using sparse inverse covariance estimation) database, BND can improve the medium‐ and long‐range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional central processing unit cost. The improvement is statistically significant, with a P‐value < 5.93 × 10⁻³ in the Student's t‐test. A further comparison with the ab initio structure predictions in the CASP experiments showed that the usefulness of current co‐evolution‐based contact prediction for three‐dimensional structure modeling depends on the number of homologous sequences available in the sequence databases. BND can be used as a general contact refinement method and is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/BND/. Proteins 2015; 83:485–496. © 2014 Wiley Periodicals, Inc.	en_US
dc.publisher	Wiley Periodicals, Inc.	en_US
dc.subject.other	predictor	en_US
dc.subject.other	protein structure prediction	en_US
dc.subject.other	residue contact map	en_US
dc.subject.other	residue co‐evolution	en_US
dc.subject.other	transitive noise	en_US
dc.subject.other	filter	en_US
dc.title	Improving accuracy of protein contact prediction using balanced network deconvolution	en_US
dc.type	Article	en_US
dc.rights.robots	IndexNoFollow	en_US
dc.subject.hlbsecondlevel	Biological Chemistry	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.description.peerreviewed	Peer Reviewed	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/110720/1/prot24744.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/110720/2/prot24744-sup-0001-suppinfo.pdf
dc.identifier.doi	10.1002/prot.24744	en_US
dc.identifier.source	Proteins: Structure, Function, and Bioinformatics	en_US
dc.identifier.citedreference	Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins 2011; 79 (Suppl 10): 59–73.	en_US
dc.identifier.citedreference	de Juan D, Pazos F, Valencia A. Emerging methods in protein co‐evolution. Nat Rev Genet 2013; 14: 249–261.	en_US
dc.identifier.citedreference	Berenger F, Zhou Y, Shrestha R, Zhang KY. Entropy‐accelerated exact clustering of protein decoys. Bioinformatics 2011; 27: 939–945.	en_US
dc.identifier.citedreference	Berenger F, Shrestha R, Zhou Y, Simoncini D, Zhang KY. Durandal: fast exact clustering of protein decoys. J Comput Chem 2012; 33: 471–474.	en_US
dc.identifier.citedreference	Kajan L, Hopf TA, Kalas M, Marks DS, Rost B. FreeContact: fast and free software for protein contact prediction from residue co‐evolution. BMC Bioinformatics 2014; 15: 85.	en_US
dc.identifier.citedreference	Chiu DK, Kolodziejczak T. Inferring consensus structure from nucleic acid sequences. Comput Appl Biosci 1991; 7: 347–352.	en_US
dc.identifier.citedreference	Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 2008; 24: 333–340.	en_US
dc.identifier.citedreference	Feizi S, Marbach D, Medard M, Kellis M. Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol 2013; 31: 726–733.	en_US
dc.identifier.citedreference	Jones DT, Buchan DW, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 2012; 28: 184–190.	en_US
dc.identifier.citedreference	Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Direct‐coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA 2011; 108: E1293–E1301.	en_US
dc.identifier.citedreference	Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, Pagnani A. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein‐interaction partners. PLoS One 2014; 9: e92721.	en_US
dc.identifier.citedreference	Ezkurdia I, Grana O, Izarzugaza JM, Tress ML. Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8. Proteins 2009; 77 (S9): 196–209.	en_US
dc.identifier.citedreference	Wigner EP. Random matrices in physics. SIAM Rev 1967; 9: 1–23.	en_US
dc.identifier.citedreference	Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue‐residue contact prediction in CASP10. Proteins 2014; 82 (Suppl 2): 138–153.	en_US
dc.identifier.citedreference	Karthikraja V, Suresh A, Lulu S, Kangueane U, Kangueane P. Types of interfaces for homodimer folding and binding. Bioinformation 2009; 4: 101.	en_US
dc.identifier.citedreference	Tai CH, Bai H, Taylor TJ, Lee B. Assessment of template‐free modeling in CASP10 and ROLL. Proteins 2013; 82 (Suppl 2): 57–83.	en_US
dc.identifier.citedreference	Xu D, Zhang Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field. Proteins 2012; 80: 1715–1735.	en_US
dc.identifier.citedreference	Zhang Y. I‐TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008; 9: 40.	en_US
dc.identifier.citedreference	Roy A, Kucukural A, Zhang Y. I‐TASSER: a unified platform for automated protein structure and function prediction. Nat Protocols 2010; 5: 725–738.	en_US
dc.identifier.citedreference	Roy A, Yang J, Zhang Y. COFACTOR: an accurate comparative algorithm for structure‐based protein function annotation. Nucleic Acids Res 2012; 40 (Web Server issue): W471–W477.	en_US
dc.identifier.citedreference	Zhang J, Wang Q, Barz B, He Z, Kosztin I, Shang Y, Xu D. MUFOLD: a new solution for protein 3D structure prediction. Proteins 2010; 78: 1137–1152.	en_US
dc.identifier.citedreference	Cheng J, Baldi P. Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics 2007; 8: 113.	en_US
dc.identifier.citedreference	Tegge AN, Wang Z, Eickholt J, Cheng J. NNcon: improved protein contact map prediction using 2D‐recursive neural networks. Nucleic Acids Res 2009; 37 (Web Server issue): W515–W518.	en_US
dc.identifier.citedreference	Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods 2012; 9: 796–804.	en_US
dc.identifier.citedreference	Newman ME. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 2001; 64: 016132.	en_US
dc.identifier.citedreference	Di Lena P, Fariselli P, Margara L, Vassura M, Casadio R. Fast overlapping of protein contact maps by alignment of eigenvectors. Bioinformatics 2010; 26: 2250–2258.	en_US
dc.identifier.citedreference	Yang J, Jang R, Zhang Y, Shen HB. High‐accuracy prediction of transmembrane inter‐helix contacts and application to GPCR 3D structure modeling. Bioinformatics 2013; 29: 2579–2587.	en_US
dc.identifier.citedreference	Wu S, Zhang Y. A comprehensive assessment of sequence‐based and template‐based methods for protein contact prediction. Bioinformatics 2008; 24: 924–931.	en_US
dc.identifier.citedreference	Vassura M, Margara L, Di Lena P, Medri F, Fariselli P, Casadio R. Reconstruction of 3D structures from protein contact maps. IEEE/ACM Trans Comput Biol Bioinform 2008; 5: 357–367.	en_US
dc.identifier.citedreference	Nugent T, Jones DT. Predicting transmembrane helix packing arrangements using residue contacts and a force‐directed algorithm. PLoS Comput Biol 2010; 6: e1000714.	en_US
dc.identifier.citedreference	Taylor WR, Jones DT, Sadowski MI. Protein topology from predicted residue contacts. Protein Sci 2012; 21: 299–305.	en_US
dc.identifier.citedreference	Gromiha MM, Selvaraj S. Inter‐residue interactions in protein folding and stability. Prog Biophys Mol Biol 2004; 86: 235–277.	en_US
dc.identifier.citedreference	Schlessinger A, Punta M, Rost B. Natively unstructured regions in proteins identified from contact predictions. Bioinformatics 2007; 23: 2376–2384.	en_US
dc.identifier.citedreference	Izarzugaza JM, Vazquez M, del Pozo A, Valencia A. wKinMut: an integrated tool for the analysis and interpretation of mutations in human protein kinases. BMC Bioinformatics 2013; 14: 345.	en_US
dc.identifier.citedreference	Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins 1994; 18: 309–317.	en_US
dc.identifier.citedreference	Olmea O, Valencia A. Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold Des 1997; 2: S25–S32.	en_US
dc.owningcollname	Interdisciplinary and Peer-Reviewed
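
Illustrative note on the abstract: the record does not give the BND equations, so the sketch below shows only the closed-form network deconvolution of Feizi et al. that the abstract builds on, not BND itself. It assumes a symmetric score matrix in which observed dependencies are the sum of direct ones plus all indirect paths, so that G_dir = G_obs (I + G_obs)^-1. The function name and the toy A–B–C chain are hypothetical, and the eigenvalue scaling step of the original method is omitted.

```python
import numpy as np


def network_deconvolution(g_obs):
    """Closed-form deconvolution G_dir = G_obs (I + G_obs)^-1 (symmetric input).

    Sketch of the Feizi et al. idea only; this is NOT the BND algorithm.
    """
    g_obs = np.asarray(g_obs, dtype=float)
    # A symmetric matrix has real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(g_obs)
    # Each observed eigenvalue lam maps to the direct eigenvalue lam / (1 + lam).
    dir_vals = eigvals / (1.0 + eigvals)
    return eigvecs @ np.diag(dir_vals) @ eigvecs.T


# Toy chain: A-B and B-C are direct contacts, A-C is not.
g_dir = np.array([[0.0, 0.4, 0.0],
                  [0.4, 0.0, 0.4],
                  [0.0, 0.4, 0.0]])

# Observed scores include transitive paths: G_dir + G_dir^2 + G_dir^3 + ...
# which sums to G_dir (I - G_dir)^-1 when all |eigenvalues| of G_dir are < 1.
g_obs = g_dir @ np.linalg.inv(np.eye(3) - g_dir)

recovered = network_deconvolution(g_obs)
print("observed A-C score: ", round(g_obs[0, 2], 4))
print("recovered A-C score:", round(abs(recovered[0, 2]), 4))
```

In this toy case the spurious A–C score in the observed matrix disappears after deconvolution. The closed form relies on the direct matrix having eigenvalues of magnitude below 1, which is the kind of eigenvalue restriction the abstract says BND removes.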

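Illustrative note on the evaluation metric: the abstract reports accuracy "at the L/5 cutoff" for medium‐ and long‐range contacts. A minimal sketch of that kind of measure is given below, assuming L is the protein length, that the top L/5 scored pairs in each range are checked against native contacts, and that medium range means a sequence separation of 12 to 23 and long range means 24 or more (a common CASP convention that the record itself does not state). The function name and all scores are hypothetical.

```python
def top_precision(scores, native, length, min_sep, max_sep=None, frac=5):
    """Fraction of the top length // frac predicted pairs that are native contacts.

    scores: dict mapping (i, j) residue index pairs to predicted scores.
    native: set of (i, j) pairs that are true contacts.
    Only pairs with min_sep <= |i - j| <= max_sep are considered.
    """
    in_range = [
        (score, i, j)
        for (i, j), score in scores.items()
        if abs(i - j) >= min_sep and (max_sep is None or abs(i - j) <= max_sep)
    ]
    in_range.sort(reverse=True)  # highest score first
    top = in_range[: max(1, length // frac)]
    if not top:
        return 0.0
    hits = sum((i, j) in native for _, i, j in top)
    # Normalized by the number of pairs actually selected.
    return hits / len(top)


# Hypothetical protein of length 30, so L/5 = 6 pairs per range.
length = 30
scores = {
    (0, 12): 0.9, (1, 15): 0.8, (2, 20): 0.7, (3, 18): 0.6,
    (4, 17): 0.5, (5, 19): 0.4, (6, 20): 0.3,   # medium-range pairs
    (0, 25): 0.95, (1, 28): 0.2, (2, 29): 0.1,  # long-range pairs
}
native = {(0, 12), (2, 20), (4, 17), (0, 25)}

medium = top_precision(scores, native, length, min_sep=12, max_sep=23)
long_range = top_precision(scores, native, length, min_sep=24)
print("medium-range precision at L/5:", medium)
print("long-range precision at L/5:", long_range)
```

Comparing this number before and after filtering a score matrix is one way to express a relative improvement of the kind quoted in the abstract.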
Files in this item

Name: prot24744.pdf
Size: 697.0KB
Format: PDF

Name: prot24744-sup-0001-suppinfo.pdf
Size: 123.2KB
Format: PDF
