LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields
dc.contributor.author | Fan, Yong‐xian | en_US |
dc.contributor.author | Zhang, Yang | en_US |
dc.contributor.author | Shen, Hong‐bin | en_US |
dc.date.accessioned | 2013-04-08T20:49:29Z | |
dc.date.available | 2014-05-23T15:04:19Z | en_US |
dc.date.issued | 2013-04 | en_US |
dc.identifier.citation | Fan, Yong‐xian; Zhang, Yang; Shen, Hong‐bin (2013). "LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields." Proteins: Structure, Function, and Bioinformatics 81(4): 622-634. <http://hdl.handle.net/2027.42/97155> | en_US |
dc.identifier.issn | 0887-3585 | en_US |
dc.identifier.issn | 1097-0134 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/97155 | |
dc.description.abstract | The calpain family of Ca2+‐dependent cysteine proteases plays a vital role in many important biological processes and is closely associated with a variety of pathological states. Activated calpains selectively cleave relevant substrates at specific cleavage sites, yielding multiple fragments that can have different functions from the intact substrate protein. Until now, our knowledge of calpain functions and their substrate cleavage mechanisms has been limited because experimental determination and validation of calpain binding are usually laborious and expensive. In this work, we aim to develop a new computational approach (LabCaS) for accurate prediction of calpain substrate cleavage sites from amino acid sequences. Most earlier approaches split sequences into short peptides, which leaves the machine‐learning training set with an imbalance between negative and positive samples. To overcome this imbalance, we designed a conditional random field algorithm that labels the potential cleavage sites directly from entire sequences. By integrating multiple amino acid features and those derived from sequences, LabCaS achieves accurate recognition of the cleavage sites for most calpain proteins. In a jackknife test on a set of 129 benchmark proteins, LabCaS achieves an AUC score of 0.862. The LabCaS program is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/LabCaS . Proteins 2013. © 2012 Wiley Periodicals, Inc. | en_US |
dc.publisher | Wiley Subscription Services, Inc., A Wiley Company | en_US |
dc.subject.other | Ensemble Learning | en_US |
dc.subject.other | Protease Substrate Specificity | en_US |
dc.subject.other | Cleavage Site Prediction | en_US |
dc.subject.other | Sequence Labeling | en_US |
dc.title | LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields | en_US |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | en_US |
dc.subject.hlbsecondlevel | Biological Chemistry | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.contributor.affiliationum | Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109 | en_US |
dc.contributor.affiliationum | Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109 | en_US |
dc.contributor.affiliationum | Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109 | en_US |
dc.contributor.affiliationother | Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China | en_US |
dc.contributor.affiliationother | Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China | en_US |
dc.identifier.pmid | 23180633 | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/97155/1/24217_ftp.pdf | |
dc.identifier.doi | 10.1002/prot.24217 | en_US |
dc.identifier.source | Proteins: Structure, Function, and Bioinformatics | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |
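The abstract contrasts two ways of framing cleavage-site prediction: splitting each sequence into short peptides, which yields many negative samples per positive, and labeling every residue of the whole sequence in one pass. The following is a minimal Python sketch of how the two framings differ at the data-preparation stage. It is not the LabCaS implementation: the function names, the `C`/`N` label alphabet, the window width, and the example sequence and site positions are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def label_residues(sequence: str, cleavage_after: List[int]) -> List[str]:
    """Sequence-labeling view: one label per residue.

    A residue at 1-based position p gets 'C' if the bond after it is
    cleaved (assumed convention), otherwise 'N'.
    """
    sites = set(cleavage_after)
    return ["C" if pos in sites else "N" for pos in range(1, len(sequence) + 1)]


def window_samples(
    sequence: str, cleavage_after: List[int], half_width: int = 3
) -> List[Tuple[str, int]]:
    """Peptide-splitting view: one fixed-length window per residue.

    Each window is centred on a residue and padded with 'X' at the
    termini; the target is 1 for a cleavage site and 0 otherwise.
    """
    pad = "X" * half_width
    padded = pad + sequence + pad
    sites = set(cleavage_after)
    samples = []
    for pos in range(1, len(sequence) + 1):
        start = pos - 1  # window start index in the padded string
        peptide = padded[start : start + 2 * half_width + 1]
        samples.append((peptide, int(pos in sites)))
    return samples


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical substrate sequence
    sites = [5, 14]                 # hypothetical cleavage positions

    # One training example: the whole sequence with its label string.
    print(seq)
    print("".join(label_residues(seq, sites)))

    # Many training examples: mostly negatives, few positives.
    samples = window_samples(seq, sites)
    positives = sum(target for _, target in samples)
    print(f"windows: {len(samples)}, positive: {positives}, "
          f"negative: {len(samples) - positives}")
```

In the windowed framing every non-cleaved residue becomes an independent negative example, so the class ratio is fixed by the rarity of cleavage sites. In the labeling framing the sequence stays a single structured example, which is the setting a conditional random field is designed for.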