LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields

dc.contributor.author: Fan, Yong‐xian [en_US]
dc.contributor.author: Zhang, Yang [en_US]
dc.contributor.author: Shen, Hong‐bin [en_US]
dc.date.accessioned: 2013-04-08T20:49:29Z
dc.date.available: 2014-05-23T15:04:19Z [en_US]
dc.date.issued: 2013-04 [en_US]
dc.identifier.citation: Fan, Yong‐xian; Zhang, Yang; Shen, Hong‐bin (2013). "LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields." Proteins: Structure, Function, and Bioinformatics 81(4): 622-634. <http://hdl.handle.net/2027.42/97155> [en_US]
dc.identifier.issn: 0887-3585 [en_US]
dc.identifier.issn: 1097-0134 [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/97155
dc.description.abstract: The calpain family of Ca2+‐dependent cysteine proteases plays a vital role in many important biological processes and is closely related to a variety of pathological states. Activated calpains selectively cleave relevant substrates at specific cleavage sites, yielding multiple fragments that can have different functions from the intact substrate protein. Until now, our knowledge of calpain functions and their substrate cleavage mechanisms has been limited because the experimental determination and validation of calpain binding are usually laborious and expensive. In this work, we aim to develop a new computational approach (LabCaS) for accurate prediction of calpain substrate cleavage sites from amino acid sequences. To overcome the imbalance between negative and positive samples in machine‐learning training, from which most earlier approaches suffer when splitting sequences into short peptides, we designed a conditional random field algorithm that can label the potential cleavage sites directly from the entire sequences. By integrating multiple amino acid features with features derived from sequences, LabCaS achieves accurate recognition of the cleavage sites for most calpain proteins. In a jackknife test on a set of 129 benchmark proteins, LabCaS achieves an AUC score of 0.862. The LabCaS program is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/LabCaS . Proteins 2013. © 2012 Wiley Periodicals, Inc. [en_US]
dc.publisher: Wiley Subscription Services, Inc., A Wiley Company [en_US]
dc.subject.other: Ensemble Learning [en_US]
dc.subject.other: Protease Substrate Specificity [en_US]
dc.subject.other: Cleavage Site Prediction [en_US]
dc.subject.other: Sequence Labeling [en_US]
dc.title: LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields [en_US]
dc.type: Article [en_US]
dc.rights.robots: IndexNoFollow [en_US]
dc.subject.hlbsecondlevel: Biological Chemistry [en_US]
dc.subject.hlbtoplevel: Science [en_US]
dc.description.peerreviewed: Peer Reviewed [en_US]
dc.contributor.affiliationum: Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109 [en_US]
dc.contributor.affiliationum: Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109 [en_US]
dc.contributor.affiliationum: Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109 [en_US]
dc.contributor.affiliationother: Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China [en_US]
dc.contributor.affiliationother: Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China [en_US]
dc.identifier.pmid: 23180633 [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/97155/1/24217_ftp.pdf
dc.identifier.doi: 10.1002/prot.24217 [en_US]
dc.identifier.source: Proteins: Structure, Function, and Bioinformatics [en_US]
dc.owningcollname: Interdisciplinary and Peer-Reviewed
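
The abstract contrasts two ways of framing cleavage-site prediction (splitting sequences into short peptides versus labeling whole sequences) and reports a jackknife test scored by AUC. The Python sketch below illustrates those three ideas on toy data. It is not the LabCaS implementation and trains no conditional random field. The function names, the 10-residue window, the 0-based site indexing, the toy sequence, and the site positions are all assumptions made up for illustration.

```python
from typing import Iterator, List, Sequence, Set, Tuple


def window_samples(seq: str, sites: Set[int], half_width: int = 5) -> List[Tuple[str, int]]:
    """Peptide framing: one fixed-length window per candidate bond.

    Site i denotes the bond after residue i (0-based); this indexing is an
    assumption of the sketch. Most windows are negatives, which is the class
    imbalance the abstract refers to.
    """
    samples = []
    for i in range(half_width - 1, len(seq) - half_width):
        peptide = seq[i - half_width + 1 : i + half_width + 1]
        samples.append((peptide, 1 if i in sites else 0))
    return samples


def label_sequence(seq: str, sites: Set[int]) -> List[str]:
    """Sequence-labeling framing: one label per residue of the whole sequence.

    'C' marks a cleavage position and 'N' any other position (hypothetical tags).
    """
    return ["C" if i in sites else "N" for i in range(len(seq))]


def jackknife_splits(n_proteins: int) -> Iterator[Tuple[List[int], int]]:
    """Leave-one-out splits: each protein is held out exactly once."""
    for held_out in range(n_proteins):
        train = [i for i in range(n_proteins) if i != held_out]
        yield train, held_out


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: a made-up 30-residue sequence with two hypothetical cleavage sites.
toy_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
toy_sites = {12, 20}

samples = window_samples(toy_seq, toy_sites)
n_positive = sum(label for _, label in samples)
n_negative = len(samples) - n_positive
tags = label_sequence(toy_seq, toy_sites)
```

On this toy sequence the peptide framing yields 21 windows, of which only 2 are positives, while the labeling framing keeps the sequence as a single training instance with one tag per residue. With 129 proteins, `jackknife_splits(129)` would yield 129 train/held-out splits, and `auc` could then be applied to the pooled held-out scores.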