Overcoming sequence misalignments with weighted structural superposition
dc.contributor.author | Khazanov, Nickolay A. | en_US |
dc.contributor.author | Damm‐Ganamet, Kelly L. | en_US |
dc.contributor.author | Quang, Daniel X. | en_US |
dc.contributor.author | Carlson, Heather A. | en_US |
dc.date.accessioned | 2012-11-07T17:04:39Z | |
dc.date.available | 2014-01-07T14:51:08Z | en_US |
dc.date.issued | 2012-11 | en_US |
dc.identifier.citation | Khazanov, Nickolay A.; Damm‐Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A. (2012). "Overcoming sequence misalignments with weighted structural superposition." Proteins: Structure, Function, and Bioinformatics 80(11): 2523-2535. <http://hdl.handle.net/2027.42/94274> | en_US |
dc.identifier.issn | 0887-3585 | en_US |
dc.identifier.issn | 1097-0134 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/94274 | |
dc.description.abstract | An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian‐weighted RMSD (wRMSD) tool with a sequence aligner and a seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD's robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its improved overlays yield corrected sequence alignments that agree well with HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, secondary‐structure matching (SSM), combinatorial extension (CE), and DaliLite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics‐scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence‐based and structure‐based methods may fail. The HwRMSD pipeline removes the dependence of structural overlays on the initial sequence pairing and eliminates the need to determine the best sequence‐alignment method, substitution matrix, and gap parameters for each unique pair of homologs. Proteins 2012. © 2012 Wiley Periodicals, Inc. | en_US |
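The core idea behind a Gaussian‐weighted overlay (re‐fitting iteratively so that poorly matched residue pairs stop pulling the superposition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' wRMSD or HwRMSD code: it covers only the superposition step for a fixed residue pairing (no sequence aligner and no seed extension), uses a weighted Kabsch fit, and assumes per‐pair weights of the form w = exp(-d²/c) with a hypothetical width c = 1 Å². The five displaced points in the demo stand in for residue pairs that a faulty sequence alignment has mis‐paired; all function names are illustrative.

```python
import numpy as np


def weighted_kabsch(mobile, target, w):
    """Rotation R and translation t minimising sum_i w_i * |R @ mobile_i + t - target_i|^2."""
    w = np.asarray(w, dtype=float)
    wsum = w.sum()
    if wsum <= 0.0:
        raise ValueError("weights sum to zero; residuals too large for this width c")
    mu_m = (w[:, None] * mobile).sum(axis=0) / wsum   # weighted centroids
    mu_t = (w[:, None] * target).sum(axis=0) / wsum
    P, Q = mobile - mu_m, target - mu_t
    H = (w[:, None] * P).T @ Q                        # 3x3 weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_m
    return R, t


def gaussian_weighted_fit(mobile, target, c=1.0, tol=1e-8, max_iter=100):
    """Iterative Gaussian-weighted superposition (ASSUMED weight form w = exp(-d^2 / c))."""
    w = np.ones(len(mobile))          # first pass = standard, uniformly weighted fit
    for _ in range(max_iter):
        R, t = weighted_kabsch(mobile, target, w)
        fitted = mobile @ R.T + t
        d2 = np.sum((fitted - target) ** 2, axis=1)
        w_new = np.exp(-d2 / c)       # hypothetical width c, in Å^2
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return fitted, w


def rmsd(a, b):
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def fraction_within(a, b, cutoff):
    """Fraction of paired points closer than `cutoff` (Å) after superposition."""
    return float(np.mean(np.linalg.norm(a - b, axis=1) <= cutoff))


# Synthetic demo: 20 rigid "core" residues plus 5 mis-paired ones shifted by 8 Å.
rng = np.random.default_rng(0)
target = rng.uniform(-10.0, 10.0, size=(25, 3))
mobile = target.copy()
mobile[20:] += np.array([8.0, 0.0, 0.0])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
mobile = mobile @ rot.T + np.array([3.0, -2.0, 5.0])

R0, t0 = weighted_kabsch(mobile, target, np.ones(25))
plain = mobile @ R0.T + t0                      # standard (unweighted) RMSD overlay
fitted, weights = gaussian_weighted_fit(mobile, target)

print("core RMSD, standard fit:", rmsd(plain[:20], target[:20]))
print("core RMSD, weighted fit:", rmsd(fitted[:20], target[:20]))
print("fraction within 1 Å:", fraction_within(plain, target, 1.0),
      fraction_within(fitted, target, 1.0))
```

Because the first pass uses uniform weights, it reproduces a standard RMSD overlay; the mis‐paired residues then receive near‐zero weight and the rigid core superimposes essentially exactly, which is the kind of gain a "fraction of pairs within 1 Å" measure is meant to capture. With c = 1 Å², exp(-d²/c) underflows to zero once every residual exceeds roughly 27 Å, so the sketch raises an error rather than returning a meaningless fit.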
dc.publisher | Wiley Subscription Services, Inc., A Wiley Company | en_US |
dc.subject.other | Protein Flexibility | en_US |
dc.subject.other | Structure Overlay | en_US |
dc.subject.other | RMSD | en_US |
dc.subject.other | Structure Alignment | en_US |
dc.subject.other | Homolog | en_US |
dc.subject.other | Sequence Alignment | en_US |
dc.title | Overcoming sequence misalignments with weighted structural superposition | en_US |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | en_US |
dc.subject.hlbsecondlevel | Biological Chemistry | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.contributor.affiliationum | Department of Medicinal Chemistry, 428 Church Street, University of Michigan, Ann Arbor, MI 48109‐1065 | en_US |
dc.contributor.affiliationum | Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109‐2218 | en_US |
dc.contributor.affiliationum | Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109‐1065 | en_US |
dc.contributor.affiliationother | The Cooper Union, New York, New York 10003 | en_US |
dc.identifier.pmid | 22733542 | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/94274/1/PROT_24134_sm_SuppInfo2.pdf | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/94274/2/PROT_24134_sm_SuppInfo1.pdf | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/94274/3/24134_ftp.pdf | |
dc.identifier.doi | 10.1002/prot.24134 | en_US |
dc.identifier.source | Proteins: Structure, Function, and Bioinformatics | en_US |
dc.identifier.citedreference | DeLano WL. The PyMOL molecular graphics system. San Carlos, CA: DeLano Scientific; 2002. | en_US |
dc.identifier.citedreference | Dunbrack RL Jr. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006; 16: 374 – 384. | en_US |
dc.identifier.citedreference | Sam V, Tai C, Garnier J, Gibrat J, Lee B, Munson P. Towards an automatic classification of protein structural domains based on structural similarity. BMC Bioinformatics 2008; 9: 74. | en_US |
dc.identifier.citedreference | Damm KL, Carlson HA. Gaussian‐weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures. Biophys J 2006; 90: 4558 – 4573. | en_US |
dc.identifier.citedreference | Kabsch W. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr A 1976; 32: 922 – 923. | en_US |
dc.identifier.citedreference | Rice P, Longden I, Bleasby A. EMBOSS: The European molecular biology open software suite. Trends Genet 2000; 16: 276 – 277. | en_US |
dc.identifier.citedreference | Tai C‐H, Vincent J, Kim C, Lee B. SE: an algorithm for deriving sequence alignment from a pair of superimposed structures. BMC Bioinformatics 2009; 10: S4. | en_US |
dc.identifier.citedreference | Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res 2000; 28: 235 – 242. | en_US |
dc.identifier.citedreference | EMBL‐EBI. Help‐About Scoring Matrices, http://www.ebi.ac.uk/help/matrix.html; 2010. | en_US |
dc.identifier.citedreference | Enroth C, Neujahr H, Schneider G, Lindqvist Y. The crystal structure of phenol hydroxylase in complex with FAD and phenol provides evidence for a concerted conformational change in the enzyme and its cofactor during catalysis. Structure 1998; 6: 605 – 617. | en_US |
dc.identifier.citedreference | Mesecar AD, Koshland DE. Sites of binding and orientation in a four‐location model for protein stereospecificity. IUBMB Life 2000; 49: 457 – 466. | en_US |
dc.identifier.citedreference | Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003; 19: ii246 – ii255. | en_US |
dc.identifier.citedreference | Theobald DL, Wuttke DS. THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures. Bioinformatics 2006; 22: 2171 – 2172. | en_US |
dc.identifier.citedreference | Xu Z, Horwich AL, Sigler PB. The crystal structure of the asymmetric GroEL‐GroES‐(ADP)7 chaperonin complex. Nature 1997; 388: 741 – 750. | en_US |
dc.identifier.citedreference | Braig K, Adams PD, Brünger AT. Conformational variability in the refined structure of the chaperonin GroEL at 2.8 Å resolution. Nat Struct Biol 1995; 2: 1083 – 1094. | en_US |
dc.identifier.citedreference | Ditzel L, Löwe J, Stock D, Stetter K‐O, Huber H, Huber R, Steinbacher S. Crystal structure of the thermosome, the archaeal chaperonin and homolog of CCT. Cell 1998; 93: 125 – 138. | en_US |
dc.identifier.citedreference | Michel G, Sauve V, Larocque R, Li Y, Matte A, Cygler M. The structure of the RlmB 23S rRNA methyltransferase reveals a new methyltransferase fold with a unique knot. Structure 2002; 10: 1303 – 1315. | en_US |
dc.identifier.citedreference | Nureki O, Shirouzu M, Hashimoto K, Ishitani R, Terada T, Tamakoshi M, Oshima T, Chijimatsu M, Takio K, Vassylyev DG, Shibata T, Inoue Y, Kuramitsu S, Yokoyama S. An enzyme with a deep trefoil knot for the active‐site architecture. Acta Crystallogr D Biol Crystallogr 2002; 58: 1129 – 1137. | en_US |
dc.identifier.citedreference | Biopython, version 1.42, http://biopython.org; 2006. | en_US |
dc.identifier.citedreference | Price SR, Evans PR, Nagai K. Crystal structure of the spliceosomal U2B″‐U2A′ protein complex bound to a fragment of U2 small nuclear RNA. Nature 1998; 394: 645 – 650. | en_US |
dc.identifier.citedreference | Marino M, Braun L, Cossart P, Ghosh P. Structure of the InlB leucine‐rich repeats, a domain that triggers host cell invasion by the bacterial pathogen L. monocytogenes. Mol Cell 1999; 4: 1063 – 1072. | en_US |
dc.identifier.citedreference | Owen DJ, Vallis Y, Pearse BM, McMahon HT, Evans PR. The structure and function of the beta 2‐adaptin appendage domain. EMBO J 2000; 19: 4216 – 4227. | en_US |
dc.identifier.citedreference | Traub LM, Downs MA, Westrich JL, Fremont DH. Crystal structure of the alpha appendage of AP‐2 reveals a recruitment platform for clathrin‐coat assembly. Proc Natl Acad Sci USA 1999; 96: 8907 – 8912. | en_US |
dc.identifier.citedreference | Rost B. Twilight zone of protein sequence alignments. Protein Eng 1999; 12: 85 – 94. | en_US |
dc.identifier.citedreference | Elofsson A. A study on protein sequence alignment quality. Proteins 2002; 46: 330 – 339. | en_US |
dc.identifier.citedreference | Benson ML, Smith RD, Khazanov NA, Dimcheff B, Beaver J, Dresslar P, Nerothin J, Carlson HA. Binding MOAD, a high‐quality protein–ligand database. Nucleic Acids Res 2008; 36: D674 – D678. | en_US |
dc.identifier.citedreference | Gong W, O'Gara M, Blumenthal RM, Cheng X. Structure of PvuII DNA‐(cytosine N4) methyltransferase, an example of domain permutation and protein fold assignment. Nucleic Acids Res 1997; 25: 2702 – 2715. | en_US |
dc.identifier.citedreference | Holm L, Sander C. Mapping the protein universe. Science 1996; 273: 595 – 602. | en_US |
dc.identifier.citedreference | Watson JD, Laskowski RA, Thornton JM. Predicting protein function from sequence and structural data. Curr Opin Struct Biol 2005; 15: 275 – 284. | en_US |
dc.identifier.citedreference | Marsden RL, Ranea JA, Sillero A, Redfern O, Yeats C, Maibaum M, Lee D, Addou S, Reeves GA, Dallman TJ, Orengo CA. Exploiting protein structure data to explore the evolution of protein function and biological complexity. Philos Trans R Soc Lond B Biol Sci 2006; 361: 425 – 440. | en_US |
dc.identifier.citedreference | Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 2008; 36: D419 – D425. | en_US |
dc.identifier.citedreference | Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 2007; 35: D291 – D297. | en_US |
dc.identifier.citedreference | Holm L, Rosenström P. Dali server: conservation mapping in 3D. Nucleic Acids Res 2010; 38: W545 – W549. | en_US |
dc.identifier.citedreference | Bhaduri A, Pugalenthi G, Sowdhamini R. PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics 2004; 5: 35. | en_US |
dc.identifier.citedreference | Wang Y, Addess KJ, Chen J, Geer LY, He J, He S, Lu S, Madej T, Marchler‐Bauer A, Thiessen PA, Zhang N, Bryant SH. MMDB: annotating protein sequences with Entrez's 3D‐structure database. Nucleic Acids Res 2007; 35: D298 – D300. | en_US |
dc.identifier.citedreference | Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. The ASTRAL Compendium in 2004. Nucleic Acids Res 2004; 32: D189 – D192. | en_US |
dc.identifier.citedreference | Mizuguchi K, Deane CM, Blundell TL, Overington JP. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci 1998; 7: 2469 – 2471. | en_US |
dc.identifier.citedreference | Schmidt R, Altman RB, Gerstein M. LPFC: an internet library of protein family core structures. Protein Sci 1997; 6: 246 – 248. | en_US |
dc.identifier.citedreference | Orengo CA, Thornton JM. Protein families and their evolution—a structural perspective. Annu Rev Biochem 2005; 74: 867 – 900. | en_US |
dc.identifier.citedreference | Valas R, Yang S, Bourne P. Nothing about protein structure classification makes sense except in the light of evolution. Curr Opin Struct Biol 2009; 19: 329 – 334. | en_US |
dc.identifier.citedreference | Kolodny R, Koehl P, Levitt M. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol 2005; 346: 1173 – 1188. | en_US |
dc.identifier.citedreference | Taylor WR, Orengo CA. Protein structure alignment. J Mol Biol 1989; 208: 1 – 22. | en_US |
dc.identifier.citedreference | Gerstein M, Levitt M. Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins. Protein Sci 1998; 7: 445 – 456. | en_US |
dc.identifier.citedreference | Subbiah S, Laurents DV, Levitt M. Structural similarity of DNA‐binding domains of bacteriophage repressors and the globin core. Curr Biol 1993; 3: 141 – 148. | en_US |
dc.identifier.citedreference | Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol 1993; 233: 123 – 138. | en_US |
dc.identifier.citedreference | Kleywegt G. Use of non‐crystallographic symmetry in protein structure refinement. Acta Crystallogr D 1996; 52: 842 – 857. | en_US |
dc.identifier.citedreference | Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998; 11: 739 – 747. | en_US |
dc.identifier.citedreference | Krissinel E, Henrick K. Secondary‐structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D 2004; 60: 2256 – 2268. | en_US |
dc.identifier.citedreference | Mayr G, Domingues F, Lackner P. Comparative analysis of protein structure alignments. BMC Struct Biol 2007; 7: 50. | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |