Short sequence motifs, overrepresented in mammalian conserved non-coding sequences
dc.contributor.author | Minovitsky, Simon | |
dc.contributor.author | Stegmaier, Philip | |
dc.contributor.author | Kel, Alexander | |
dc.contributor.author | Kondrashov, Alexey S | |
dc.contributor.author | Dubchak, Inna | |
dc.date.accessioned | 2015-08-07T17:25:59Z | |
dc.date.available | 2015-08-07T17:25:59Z | |
dc.date.issued | 2007-10-18 | |
dc.identifier.citation | BMC Genomics. 2007 Oct 18;8(1):378 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/112343 | en_US |
dc.description.abstract | Background: A substantial fraction of the non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated the relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500 relative to the transcription start site); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs; (ii) a tendency of A's and T's to clump, probably due to context-dependent mutation; (iii) the presence of short GC-rich regions; and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those that flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection which causes their conservation is not always very strong. | |
dc.title | Short sequence motifs, overrepresented in mammalian conserved non-coding sequences | |
dc.type | Article | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/112343/1/12864_2007_Article_1091.pdf | |
dc.identifier.doi | 10.1186/1471-2164-8-378 | en_US |
dc.language.rfc3066 | en | |
dc.rights.holder | Minovitsky et al. | |
dc.date.updated | 2015-08-07T17:25:59Z | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |
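The abstract compares motif abundances in CNSs against, among other backgrounds, random sequences with the same nucleotide composition. A minimal sketch of that kind of comparison is given below. It is not the authors' pipeline: it assumes a zeroth-order (i.i.d. bases) composition model and computes the expected k-mer count analytically instead of generating random sequences, and the function names (`kmer_counts`, `base_freqs`, `enrichment_vs_composition`) are hypothetical. An observed/expected ratio above 1 marks a k-mer as overrepresented relative to composition alone; a ratio near 0 for `CG` illustrates the CpG avoidance the abstract describes.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_counts(seqs, k):
    """Count overlapping k-mers, skipping any window that contains a non-ACGT symbol (e.g. N)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            window = s[i:i + k]
            if all(c in BASES for c in window):
                counts[window] += 1
    return counts


def base_freqs(seqs):
    """Mononucleotide frequencies over A/C/G/T in the whole input set."""
    c = Counter(ch for s in seqs for ch in s.upper() if ch in BASES)
    total = sum(c.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases in input")
    return {b: c[b] / total for b in BASES}


def enrichment_vs_composition(seqs, k):
    """Observed/expected ratio for every possible k-mer.

    Assumption (hypothetical background model): 'expected' treats bases as
    independent draws with the input's own base composition, i.e. an
    analytic stand-in for composition-matched random sequences.
    """
    obs = kmer_counts(seqs, k)
    freqs = base_freqs(seqs)
    n_windows = sum(obs.values())  # number of valid k-mer windows counted
    ratios = {}
    for kmer in map("".join, product(BASES, repeat=k)):
        p = 1.0
        for b in kmer:
            p *= freqs[b]
        expected = n_windows * p
        ratios[kmer] = obs[kmer] / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    # Toy sequence for illustration only, not real CNS data.
    toy = ["AAATTTGGCCAAATTT"]
    r = enrichment_vs_composition(toy, 2)
    print(r["CG"])  # → 0.0 (no CpG present despite C and G in the sequence)
```

Because an i.i.d. model does not preserve dinucleotide frequencies, CpG depletion itself shows up as "underrepresentation" here; a dinucleotide-preserving shuffle would be a stricter background if the goal were to look past CpG avoidance.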