Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families
dc.contributor.author | Wang, Yan | |
dc.contributor.author | Shi, Qiang | |
dc.contributor.author | Yang, Pengshuo | |
dc.contributor.author | Zhang, Chengxin | |
dc.contributor.author | Mortuza, S. M. | |
dc.contributor.author | Xue, Zhidong | |
dc.contributor.author | Ning, Kang | |
dc.contributor.author | Zhang, Yang | |
dc.date.accessioned | 2019-11-22T13:53:13Z | |
dc.date.available | 2019-11-22T13:53:13Z | |
dc.date.issued | 2019-11-01 | |
dc.identifier.citation | Genome Biology. 2019 Nov 01;20(1):229 | |
dc.identifier.uri | https://doi.org/10.1186/s13059-019-1823-z | |
dc.identifier.uri | https://hdl.handle.net/2027.42/152163 | |
dc.description.abstract | Introduction: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. Results: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with their frequency of occurrence in the modeled Pfam families, suggesting a significant role of the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Notably, PF15461, most of whose members come from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5. Conclusions: These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences. | |
dc.title | Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families | |
dc.type | Article | en_US |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/152163/1/13059_2019_Article_1823.pdf | |
dc.language.rfc3066 | en | |
dc.rights.holder | The Author(s). | |
dc.date.updated | 2019-11-22T13:53:14Z | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |