Beyond the Transcriptome: Facilitating Interpretation of Epigenomics and Metabolomics Data
Cavalcante Jr, Raymond
2017
Abstract
High-throughput omics experiments produce an incredible amount of data which must be put into context to make it useful. This is true of transcriptomics assays, epigenomics assays such as those measuring transcription factor binding and histone modifications (e.g. ChIP-seq) or those measuring DNA methylation (e.g. WGBS and RRBS), as well as for metabolomics assays quantifying small molecules (e.g. LC-MS). The field of transcriptomics, having been developed earlier than epigenomics and metabolomics, benefits from more, and more mature, interpretive tools. The primary goal of this dissertation is to develop software tools to interpret epigenomics and metabolomics data. First, we developed Broad-Enrich, a gene set enrichment tool designed for histone modification ChIP-seq data and other broad genomic regions. We employ a logistic regression model with a smoothing spline to account for the relationship between the proportion of a gene covered by a peak and a gene's length. We demonstrate Broad-Enrich has correct Type I error across 55 ENCODE HM datasets, that Broad-Enrich returns more biologically relevant results than other approaches, and that the correct choice of gene locus definition improves the strength of enrichments. Second, we developed ConceptMetab, an interactive web-based tool that maps and explores the relationships among biologically-defined metabolite sets developed from Gene Ontology, KEGG Pathways, and Medical Subject Headings, and based on statistical tests for association. We demonstrate the utility of ConceptMetab with multiple vignettes, showing it can be used to identify known and potentially novel relationships among metabolic pathways, cellular processes, phenotypes, and diseases, and provides an intuitive interface for linking compounds to their molecular functions and higher level biological effects. Third, we developed annotatr, a tool for annotating genomic regions to genomic annotations. 
The annotatr package reports all intersections of regions and annotations, giving a better understanding of the genomic context of the regions. A variety of functions are implemented to easily plot covariate data associated with the regions across the annotations, and across annotation intersections, providing insight into how characteristics of the regions differ across the genome. Fourth, we developed mint, a pipeline to analyze, integrate, and annotate DNA methylation (5mC) and hydroxymethylation (5hmC) data. Current gold-standard methods for measuring 5mC also capture 5hmC signal, confounding biological conclusions. The mint pipeline separates the signals in silico to discern the effects of each epigenetic mark in the experiment under consideration. The pipeline supports group comparisons for general designs with covariate information, and data are integrated based upon overlapping signal of 5mC and 5hmC. Genomic annotations and summary visualizations are output at various stages to facilitate interpretation. In sum, this body of work establishes tools enabling the interpretation of epigenomics and metabolomics data via functional enrichment, genomic annotation, data integration, and visualization.
Subjects
Epigenomics; Metabolomics; Functional interpretation; Genomic annotation
Types
Thesis