Improvements and Developments in Gene Regulation and Single-Cell Gene Expression Data Analysis
Lee, Christopher
2020
Abstract
Recent advances in high-throughput sequencing technologies have produced vast amounts of data on genome-wide gene regulation and single-cell transcriptomics, which aid the study of tissue complexity and cell-to-cell heterogeneity at the molecular level. These data offer opportunities for new insights into intracellular genetic mechanisms, but they also require the development of new methods.

We first improve and extend methods for pathway analysis of large sets of genomic regions, such as those derived from genome-wide transcription factor binding (ChIP-seq) or open chromatin (e.g. ATAC-seq) experiments. Starting with ChIP-Enrich, a previously developed gene set enrichment method, we improve its runtime by using an approximation that reduces redundant calculations. We then create Poly-Enrich, an extension of ChIP-Enrich that allows count and weighted count outcomes per gene (the number of genomic regions assigned to each gene) rather than only binary outcomes. Comparing the results of ChIP-Enrich and Poly-Enrich, we discover patterns in gene regulation, depending on the transcription factor and on gene function, that can be used to predict which of the two methods is more powerful. Furthermore, we introduce a hybrid test that combines the two methods when this prediction is ambiguous. Using Poly-Enrich, we evaluate several ways of defining enhancer locations and assigning them to target genes, and find several improvements in the accuracy of distal regulation analyses.

Second, to complement gene set enrichment tests that do not account for the relative locations of peaks, we develop a new method, proxReg, which tests whether experimentally identified genomic regions associated with a specific biological function tend to lie farther from or closer to regulatory regions, either gene transcription start sites or enhancers. Using proxReg alone, we find that transcription factors such as NRSF can bind closer to either type of regulatory region depending on the biological processes being regulated.
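As a rough illustration of the count-outcome idea behind Poly-Enrich, the Python sketch below compares the rate of peaks per base of gene locus inside versus outside a gene set. It is a simplification under stated assumptions, not the Poly-Enrich implementation: it uses a Poisson model with log locus length as an offset and ignores overdispersion, whereas a production method of this kind would typically use an overdispersed count model and a more flexible adjustment for locus length. The function name `poisson_rate_ratio_test` and the example counts are hypothetical.

```python
import math
from typing import Sequence


def poisson_rate_ratio_test(
    counts: Sequence[int],
    locus_lengths: Sequence[float],
    in_gene_set: Sequence[bool],
) -> tuple[float, float]:
    """Hypothetical count-outcome gene set test (simplified, not Poly-Enrich).

    Assumed model: counts[i] ~ Poisson(mu_i) with
        log(mu_i) = log(locus_lengths[i]) + b0 + b1 * in_gene_set[i].
    With one binary covariate and an offset, the MLE is closed-form:
    exp(b1) is the ratio of peaks-per-base inside vs. outside the set.
    Returns (b1, two-sided Wald p-value). Overdispersion is ignored.
    """
    y1 = sum(c for c, s in zip(counts, in_gene_set) if s)
    y0 = sum(c for c, s in zip(counts, in_gene_set) if not s)
    l1 = sum(n for n, s in zip(locus_lengths, in_gene_set) if s)
    l0 = sum(n for n, s in zip(locus_lengths, in_gene_set) if not s)
    if y1 == 0 or y0 == 0:
        raise ValueError("need at least one peak inside and outside the gene set")
    b1 = math.log((y1 / l1) / (y0 / l0))  # log rate ratio
    se = math.sqrt(1.0 / y1 + 1.0 / y0)  # Wald SE from Poisson Fisher information
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return b1, p_value


# Usage: 3 genes in the set carry more peaks per locus than 5 genes outside it.
b1, p = poisson_rate_ratio_test(
    counts=[5, 7, 6, 1, 0, 2, 1, 1],
    locus_lengths=[10_000.0] * 8,
    in_gene_set=[True] * 3 + [False] * 5,
)
print(f"log rate ratio = {b1:.3f}, p = {p:.2g}")
```

Because every gene contributes its full peak count rather than a present/absent indicator, genes with many peaks carry more weight, which is the intuition for why a count-based test can be more powerful for some transcription factors.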
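The abstract does not state the form of the hybrid test. One simple and valid way to combine the ChIP-Enrich and Poly-Enrich p-values for the same gene set is a Bonferroni-adjusted minimum, sketched below as an assumption; `hybrid_p` is a hypothetical name.

```python
def hybrid_p(p_chip: float, p_poly: float) -> float:
    """Hypothetical hybrid p-value: Bonferroni-adjusted minimum of two p-values.

    Doubling the smaller p-value (capped at 1) keeps the type I error at or
    below the nominal level for any dependence between the two tests, at the
    cost of some power relative to the better single test.
    """
    for p in (p_chip, p_poly):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return min(1.0, 2.0 * min(p_chip, p_poly))


print(hybrid_p(0.01, 0.20))
print(hybrid_p(0.60, 0.90))
```

The appeal of this form is that it needs no knowledge of the correlation between the two tests; the price is a factor of two on the smaller p-value.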
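To make the proxReg question concrete, the sketch below asks whether genomic regions assigned to gene-set genes lie closer to (or farther from) a regulatory element such as a transcription start site than other regions do. It uses a Wilcoxon rank-sum test with a tie-corrected normal approximation; this test is swapped in for illustration and may differ from the model proxReg actually fits. The function names and example distances are hypothetical.

```python
import math
from typing import Sequence


def _average_ranks(values: Sequence[float]) -> tuple[list[float], list[int]]:
    """1-based ranks with ties averaged; also returns the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(
    dist_in_set: Sequence[float], dist_out_set: Sequence[float]
) -> tuple[float, float]:
    """Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, two-sided p). z < 0 means regions linked to gene-set genes tend
    to lie closer to the regulatory element (e.g. a TSS) than other regions.
    """
    n1, n2 = len(dist_in_set), len(dist_out_set)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one distance")
    n = n1 + n2
    ranks, tie_sizes = _average_ranks(list(dist_in_set) + list(dist_out_set))
    w = sum(ranks[:n1])  # rank sum of the gene-set group
    mean_w = n1 * (n + 1) / 2.0
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if var_w <= 0:
        raise ValueError("all distances are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Usage with hypothetical peak-to-TSS distances in base pairs.
z, p = rank_sum_test([120, 450, 800], [5_000, 12_000, 30_000, 90_000])
print(f"z = {z:.2f}, p = {p:.3f}")
```

The sign of z carries the direction (closer versus farther), which is the kind of information an enrichment test alone does not report.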
Complementing Poly-Enrich, proxReg provides additional insight into how a pathway is regulated, as well as additional evidence for its significance.

For the third project, we extend an existing method, PQLseq, which tests for differential expression in bulk RNA-seq data using a generalized linear mixed model estimated by maximizing a penalized quasi-likelihood, to the single-cell RNA-seq (scRNA-seq) setting. In particular, we extend PQLseq so that it can be used with hierarchical modeling structures in scRNA-seq data. Hierarchical data structures, such as cells nested within individuals, are very common in single-cell studies, yet many existing scRNA-seq differential expression methods do not account for them. Differential expression methods that can model hierarchical data structures are often computationally slow because they must handle very large matrices. Here, we take advantage of several properties of the hierarchical structure of scRNA-seq data to develop a new algorithm for faster differential expression analysis. Our method specifically exploits the natural low-rank structure in the data, reducing computation time by several orders of magnitude. Through extensive simulations, we show that our method controls type I error well and can provide higher power without prohibitive computation times.

Subjects
Regulatory Genomics
Types
Thesis
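The low-rank speedup described for the third project can be illustrated with the Woodbury identity. Assuming a single random intercept per individual, the working covariance in a penalized quasi-likelihood iteration has the form V = D + tau * Z Z^T, where D is diagonal (one entry per cell) and Z is the n x m indicator matrix mapping cells to individuals, so Z Z^T has rank at most m. Solving V x = y then costs O(n) instead of the O(n^3) of a dense solve. This is a sketch of the general low-rank idea under that assumption, not the thesis's algorithm; `woodbury_solve`, `dense_solve`, and the example values are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def woodbury_solve(
    d: Sequence[float], groups: Sequence[int], tau: float, y: Sequence[float]
) -> list[float]:
    """Solve (diag(d) + tau * Z Z^T) x = y in O(n) time.

    Assumption: Z[j, g] = 1 iff cell j comes from individual g (one random
    intercept per individual), so Z Z^T has rank at most m = #individuals.
    Woodbury: V^-1 = D^-1 - D^-1 Z (I/tau + Z^T D^-1 Z)^-1 Z^T D^-1, and
    Z^T D^-1 Z is diagonal here, so the m x m inverse is elementwise.
    """
    if tau <= 0 or any(dj <= 0 for dj in d):
        raise ValueError("tau and all entries of d must be positive")
    u = [yj / dj for yj, dj in zip(y, d)]  # D^-1 y
    s: dict[int, float] = defaultdict(float)  # diagonal of Z^T D^-1 Z
    r: dict[int, float] = defaultdict(float)  # Z^T D^-1 y
    for dj, uj, g in zip(d, u, groups):
        s[g] += 1.0 / dj
        r[g] += uj
    c = {g: r[g] / (1.0 / tau + s[g]) for g in s}  # (I/tau + Z^T D^-1 Z)^-1 Z^T D^-1 y
    return [uj - c[g] / dj for uj, dj, g in zip(u, d, groups)]


def dense_solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Reference O(n^3) Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for k in range(col, n + 1):
                m[i][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


# Usage: 5 cells from 2 individuals; compare against the dense solve.
d = [1.0, 2.0, 0.5, 1.5, 3.0]
groups = [0, 0, 1, 1, 1]
tau = 0.8
y = [1.0, -2.0, 0.5, 3.0, 1.0]
V = [[(d[i] if i == j else 0.0) + (tau if groups[i] == groups[j] else 0.0)
      for j in range(5)] for i in range(5)]
fast = woodbury_solve(d, groups, tau, y)
slow = dense_solve(V, y)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The key design point is that the only matrix that must be inverted is m x m, and with cells nested in individuals it is diagonal, so the cost grows linearly with the number of cells rather than cubically.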