Improvements and Developments in Gene Regulation and Single-Cell Gene Expression Data Analysis

dc.contributor.author: Lee, Christopher
dc.date.accessioned: 2020-05-08T14:37:04Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2020-05-08T14:37:04Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.identifier.uri: https://hdl.handle.net/2027.42/155218
dc.description.abstract: Recent advancements in high-throughput sequencing technologies have produced vast amounts of data on genome-wide regulation and single-cell transcriptomics, which aid in the study of tissue complexity and cell-to-cell heterogeneity at the molecular level. These data provide opportunities for new insights into intracellular genetic mechanisms and require the development of new methods. We first improve and extend methods for pathway analysis of large sets of genomic regions, such as those derived from genome-wide transcription factor (ChIP-seq) or open chromatin (e.g., ATAC-seq) experiments. Starting with ChIP-Enrich, a previously developed gene set enrichment method, we improve the runtime by using an approximation to reduce redundant calculations. We then create Poly-Enrich, an extension of ChIP-Enrich that allows for count and weighted-count outcomes per gene (number of genomic regions) as opposed to only binary outcomes. Comparing the results from ChIP-Enrich and Poly-Enrich, we discover patterns in gene regulation based on transcription factor and gene functionality, which can be used to predict which of the two methods is more powerful. Furthermore, we introduce a hybrid test that combines the two methods when the prediction of the more powerful method is ambiguous. Using Poly-Enrich, we evaluate several ways of defining enhancer locations and assigning them to target genes, and we find several accuracy improvements for distal regulation analyses. Second, to complement gene set enrichment tests that do not account for relative peak locations, we develop a new method, proxReg, that tests whether experimentally identified genomic regions associated with a specific biological function tend to be farther from or closer to regulatory regions, either gene transcription start sites or enhancer regions. Using proxReg alone, we find that transcription factors such as NRSF can bind closer to either type of regulatory region depending on the regulated biological processes. Complementing Poly-Enrich, proxReg provides additional insight into how a pathway is regulated, as well as additional evidence for its significance. For the third project, we extend an existing method, PQLseq, which tests differential expression in bulk RNA-seq data with a generalized linear mixed model estimated by maximizing a penalized quasi-likelihood, to the single-cell RNA-seq (scRNA-seq) setting. In particular, we extend PQLseq so that it can be used with hierarchical modeling structures in scRNA-seq data. Hierarchical data structures are very common in single-cell studies, yet many existing scRNA-seq differential expression methods do not take them into account. Differential expression methods that can model hierarchical data structures are often computationally slow because they must handle very large matrices. Here, we take advantage of several properties of the hierarchical structure of scRNA-seq data and develop a new algorithm that allows for faster differential expression analysis. Our method specifically exploits the natural low-rank structure in the data, reducing calculation time by several orders of magnitude. With extensive simulations, we show that our method provides well-controlled type I error and can provide higher power without prohibitive computation times.
dc.language.iso: en_US
dc.subject: Regulatory Genomics
dc.title: Improvements and Developments in Gene Regulation and Single-Cell Gene Expression Data Analysis
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Biostatistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Sartor, Maureen
dc.contributor.committeemember: Zhou, Xiang
dc.contributor.committeemember: Boyle, Alan P
dc.contributor.committeemember: Kang, Hyun Min
dc.contributor.committeemember: Scott, Laura Jean
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/155218/1/leetaiyi_1.pdf
dc.identifier.orcid: 0000-0002-8621-256X
dc.identifier.name-orcid: Lee, Christopher; 0000-0002-8621-256X (en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

