Decoding Regulatory Variants With Computational Methods in Non-coding Regions of the Human Genome

Zhao, Nanxiang


dc.contributor.author	Zhao, Nanxiang
dc.date.accessioned	2023-09-22T15:35:21Z
dc.date.available	2023-09-22T15:35:21Z
dc.date.issued	2023
dc.date.submitted	2023
dc.identifier.uri	https://hdl.handle.net/2027.42/177989
dc.description.abstract	Understanding the functional consequences of regulatory variants is a significant challenge in genomics. Genome-Wide Association Studies (GWAS) have provided valuable insights into human phenotypes by identifying genetic variants associated with diseases and complex traits. However, the functional implications of many of these variants remain unknown, particularly in the non-coding regions of the human genome, which harbor over 90% of all variants. To address this challenge, my dissertation focuses on functionally characterizing regulatory elements and their variants in the human genome. Specifically, I define regulatory variants as single nucleotide polymorphisms (SNPs) that can modify the binding affinities of transcription factors (TFs) within regulatory elements. Such alterations can affect downstream gene expression and potentially contribute to disease progression and trait development. Characterizing regulatory variants has traditionally relied on laborious experimental dissection of the human genome, often confined to specific cell types or tissues, which makes it infeasible to examine all relevant variants in their appropriate biological context. The advent of high-throughput sequencing and computational methods has substantially accelerated the discovery process. In my dissertation, I developed a series of computational tools and methods to characterize regulatory elements and their variants end to end (Fig 6.1). In Chapter II, I developed F-Seq2, a peak-calling tool that accurately defines regulatory element regions from open chromatin and ChIP-seq assays. F-Seq2 uses kernel density estimation and a dynamic "continuous" Poisson test to account for local biases, and it outperformed state-of-the-art software, including MACS2, in both precision and recall.
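The peak-calling idea summarized above, a smoothed read signal tested against a locally estimated Poisson background, can be sketched in standard-library Python. This is not F-Seq2's implementation: the Gaussian kernel, the bandwidth, the 200 bp scoring window, the flanking window sizes (5 kb and 10 kb), the max-of-rates background rule, and the use of the regularized lower incomplete gamma function to make the Poisson tail "continuous" for non-integer scores are all assumptions chosen for illustration.

```python
import math


def kde_count(cut_sites, pos, bandwidth, window):
    """Continuous 'read count' at pos: Gaussian KDE density times window width."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = sum(norm * math.exp(-0.5 * ((pos - c) / bandwidth) ** 2)
                  for c in cut_sites)
    return density * window


def local_lambda(cut_sites, pos, window, genome_rate, flanks=(5000, 10000)):
    """Dynamic background (assumed rule): the largest of the genome-wide read
    rate and the rates in flanking windows centred on pos, scaled to `window`."""
    rates = [genome_rate]
    for size in flanks:
        half = size / 2.0
        n = sum(1 for c in cut_sites if pos - half <= c < pos + half)
        rates.append(n / size)
    return max(rates) * window


def poisson_upper_tail(x, lam, tol=1e-12, max_iter=10_000):
    """Continuous extension of P(X >= x) for X ~ Poisson(lam).

    Computes the regularized lower incomplete gamma P(x, lam) by its power
    series; for positive integer x this equals the Poisson upper tail.
    """
    if x <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term, total = 1.0, 1.0
    for n in range(1, max_iter):
        term *= lam / (x + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = x * math.log(lam) - lam - math.lgamma(x + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def call_peaks(cut_sites, positions, bandwidth=50.0, window=200,
               genome_rate=0.002, alpha=1e-5):
    """Report positions whose KDE score is significant against local lambda."""
    # Hypothetical defaults chosen for the toy example, not F-Seq2's settings.
    peaks = []
    for pos in positions:
        score = kde_count(cut_sites, pos, bandwidth, window)
        lam = local_lambda(cut_sites, pos, window, genome_rate)
        p = poisson_upper_tail(score, lam)
        if p < alpha:
            peaks.append((pos, score, lam, p))
    return peaks


# Toy data: sparse background every 500 bp plus a dense cluster near 10 kb.
background = list(range(0, 20_000, 500))
cluster = [10_000 + 2 * (i - 30) for i in range(60)]
cut_sites = sorted(background + cluster)
peaks = call_peaks(cut_sites, positions=[2_000, 10_000])
print([pos for pos, *_ in peaks])  # expected: [10000]
```

Taking the maximum over several background estimates makes the test conservative where signal is locally elevated, which is one way to account for local biases. On the toy data, the dense cluster at 10 kb is called and the isolated background read at 2 kb is not.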
Accurate peak calling is essential for downstream analyses, such as differential binding and motif analysis, and lays the foundation for the functional characterization of regulatory variants. In Chapter III, I upgraded RegulomeDB, a leading database of regulatory variants, to its second version. RegulomeDB allows users to query variants and obtain a comprehensive list of functional evidence for their variants of interest. The new version contains over five times more data than the previous version, providing an even more comprehensive resource for researchers. Additionally, a new suite of scoring models, SURF and TURF, accurately summarizes the likelihood that a variant functions as a regulatory variant based on all available evidence. In Chapter IV, I developed TLand, a machine learning model intended as the next version of the RegulomeDB scoring model, to annotate and prioritize regulatory variants in an organ-specific manner. TLand uses RegulomeDB-derived features and a flexible architecture based on stacked generalization, which reduces overfitting and facilitates future continuous learning. TLand outperformed state-of-the-art models when cell lines or organs were held out of the allele-specific binding data. By accounting for data availability issues that commonly affect sequence-based deep learning models, TLand accurately prioritized the relevant organs for approximately 2 million GWAS SNPs. In Chapter V, I introduced Explain-seq, a pipeline that automatically trains and interprets sequence-based deep learning models given genomic coordinates. I demonstrated its utility by applying it to a recent STARR-seq dataset to gain insights into cell-type-specific enhancer binding patterns. In the K562 cell line, the pipeline identified both known motifs, by comparison against the JASPAR database, and de novo motifs.
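The stacked-generalization idea behind TLand (Chapter IV) can be illustrated with a small standard-library sketch. Everything model-specific here is hypothetical: the abstract does not describe TLand's base learners, meta-learner, feature groups, or fold count, so plain logistic regressions on toy feature groups stand in for them. What the sketch does show is the defining step of stacking: the meta-learner is trained only on out-of-fold predictions from the base models, so it cannot simply reward a base model that has memorized its own training data.

```python
import math
import random


def predict_logistic(w, x):
    """Sigmoid of bias + dot(weights, x); z is clipped to avoid overflow."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=500, lr=0.5):
    """Batch gradient-descent logistic regression; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_logistic(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def columns(X, cols):
    return [[row[c] for c in cols] for row in X]


def stack_fit(X, y, groups, k=5, seed=0):
    """Level 0: one logistic model per (hypothetical) feature group.
    Level 1: a logistic meta-learner trained on out-of-fold predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    oof = [[0.0] * len(groups) for _ in X]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        for g, cols in enumerate(groups):
            w = fit_logistic(columns([X[i] for i in train], cols),
                             [y[i] for i in train])
            for i in fold:  # predict only on samples this model never saw
                oof[i][g] = predict_logistic(w, [X[i][c] for c in cols])
    meta = fit_logistic(oof, y)
    # Refit each base model on all data for use at prediction time.
    bases = [fit_logistic(columns(X, cols), y) for cols in groups]
    return bases, meta


def stack_predict(model, x, groups):
    bases, meta = model
    level1 = [predict_logistic(w, [x[c] for c in cols])
              for w, cols in zip(bases, groups)]
    return predict_logistic(meta, level1)


# Toy data: two informative features and one pure-noise feature.
rng = random.Random(1)
X, y = [], []
for i in range(80):
    label = i % 2
    mu = 2.0 if label else -2.0
    X.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)
groups = [[0], [1], [2]]  # hypothetical feature groups
model = stack_fit(X, y, groups)
print(round(stack_predict(model, [3.0, 3.0, 0.0], groups), 3))
```

Because each base model sees only its own feature group, a new group of evidence could in principle be added as another level-0 model and only the meta-learner retrained; reading this as the mechanism behind the abstract's "continuous learning" claim is an assumption.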
Overall, the computational methods and tools that I developed throughout my dissertation can aid in the discovery and characterization of regulatory elements and variants in the non-coding regions of the human genome.
dc.language.iso	en_US
dc.subject	Bioinformatics
dc.subject	Genomics
dc.subject	Machine learning
dc.subject	Software development
dc.title	Decoding Regulatory Variants With Computational Methods in Non-coding Regions of the Human Genome
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Bioinformatics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Boyle, Alan P
dc.contributor.committeemember	Derksen, Harm
dc.contributor.committeemember	Kitzman, Jacob
dc.contributor.committeemember	Najarian, Kayvan
dc.contributor.committeemember	Welch, Joshua
dc.subject.hlbsecondlevel	Genetics
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/177989/1/samzhao_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/8446
dc.identifier.orcid	0000-0003-3124-0958
dc.identifier.name-orcid	Zhao, Nanxiang; 0000-0003-3124-0958	en_US
dc.working.doi	10.7302/8446	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: samzhao_1.pdf
Size: 43.62 MB
Format: PDF
