Deciphering the Knowledge of Human Genome with Graphs

dc.contributor.author: Feng, Fan
dc.date.accessioned: 2024-05-22T17:29:13Z
dc.date.available: 2024-05-22T17:29:13Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/193456
dc.description.abstract: Transcriptional regulation in human cells is a complex process that requires the coordinated action of diverse genomic elements and molecules. To understand its mechanisms, projects including the Encyclopedia of DNA Elements (ENCODE), Roadmap Epigenomics, and 4D Nucleome (4DN) have generated thousands of genomic and epigenomic datasets. These datasets annotate functional elements of the human genome (e.g., enhancers and promoters), summarize experimental results for epigenomic features (e.g., protein binding locations), and link different modalities through statistical analyses (e.g., GWAS and eQTL mapping). The available data make it apparent that the human genome should not be oversimplified as a one-dimensional (1D) linear sequence: long-range dependencies along the DNA sequence play a vital role in human transcriptional regulation. For example, enhancers, the primary regulatory elements controlling gene expression, can reside hundreds of kilobases away from their target genes and activate them by physically interacting with them across these vast genomic distances. Interpreting the human genome therefore requires a data structure capable of capturing long-range and complex relationships. This dissertation discusses how to decipher the human genome as a graph. Graphs, composed of nodes (or vertices) and edges, provide a powerful framework for modeling relationships and have proven effective in diverse real-world settings, such as social networks, transportation systems, and communication networks. The subsequent chapters introduce and explore graph representations of the human genome. Chapter 2 introduces the application of chromosome conformation capture (3C) technology, which reveals physical interactions among genomic regions.
Analyzing the large-scale contact maps generated by 3C-based technologies (e.g., Hi-C) is instrumental in uncovering long-range dependencies among genomic entities and in understanding transcriptional regulation. To this end, we developed computational tools, including scHiCTools and Quagga, that extract structural features from these maps. Chapter 3 addresses the need for high-resolution, high-quality chromatin contact maps. We developed a computational model, CAESAR, that connects epigenomic features with high-resolution chromatin structure. CAESAR imputes an unprecedented number of high-resolution human chromatin contact maps, allowing users to easily navigate these fine-scale chromatin structures and their corresponding regulatory mechanisms. Beyond 3D interactions, numerous data consortia and databases characterize genomic entities and their relationships. Despite the invaluable insights these consortia provide, their data are stored separately as tables organized along the 1D sequence, which hinders integrative genomic research and scientific discovery. To address this challenge, Chapter 4 introduces the Genomic Knowledgebase (GenomicKB), a knowledge graph that integrates datasets and annotations related to the human genome. We anticipate that a graph-based interpretation of the human genome will make genomic research increasingly data-driven. GenomicKB aims to provide high-quality, integrated data for large-scale machine learning methods, thereby facilitating scientific discoveries.
dc.language.iso: en_US
dc.subject: deep learning
dc.subject: knowledge graph
dc.subject: human genomics
dc.subject: 3D genomics
dc.title: Deciphering the Knowledge of Human Genome with Graphs
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Bioinformatics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Liu, Jie
dc.contributor.committeemember: Zhang, Jianzhi
dc.contributor.committeemember: Parker, Stephen CJ
dc.contributor.committeemember: Welch, Joshua
dc.contributor.committeemember: Zhang, Xiaotian
dc.subject.hlbsecondlevel: Genetics
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/193456/1/fanfeng_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/23101
dc.identifier.orcid: 0000-0002-5990-312X
dc.identifier.name-orcid: Feng, Fan; 0000-0002-5990-312X [en_US]
dc.working.doi: 10.7302/23101 [en]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

