
Computational Methods for Single-Cell and Spatial Multimodal Data Integration

dc.contributor.author: Gao, Chao
dc.date.accessioned: 2024-05-22T17:29:33Z
dc.date.available: 2024-05-22T17:29:33Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/193464
dc.description.abstract: Advancements in sequencing technologies have revolutionized our ability to measure biomolecules. Single-cell single-omics sequencing allows the genome, transcriptome, and epigenome to be examined at unprecedented resolution, providing a detailed view of cellular diversity and function. It also addresses a limitation of bulk RNA sequencing, which profiles only gene expression averaged across cells and thus masks cellular heterogeneity. Building on this, single-cell multimodal omics enables simultaneous analysis of multiple types of molecular measurements in the same cell; such paired information has revealed genetic and epigenetic landscapes as well as the relationships between them. Spatial sequencing technologies further provide molecular measurements localized within tissues, adding an essential dimension to our understanding of biological complexity. They have advanced our understanding of how cells interact within their spatial context, which is crucial for comprehending tissue organization, development, and disease pathology. In this dissertation, I propose three computational methods that address the challenges posed by each of these data types, identify heterogeneity within cell populations and tissue regions, and advance our knowledge of biological systems.

Integrating diverse single-cell unimodal datasets offers tremendous opportunities for an unbiased, comprehensive, and quantitative definition of cell identities. Published single-cell data integration approaches are either not designed to integrate multiple modalities or not scalable to massive datasets, and none of them can incorporate new data without recalculating from scratch. To this end, I develop an online learning algorithm for integrative nonnegative matrix factorization (Online iNMF). For cell type inference, I apply Online iNMF to integrate large-scale, continually arriving single-cell datasets of diverse molecular modalities, including gene expression, chromatin accessibility, and DNA methylation. Online iNMF converges rapidly and decouples peak memory usage from the size of the entire dataset, and its improved computational efficiency does not come at the cost of dataset alignment or cluster preservation performance. Its ability to iteratively incorporate data makes it well suited to building single-cell multi-omic atlases.

Single-cell multimodal epigenomic profiling simultaneously measures multiple histone modifications and chromatin accessibility in the same cells. Such parallel measurements provide opportunities to investigate how epigenomic modalities vary together across cell populations. I propose ConvNet-VAE, a variational autoencoder built from one-dimensional convolutional layers, for dimensionality reduction. After window-based genome binning, ConvNet-VAE leverages the multi-track and sequential nature of these data. I apply ConvNet-VAE to integrate histone modification marks and chromatin accessibility profiled from juvenile mouse brain and human bone marrow. Compared to multimodal VAEs with only fully connected layers, ConvNet-VAE achieves better dimensionality reduction and batch correction while using significantly fewer parameters. Its advantage increases with the number of modalities, making it a promising tool as the number of jointly profiled epigenomic modalities grows.
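
For background on the first method: Online iNMF solves the integrative nonnegative matrix factorization problem, whose objective is commonly written as below in the iNMF/LIGER literature (E_i is the i-th input dataset, H_i its cell factor loadings, W the shared metagene factors, V_i the dataset-specific factors, and λ a tuning parameter). This is context for the abstract, not a restatement of the dissertation's own formulation.

```latex
\min_{H_i,\, V_i,\, W \,\ge\, 0} \;\; \sum_{i=1}^{d} \big\| E_i - H_i\,(W + V_i) \big\|_F^2
  \;+\; \lambda \sum_{i=1}^{d} \big\| H_i V_i \big\|_F^2
```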
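
To make the ConvNet-VAE idea concrete, here is a minimal PyTorch-style sketch of an encoder that applies one-dimensional convolutions across binned, multi-track epigenomic signal. The class name, layer counts, kernel sizes, and dimensions are illustrative assumptions, not the dissertation's implementation.

```python
# Sketch only: a 1D-convolutional VAE encoder over binned multi-track epigenomic
# signals. All sizes and names below are assumptions for illustration.
import torch
import torch.nn as nn

class ConvVAEEncoder(nn.Module):
    def __init__(self, n_tracks: int, n_bins: int, latent_dim: int = 20):
        super().__init__()
        # Each jointly profiled modality is treated as a channel over genomic bins.
        self.conv = nn.Sequential(
            nn.Conv1d(n_tracks, 32, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
        )
        # Infer the flattened feature size from a dummy pass.
        with torch.no_grad():
            flat = self.conv(torch.zeros(1, n_tracks, n_bins)).numel()
        self.mu = nn.Linear(flat, latent_dim)
        self.logvar = nn.Linear(flat, latent_dim)

    def forward(self, x):
        # x: (cells, tracks, bins) -> parameters of the amortized posterior.
        h = self.conv(x).flatten(start_dim=1)
        return self.mu(h), self.logvar(h)

# Example: 3 jointly profiled modalities binned into 10,000 genomic windows.
enc = ConvVAEEncoder(n_tracks=3, n_bins=10_000)
mu, logvar = enc(torch.randn(8, 3, 10_000))
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
```

Treating each modality as an input channel lets the convolutional filters share weights along the genome, which is the general intuition behind the parameter savings over fully connected multimodal VAEs noted in the abstract.
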
Multimodal spatial profiling allows the simultaneous investigation of transcriptomics, proteomics, and epigenomics at the level of individual cells, beads, or spots within tissue. I devise spaMVGAE, a multimodal variational autoencoder built on graph convolutional networks. By incorporating spatial location information, spaMVGAE adapts to various modalities and learns a joint low-dimensional embedding of cells, beads, or spots for domain detection. I apply spaMVGAE to spatially resolved multimodal datasets from different biological contexts, including breast cancer, mouse bone development, and the adult mouse brain. spaMVGAE accurately detects regions of interest by capturing the heterogeneous and complex molecular makeup of cells and tissue microenvironments, scales to large datasets, and performs joint integration across multiple tissue sections.
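
To illustrate the kind of architecture described for spaMVGAE, below is a minimal PyTorch sketch of a graph-convolutional multimodal encoder: each modality is aggregated over a spatial neighbor graph and the results are fused into one joint embedding. The fusion-by-concatenation choice, layer sizes, and names are assumptions for illustration, not the spaMVGAE implementation.

```python
# Sketch only: graph-convolutional encoding of two modalities over a spatial
# neighbor graph, fused into a joint low-dimensional embedding.
import torch
import torch.nn as nn

def normalized_adjacency(adj: torch.Tensor) -> torch.Tensor:
    # Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} of the spatial graph.
    a = adj + torch.eye(adj.shape[0])
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class GraphConv(nn.Module):
    # One graph-convolution layer: aggregate over spatial neighbors, then transform.
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_norm):
        return torch.relu(self.lin(a_norm @ x))

class SpatialMultimodalEncoder(nn.Module):
    # Encode each modality with a graph convolution, then fuse by concatenation.
    def __init__(self, modality_dims, hidden: int = 64, latent: int = 20):
        super().__init__()
        self.mod_convs = nn.ModuleList(GraphConv(d, hidden) for d in modality_dims)
        self.mu = nn.Linear(hidden * len(modality_dims), latent)
        self.logvar = nn.Linear(hidden * len(modality_dims), latent)

    def forward(self, xs, a_norm):
        h = torch.cat([gc(x, a_norm) for gc, x in zip(self.mod_convs, xs)], dim=1)
        return self.mu(h), self.logvar(h)

# Example: 500 spots with RNA (2,000 genes) and protein (50 markers), plus a
# random stand-in for a k-nearest-neighbor spatial graph.
rna, protein = torch.randn(500, 2000), torch.randn(500, 50)
adj = (torch.rand(500, 500) < 0.01).float()
adj = ((adj + adj.T) > 0).float()                         # symmetrize the graph
enc = SpatialMultimodalEncoder([2000, 50])
mu, logvar = enc([rna, protein], normalized_adjacency(adj))
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # joint embedding sample
```
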
dc.language.iso: en_US
dc.subject: Single-Cell and Spatial Multimodal Data Integration
dc.subject: Machine Learning
dc.title: Computational Methods for Single-Cell and Spatial Multimodal Data Integration
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Bioinformatics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Welch, Joshua
dc.contributor.committeemember: Zhou, Xiang
dc.contributor.committeemember: Li, Jun
dc.contributor.committeemember: Liu, Jie
dc.contributor.committeemember: Najarian, Kayvan
dc.subject.hlbsecondlevel: Science (General)
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/193464/1/gchao_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/23109
dc.identifier.orcid: 0000-0002-9316-2185
dc.identifier.name-orcid: Gao, Chao; 0000-0002-9316-2185
dc.working.doi: 10.7302/23109
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

