
Computational Methods for Single-Cell and Spatial Multimodal Data Integration

dc.contributor.author: Gao, Chao
dc.date.accessioned: 2024-05-22T17:29:33Z
dc.date.available: 2024-05-22T17:29:33Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/193464
dc.description.abstract: Advancements in sequencing technologies have revolutionized our ability to measure biomolecules. Single-cell single-omics sequencing allows the genome, transcriptome, and epigenome to be examined at unprecedented resolution, providing a detailed view of cellular diversity and function. It also addresses a limitation of bulk RNA sequencing, which profiles only gene expression averaged across cells and thus masks cellular heterogeneity. Building on this, single-cell multimodal omics enables simultaneous analysis of multiple types of molecular measurements in the same cell; such paired information has revealed genetic and epigenetic landscapes as well as the relationships between them. Spatial sequencing technologies further provide molecular measurements localized within tissues, adding an essential dimension to our understanding of biological complexity. They have advanced our understanding of how cells interact within their spatial context, which is crucial for comprehending tissue organization, development, and disease pathology. In this dissertation, I propose three computational methods that address the challenges posed by each of these data types, identify heterogeneity within cell populations and tissue regions, and advance our knowledge of biological systems.

Integrating diverse single-cell unimodal datasets offers tremendous opportunities for an unbiased, comprehensive, and quantitative definition of cell identities. Published single-cell data integration approaches are either not designed to integrate multiple modalities or not scalable to massive datasets, and none of them can incorporate new data without recalculating from scratch. To this end, I develop an online learning algorithm for integrative nonnegative matrix factorization (Online iNMF). For cell type inference, I apply Online iNMF to integrate large-scale, continually arriving single-cell datasets of diverse molecular modalities, including gene expression, chromatin accessibility, and DNA methylation. Online iNMF converges rapidly and decouples peak memory usage from the size of the entire dataset, and its improved computational efficiency does not come at the cost of dataset alignment or cluster preservation performance. Its ability to iteratively incorporate data makes it well suited to building single-cell multi-omic atlases.

Single-cell multimodal epigenomic profiling simultaneously measures multiple histone modifications and chromatin accessibility in the same cells. Such parallel measurements provide opportunities to investigate how epigenomic modalities vary together across cell populations. I propose ConvNet-VAE, a variational autoencoder built from one-dimensional convolutional layers, for dimensionality reduction. After window-based genome binning, ConvNet-VAE leverages the multi-track and sequential nature of these data. I apply ConvNet-VAE to integrate histone modification marks and chromatin accessibility profiled from juvenile mouse brain and human bone marrow. Compared to multimodal VAEs with only fully connected layers, ConvNet-VAE achieves better dimensionality reduction and batch correction while using significantly fewer parameters. Its advantage increases with the number of modalities, making it a promising tool as the number of jointly profiled epigenomic modalities grows.
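
For background on the first method: Online iNMF solves the integrative nonnegative matrix factorization problem, whose objective is commonly written as below in the iNMF/LIGER literature (E_i is the i-th input dataset, H_i its cell factor loadings, W the shared metagene factors, V_i the dataset-specific factors, and λ a tuning parameter). This is context for the abstract, not a restatement of the dissertation's own formulation.

```latex
\min_{H_i,\, V_i,\, W \,\ge\, 0} \;\; \sum_{i=1}^{d} \big\| E_i - H_i\,(W + V_i) \big\|_F^2
  \;+\; \lambda \sum_{i=1}^{d} \big\| H_i V_i \big\|_F^2
```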
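
To make the ConvNet-VAE idea concrete, here is a minimal PyTorch-style sketch of an encoder that applies one-dimensional convolutions across binned, multi-track epigenomic signal. The class name, layer counts, kernel sizes, and dimensions are illustrative assumptions, not the dissertation's implementation.

```python
# Sketch only: a 1D-convolutional VAE encoder over binned multi-track epigenomic
# signals. All sizes and names below are assumptions for illustration.
import torch
import torch.nn as nn

class ConvVAEEncoder(nn.Module):
    def __init__(self, n_tracks: int, n_bins: int, latent_dim: int = 20):
        super().__init__()
        # Each jointly profiled modality is treated as a channel over genomic bins.
        self.conv = nn.Sequential(
            nn.Conv1d(n_tracks, 32, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
        )
        # Infer the flattened feature size from a dummy pass.
        with torch.no_grad():
            flat = self.conv(torch.zeros(1, n_tracks, n_bins)).numel()
        self.mu = nn.Linear(flat, latent_dim)
        self.logvar = nn.Linear(flat, latent_dim)

    def forward(self, x):
        # x: (cells, tracks, bins) -> parameters of the amortized posterior.
        h = self.conv(x).flatten(start_dim=1)
        return self.mu(h), self.logvar(h)

# Example: 3 jointly profiled modalities binned into 10,000 genomic windows.
enc = ConvVAEEncoder(n_tracks=3, n_bins=10_000)
mu, logvar = enc(torch.randn(8, 3, 10_000))
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
```

Treating each modality as an input channel lets the convolutional filters share weights along the genome, which is the general intuition behind the parameter savings over fully connected multimodal VAEs noted in the abstract.
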
Multimodal spatial profiling allows the simultaneous investigation of transcriptomics, proteomics, and epigenomics at the level of individual cells, beads, or spots within tissue. I devise spaMVGAE, a multimodal variational autoencoder built on graph convolutional networks. By incorporating spatial location information, spaMVGAE adapts to various modalities and learns a joint low-dimensional embedding of cells, beads, or spots for domain detection. I apply spaMVGAE to spatially resolved multimodal datasets from different biological contexts, including breast cancer, mouse bone development, and the adult mouse brain. spaMVGAE accurately detects regions of interest by capturing the heterogeneous and complex molecular makeup of cells and tissue microenvironments, scales to large datasets, and performs joint integration across multiple tissue sections.
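
To illustrate the kind of architecture described for spaMVGAE, below is a minimal PyTorch sketch of a graph-convolutional multimodal encoder: each modality is aggregated over a spatial neighbor graph and the results are fused into one joint embedding. The fusion-by-concatenation choice, layer sizes, and names are assumptions for illustration, not the spaMVGAE implementation.

```python
# Sketch only: graph-convolutional encoding of two modalities over a spatial
# neighbor graph, fused into a joint low-dimensional embedding.
import torch
import torch.nn as nn

def normalized_adjacency(adj: torch.Tensor) -> torch.Tensor:
    # Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} of the spatial graph.
    a = adj + torch.eye(adj.shape[0])
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class GraphConv(nn.Module):
    # One graph-convolution layer: aggregate over spatial neighbors, then transform.
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_norm):
        return torch.relu(self.lin(a_norm @ x))

class SpatialMultimodalEncoder(nn.Module):
    # Encode each modality with a graph convolution, then fuse by concatenation.
    def __init__(self, modality_dims, hidden: int = 64, latent: int = 20):
        super().__init__()
        self.mod_convs = nn.ModuleList(GraphConv(d, hidden) for d in modality_dims)
        self.mu = nn.Linear(hidden * len(modality_dims), latent)
        self.logvar = nn.Linear(hidden * len(modality_dims), latent)

    def forward(self, xs, a_norm):
        h = torch.cat([gc(x, a_norm) for gc, x in zip(self.mod_convs, xs)], dim=1)
        return self.mu(h), self.logvar(h)

# Example: 500 spots with RNA (2,000 genes) and protein (50 markers), plus a
# random stand-in for a k-nearest-neighbor spatial graph.
rna, protein = torch.randn(500, 2000), torch.randn(500, 50)
adj = (torch.rand(500, 500) < 0.01).float()
adj = ((adj + adj.T) > 0).float()                         # symmetrize the graph
enc = SpatialMultimodalEncoder([2000, 50])
mu, logvar = enc([rna, protein], normalized_adjacency(adj))
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # joint embedding sample
```
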
dc.language.iso: en_US
dc.subject: Single-Cell and Spatial Multimodal Data Integration
dc.subject: Machine Learning
dc.title: Computational Methods for Single-Cell and Spatial Multimodal Data Integration
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Bioinformatics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Welch, Joshua
dc.contributor.committeemember: Zhou, Xiang
dc.contributor.committeemember: Li, Jun
dc.contributor.committeemember: Liu, Jie
dc.contributor.committeemember: Najarian, Kayvan
dc.subject.hlbsecondlevel: Science (General)
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/193464/1/gchao_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/23109
dc.identifier.orcid: 0000-0002-9316-2185
dc.identifier.name-orcid: Gao, Chao; 0000-0002-9316-2185
dc.working.doi: 10.7302/23109
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

