Deep Generative Models for Single-Cell Perturbation Experiments
Yu, Hengshi
2022
Abstract
Recent developments in deep learning have enabled the generation of novel, realistic images and sentences from low-dimensional representations. In parallel, a revolution in biotechnology has enabled high-throughput measurement of gene expression in thousands to millions of single cells. Several deep generative models have been developed to learn latent representations of cells and generate realistic high-dimensional single-cell data. However, these models primarily generate data similar to that seen during training and have limited ability to predict gene expression in unseen cell states. This limitation constrains their applicability to single-cell data, which usually cover a relatively small set of observed conditions. This dissertation therefore aims to develop flexible and accurate deep generative models for single-cell data that learn representations characterizing how cells respond to various perturbation conditions and that predict unobserved cell states.
In Chapter II, we study two main classes of deep generative models for single-cell RNA-seq data: variational autoencoders (VAEs) and generative adversarial networks (GANs). We systematically assess their disentanglement and generation performance and show that VAEs excel at learning cellular representations, while GANs excel at generating realistic single-cell gene expression data. We also develop MichiGAN, a novel neural network architecture that combines the strengths of VAEs and GANs to sample from disentangled representations without sacrificing data generation quality. We learn disentangled representations of three large single-cell RNA-seq datasets and use MichiGAN to sample from them, manipulating semantically distinct aspects of cellular identity and predicting single-cell gene expression responses to drug treatments.
In Chapter III, we develop PerturbNet, a novel deep generative model that generates single-cell data under unseen drug treatments. Existing approaches attempt to learn drug effects independently of cell state and cannot predict results for unseen drug treatments. To address these limitations, PerturbNet learns a mapping from a continuous representation of drug treatment to cellular states, which allows it to generate single-cell data for both observed and unseen drug treatments. We show that PerturbNet accurately predicts single-cell RNA-seq data resulting from unseen drug treatments. We also fine-tune PerturbNet using cellular properties to improve the continuous representations of drug treatments.
In Chapter IV, we extend PerturbNet to learn single-cell responses to genetic perturbations, including pooled CRISPR genetic inactivations and genetic mutations. Existing approaches attempt to learn genetic perturbation effects independently of cell state and rely on one-hot encodings of genetic perturbations. Although such encodings allow different combinations of observed target genes to be learned, they cannot generalize to unseen target genes. We develop a GenotypeVAE model and also employ a state-of-the-art protein sequence embedding model to encode genetic perturbations into continuous representations, enabling prediction for both unseen genes and unseen gene combinations.
In Chapter V, we extend PerturbNet to design optimal perturbations and to attribute perturbation outcomes to specific perturbation features. We consider the translation of a group of cells to a target cell state and propose two algorithms for designing perturbations that achieve this target state. We show that the algorithms are effective at designing perturbations that achieve the cell state translation of interest. We also employ model interpretability methods to attribute the effects of chemical or genetic perturbations to specific atoms or gene functional annotations.
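The contrast drawn in the Chapter IV summary, between one-hot encodings of genetic perturbations and continuous representations of them, can be illustrated with a toy Python sketch. This is not PerturbNet or GenotypeVAE: the gene names, the embedding values (standing in for vectors a protein sequence embedding model might produce), the choice to encode a gene combination by summing embeddings, and the linear response map `W` are all hypothetical assumptions. The sketch only shows why a one-hot vector has no slot for an unseen target gene, while a continuous embedding still gives the model an input it can map to a predicted response.

```python
"""Toy contrast between one-hot and continuous perturbation encodings.

All gene names, embedding values, and the linear response map are
hypothetical illustrations, not the dissertation's actual models or data.
"""

from typing import Dict, List, Sequence

# Target genes observed during training (hypothetical).
OBSERVED_GENES: List[str] = ["GENE_A", "GENE_B", "GENE_C"]

# Hypothetical continuous embeddings standing in for the output of a
# pretrained protein sequence model. Such a model can embed any gene with a
# known sequence, including GENE_D, which was never observed in training.
EMBEDDINGS: Dict[str, List[float]] = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [0.5, 0.5],
    "GENE_D": [0.9, 0.1],  # unseen target gene
}


def one_hot(genes: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Multi-hot vector over the observed vocabulary; fails for unseen genes."""
    vec = [0] * len(vocab)
    for g in genes:
        if g not in vocab:
            raise KeyError(f"{g} has no one-hot slot: it was never observed")
        vec[vocab.index(g)] = 1
    return vec


def embed(genes: Sequence[str]) -> List[float]:
    """Continuous encoding of a gene combination (assumed: sum of embeddings)."""
    dim = len(next(iter(EMBEDDINGS.values())))
    out = [0.0] * dim
    for g in genes:
        out = [a + b for a, b in zip(out, EMBEDDINGS[g])]
    return out


def predict_shift(z: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Toy linear map from a perturbation embedding to per-gene expression shifts."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]


# Hypothetical weights, as if fit on observed perturbations only
# (3 readout genes x 2 embedding dimensions).
W = [[2.0, 0.0], [0.0, -1.0], [1.0, 1.0]]

if __name__ == "__main__":
    # An observed combination works with either encoding.
    print(one_hot(["GENE_A", "GENE_C"], OBSERVED_GENES))
    # An unseen gene still receives a prediction through its embedding.
    print(predict_shift(embed(["GENE_D"]), W))
    # The one-hot encoder has no way to represent the unseen gene.
    try:
        one_hot(["GENE_D"], OBSERVED_GENES)
    except KeyError as err:
        print("one-hot failed:", err)
```

The design point is that a continuous encoder decouples the model's input from the training vocabulary: any gene the embedding model can encode becomes a valid query. Whether the resulting prediction is accurate for an unseen gene still depends on how well the embedding space reflects shared biology, which a toy example like this cannot show.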
Subjects
Cellular identity; Deep generative models; Single-cell data; Drug perturbation; CRISPR gene editing; Optimal perturbation design
Types
Thesis