Bayesian Learning of Structured Covariances, with Applications to Cancer Data
Yao, Tsung-Hung
2023
Abstract
The identification of scientifically driven dependence structures is of interest across many biomedical domains. Examples include tree- and graph-based structures that arise in precision medicine and genomics. Such dependence structures can be compactly represented as covariance or precision matrices, which are useful for characterizing and interpreting complex relationships. This dissertation develops a family of Bayesian models for structured covariances to investigate biological dependencies, motivated by two applications in cancer research. These models adapt to different biological dependencies, such as the tree structure for assessing treatment similarity in pre-clinical cancer models and robust network structures for proteogenomics data incorporating tumor heterogeneity. In Chapter 2, I propose a novel Bayesian probabilistic tree-based framework for patient-derived xenograft (PDX) data to investigate the hierarchical relationships between treatments by inferring treatment trees. This framework motivates a new metric of mechanistic similarity between two or more treatments that accounts for the inherent uncertainty in tree estimation. Building upon Dirichlet Diffusion Trees, I derive a closed-form marginal likelihood encoding the tree structure, which facilitates computationally efficient posterior inference via a new two-stage algorithm. Simulation studies demonstrate superior performance of the proposed method in recovering the tree structure and treatment similarities. My analyses of a recently collated PDX dataset produce treatment similarity estimates that are concordant with known biological mechanisms across treatments and uncover potential combination therapies for future clinical investigations. Chapter 3 extends the work on tree structures and the corresponding ultrametric matrices from Chapter 2. Ultrametric matrices are an important class of matrices in statistics with numerous applications.
Although projection- and relaxation-based estimation methods exist, no inferential techniques provide appropriate uncertainty quantification. The primary challenge lies in the non-trivial geometry of the space of ultrametric matrices, which is induced by tree-structured spaces. I propose a novel consistent Markovian fragmentation prior over ultrametric matrices based on the Nabben-Varga decomposition. The decomposition admits a one-to-one mapping of ultrametric matrices to rooted trees, enabling inference in the surrogate space of rooted trees. The metricized tree space naturally motivates quick local moves along geodesics between neighboring tree topologies and admits existing posterior summaries in the tree space. Simulation studies show that the proposed algorithm accurately recovers the matrix and the tree along with uncertainty quantification. I demonstrate the utility of the proposed method on the pre-clinical dataset by constructing the treatment tree and the mechanistic similarity for multiple cancer treatments. In Chapter 4, I focus on graphical models and investigate complex dependency structures in high-throughput datasets. Most existing graphical models make one of two canonical assumptions: (i) a common network for all subjects or (ii) normality, as in Gaussian graphical models. Both assumptions fail in certain applications, such as proteomic networks in cancer. I propose robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR provides a flexible framework for estimating graphs: it accommodates non-normality through random marginal transformations and constructs covariate-dependent graphs through graphical regression techniques. I formulate a new characterization of edge dependencies in such models, called conditional sign independence with covariates.
Simulation studies demonstrate that rBGR outperforms existing Gaussian graphical regression methods in both edge and covariate selection for data generated under various levels of non-normality. I use rBGR to assess proteomic networks to investigate the immunogenic heterogeneity within tumors. Some of the estimated associations corroborate existing biological knowledge, and the analysis also uncovers novel associations for future investigation.
Subjects
Bayesian method; Structured covariance; Tree-structured covariance; Ultrametric matrix; Covariate-dependent graph
Types
Thesis