Models and Inference for Complex Data with Applications in Nuclear Non-Proliferation and Microbial Systems
Zhu, Haonan
2023
Abstract
With recent advances in science and technology, researchers are often provided with unprecedented amounts of complex data to analyze. Because such data are often high dimensional, discrete, or incomplete, extracting meaningful information from them requires models that incorporate knowledge about the underlying systems and efficient computational methods. In this thesis, we develop statistical models and inference algorithms (based on Monte Carlo and optimization methods) and comprehensively analyze their performance in several complex problem domains. These domains include inverse problems in radiation detection; data fusion and classification in high-dimensional microbiome studies; and contingency table analysis for inferring voting patterns in election polling data with missing information. Specifically, Chapter II describes a hierarchical Bayesian model and a state-of-the-art Monte Carlo sampling method to solve the unfolding problem, i.e., to estimate the spectrum of an unknown neutron source from the data detected by an organic scintillator. The proposed approach is compared with three existing methods on simulated data, which allows controlled benchmarks. Our results show that the proposed method achieves unfolding performance competitive with existing approaches in terms of accuracy and robustness to limited detection events, while requiring less user supervision. The proposed method also provides additional posterior confidence measures. Chapter III develops a Bayesian graphical model for fusing disparate types of count data. The motivating application is the study of bacterial communities from diverse high-dimensional features. We introduce a flexible multinomial-Gaussian generative model for jointly modeling such count data, and we present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model.
The inferred latent variables provide a common dimensionality reduction for visualizing the data. In addition to simulation studies that demonstrate the variational EM procedure, we apply our model to a bacterial microbiome dataset. Chapter IV proposes a hierarchical Bayesian multitask learning model for the general multitask binary classification problem, in which the model assumes a sparsity structure shared across tasks. We derive a computationally efficient inference algorithm based on variational inference to approximate the posterior distribution. We demonstrate the promise of the new approach on multiple synthetic datasets and a real-world microbiome dataset in comparison with other benchmark methods. Chapter V introduces an exact model with minimal assumptions for the transition matrix recovery problem, in which we are given multiple two-way contingency tables with known margin sums but missing inner cells. We propose three valid approximations of the exact model and a novel Riemannian gradient algorithm to obtain the Maximum Likelihood Estimators (MLE) of the transition matrix. The proposed methods are applied to a synthetic dataset and a real-world dataset from the New Zealand general election. Our simulation studies identify the regimes in which these approximations apply. A further clustering analysis using the estimated stochastic matrices across electorate districts identifies communities that reflect the demographics of New Zealand.
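As a rough illustration of the transition matrix recovery setting summarized for Chapter V, the sketch below simulates districts whose inner cells are hidden and estimates a shared row-stochastic matrix from the margins alone. It is a simplified stand-in under stated assumptions, not the thesis's method: it replaces the exact likelihood with a least-squares fit of the expected column proportions, E[c_d]/N_d = (r_d/N_d) P, and replaces the Riemannian gradient algorithm with an exponentiated-gradient (mirror descent) update that keeps each row of P on the probability simplex. The function name `fit_transition_matrix`, the step size, the iteration count, and the synthetic `P_true` are all hypothetical.

```python
import numpy as np


def fit_transition_matrix(row_margins, col_margins, n_iter=5000, step=1.0):
    """Hypothetical helper: estimate a shared row-stochastic matrix P from margins only.

    Least-squares stand-in (an assumption, not the thesis's exact model):
    for each district d, the expected column proportions equal
    (row proportions of d) @ P.
    row_margins: (D, I) counts of the "from" categories per district.
    col_margins: (D, J) counts of the "to" categories per district.
    """
    R = np.asarray(row_margins, dtype=float)
    C = np.asarray(col_margins, dtype=float)
    totals = R.sum(axis=1, keepdims=True)
    if not np.allclose(totals, C.sum(axis=1, keepdims=True)):
        raise ValueError("row and column margins must have equal totals per district")
    A = R / totals  # (D, I) row proportions
    B = C / totals  # (D, J) column proportions
    D, I = A.shape
    J = B.shape[1]
    P = np.full((I, J), 1.0 / J)  # start from uniform rows
    for _ in range(n_iter):
        # Gradient of the mean squared error between observed and expected column proportions.
        grad = 2.0 * A.T @ (A @ P - B) / D
        # Exponentiated-gradient (entropic mirror descent) step: stays positive...
        P = P * np.exp(-step * grad)
        # ...and renormalizing puts each row back on the probability simplex.
        P /= P.sum(axis=1, keepdims=True)
    return P


# Synthetic example (hypothetical values): 2 "from" parties, 3 "to" parties.
rng = np.random.default_rng(0)
P_true = np.array([[0.80, 0.15, 0.05],
                   [0.10, 0.70, 0.20]])
n_districts = 200
R = rng.integers(500, 5000, size=(n_districts, 2))
# Simulate hidden inner cells row by row, then keep only the column sums.
C = np.stack([
    sum(rng.multinomial(R[d, i], P_true[i]) for i in range(2))
    for d in range(n_districts)
])
P_hat = fit_transition_matrix(R, C)
print(np.round(P_hat, 3))
```

Multiplying by a positive factor and renormalizing keeps every row nonnegative and summing to one without an explicit projection, which is the only reason this update is used here. In this simplified setup, P is recoverable only when the row proportions vary enough across districts.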
Subjects
Statistical Modeling; Inference; Monte-Carlo Method; Variational Inference; Neutron Unfolding; Microbiome Profiling
Types
Thesis