Models and Inference for Complex Data with Applications in Nuclear Non-Proliferation and Microbial Systems

Zhu, Haonan

2023

View/Open

haonan_1.pdf (6.9MB PDF)

Abstract

With recent advances in science and technology, researchers are often provided with unprecedented amounts of complex data to analyze. Because such data are often high dimensional, discrete, and incomplete, extracting meaningful information from them requires models that incorporate knowledge about the underlying systems and implement efficient computational methods. In this thesis, we develop statistical models and inference algorithms (using Monte Carlo and optimization methods) and comprehensively analyze their performance in several complex problem domains. These domains include: inverse problems in radiation detection; data fusion and classification in high-dimensional microbiome studies; and contingency table analysis for inferring voting patterns in election polling data with missing information. Specifically, Chapter II describes a hierarchical Bayesian model and a state-of-the-art Monte Carlo sampling method to solve the unfolding problem, i.e., to estimate the spectrum of an unknown neutron source from the data detected by an organic scintillator. The proposed approach is compared to three existing methods using simulated data to enable controlled benchmarks. Our results show that the proposed method has competitive unfolding performance compared to existing approaches in terms of accuracy and robustness against limited detection events, while requiring less user supervision. The proposed method also provides additional posterior confidence measures. Chapter III develops a Bayesian graphical model for fusing disparate types of count data. The motivating application is the study of bacterial communities from diverse high-dimensional features. We introduce a flexible multinomial-Gaussian generative model for jointly modeling such count data. We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model.
The inferred latent variables provide a common dimensionality reduction for visualizing the data. In addition to simulation studies that demonstrate the variational EM procedure, we apply our model to a bacterial microbiome dataset. Chapter IV proposes a hierarchical Bayesian multitask learning model for the general multitask binary classification problem, in which the model assumes a shared sparsity structure across tasks. We derive a computationally efficient inference algorithm based on variational inference to approximate the posterior distribution. We demonstrate the promise of the new approach on multiple synthetic datasets and a real-world microbiome dataset in comparison with other benchmark methods. Chapter V introduces an exact model with minimal assumptions for the transition matrix recovery problem, where we are given multiple two-way contingency tables with known margin sums but missing inner cells. We propose three valid approximations of the exact model and a novel Riemannian gradient algorithm to obtain the Maximum Likelihood Estimators (MLE) of the transition matrix. The proposed methods are applied to a synthetic dataset and a real-world dataset from the New Zealand general election. Our simulation studies delineate the conditions under which those approximations apply. A further clustering analysis using the estimated stochastic matrices across different electorate districts identifies communities that are reflective of the demographics of New Zealand.
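
To make the transition matrix recovery setup concrete, the following is a minimal sketch and not the thesis's exact model, its approximations, or its Riemannian gradient algorithm. It assumes each table contributes a known vector of row sums and a known vector of column sums, and that the column sums are approximately the row sums multiplied by a shared row-stochastic matrix. It swaps in a plain least-squares fit followed by a clip-and-renormalize step as a crude baseline; the function name `estimate_transition_matrix` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def estimate_transition_matrix(row_margins, col_margins):
    """Crude least-squares baseline for a row-stochastic P with col ~= row @ P.

    row_margins: (K, I) array, the known row sums of K contingency tables.
    col_margins: (K, J) array, the matching known column sums.
    This is an illustrative stand-in, not a maximum likelihood estimator.
    """
    R = np.asarray(row_margins, dtype=float)
    C = np.asarray(col_margins, dtype=float)
    # Unconstrained least squares: minimize ||R P - C||_F over P.
    P, *_ = np.linalg.lstsq(R, C, rcond=None)
    # Heuristic repair: clip non-positive entries, then make each row sum to 1.
    P = np.clip(P, 1e-12, None)
    return P / P.sum(axis=1, keepdims=True)

# Synthetic, noiseless example (all values are made up for illustration).
rng = np.random.default_rng(0)
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])
R = rng.dirichlet(np.ones(3), size=50) * 1000  # row margins of 50 tables
C = R @ P_true                                 # expected column margins
P_hat = estimate_transition_matrix(R, C)
```

With noiseless margins and enough tables, the least-squares fit recovers the generating matrix up to floating-point error. With real count data the clip-and-renormalize step is only a heuristic, which is one reason a likelihood-based treatment of the kind described above is attractive.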

Deep Blue DOI

https://dx.doi.org/10.7302/22287

Subjects

Statistical Modeling

Inference

Monte-Carlo Method

Variational Inference

Neutron Unfolding

Microbiome Profiling

Types

Thesis

Handle

https://hdl.handle.net/2027.42/192378

Collections

Dissertations and Theses (Ph.D. and Master's)
