Statistical Methods To Investigate Asymmetric Association and Directionality In Biomedical Studies
Purkayastha, Soumik
2024
Abstract
Association and directionality of association are two key concepts in many statistical analyses. While statistical association is routinely used to describe the strength and nature of a mutual or exchangeable relationship between two variables, it cannot capture asymmetric relations. Directionality indicates an order or asymmetry in exposure-outcome relationships. This dissertation develops new statistical methods to quantify and infer the directionality of association using Shannon's information-theoretic framework. This statistical toolkit helps in understanding the manifestation of causal relationships and in identifying potential colliders in biomedical studies. In proposed future work, I outline how the toolkit may also be used to detect mediators and instrumental variables in biomedical data. Viewed together, this interdisciplinary work not only enhances our understanding of causality but also provides practical tools for researchers to identify and confirm directional relationships in complex data structures. All analytic work is supported by R or Python software for broader dissemination.
The first project focuses on the development of a fast nonparametric density estimation technique essential to calculating a sample counterpart of Shannon's mutual information (MI) and its associated uncertainty. Such computational efficiency helps handle the many pairwise MI statistics needed in practice. Most existing nonparametric estimators of MI have unstable statistical performance since they involve parameter tuning. I propose a consistent and computationally efficient estimator, called fastMI, that requires no parameter tuning. Based on a copula formulation, fastMI estimates the MI statistic by leveraging fast Fourier transform-based estimation of the underlying density. Extensive simulation studies reveal that fastMI outperforms state-of-the-art estimators, with improved estimation accuracy and reduced runtime for large data sets. This estimation analytic paves the way for addressing the subsequent technical needs of the dissertation.
The second project develops a novel approach to understanding asymmetry in exposure-outcome pairs, offering both theoretical and empirical insights into the underlying directionality, which may be deemed a weak imprint of Neyman-Rubin's causality with minimal conditions. Using Shannon's seminal information theory, I propose a new statistical framework for settings where a priori assumptions about the relative ordering of exposure and outcome are unavailable. The methodology development is motivated by the Early Life Exposures in Mexico to Environmental Toxicants (ELEMENT) cohort study. With limited knowledge, scientists are uncertain whether changes in DNA methylation (DNAm) biomarkers cause changes in cardiovascular behavior, or vice versa. I explore such asymmetric relations in exposure-outcome pairs in the paradigm of Generative Exposure Mapping (GEM). I propose a novel coefficient of asymmetry that provides a computationally manageable framework for inferring asymmetry with minimal conditions. The major contributions include broadening existing information geometric principles for examining asymmetry and introducing statistical inference to quantify uncertainty in determining the underlying directionality.
The third project concerns the detection of colliders in mechanistic pathways. A collider is known to be a troublemaker that can obliterate legitimate parent-offspring orders in a directed graph model. I propose a conceptual framework called Conditional Generative Exposure Mapping (CGEM) that utilizes a conditional entropy-based asymmetry analytic for detecting colliders. The methodology development is motivated by a need to identify colliders when examining mechanistic pathways between blood pressure gene DNAm, which can help explain the biological networks involved in epigenetic regulation of blood pressure. I use nonparametric density estimation and kernel smoothing methods to estimate the analytic, and I study its performance through simulation studies and applications related to the ELEMENT data.
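To make the first project's idea concrete, the following is a minimal Python sketch of a plug-in MI estimate that combines a rank-based (copula-style) transform with a binned kernel density estimate smoothed by fast Fourier transform. It is not the fastMI estimator itself, whose details are not given in this abstract. The function names, the grid size, the grid limit, and in particular the rule-of-thumb bandwidth are all assumptions made for illustration; unlike fastMI as described above, this sketch does depend on a tuning parameter.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(x):
    """Rank-transform a sample to normal scores (a copula-style transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n (assumes no ties)
    u = ranks / (n + 1.0)                      # pseudo-observations in (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])


def fft_kde_2d(z1, z2, grid_size=128, lim=4.5, bandwidth=None):
    """Binned Gaussian kernel density estimate on a square grid, smoothed via FFT."""
    n = z1.size
    if bandwidth is None:
        bandwidth = n ** (-1.0 / 6.0)          # ASSUMPTION: rule-of-thumb bandwidth
    edges = np.linspace(-lim, lim, grid_size + 1)
    step = edges[1] - edges[0]
    counts, _, _ = np.histogram2d(z1, z2, bins=[edges, edges])

    # One-dimensional Gaussian weights centred at index grid_size // 2.
    offsets = (np.arange(grid_size) - grid_size // 2) * step
    k1 = np.exp(-0.5 * (offsets / bandwidth) ** 2)
    k1 /= k1.sum()
    kernel = np.fft.ifftshift(np.outer(k1, k1))  # move the peak to index (0, 0)

    # Circular convolution of the bin counts with the kernel, done in Fourier space.
    smooth = np.fft.irfft2(np.fft.rfft2(counts) * np.fft.rfft2(kernel), s=counts.shape)
    density = np.clip(smooth, 1e-300, None) / (n * step * step)
    return density, step


def mutual_information(x, y, grid_size=128, lim=4.5, bandwidth=None):
    """Plug-in MI estimate (in nats): mean log-ratio of joint to product of marginals."""
    z1, z2 = normal_scores(x), normal_scores(y)
    density, step = fft_kde_2d(z1, z2, grid_size, lim, bandwidth)
    marg1 = density.sum(axis=1) * step         # marginal density along the first axis
    marg2 = density.sum(axis=0) * step         # marginal density along the second axis
    i = np.clip(((z1 + lim) / step).astype(int), 0, grid_size - 1)
    j = np.clip(((z2 + lim) / step).astype(int), 0, grid_size - 1)
    log_ratio = np.log(density[i, j]) - np.log(marg1[i]) - np.log(marg2[j])
    return float(np.mean(log_ratio))
```

Because MI is invariant under monotone transformations of each variable, working on normal scores leaves the target unchanged while avoiding the boundary effects of smoothing on the unit square. The convolution is circular, so the grid limit should comfortably exceed the largest normal score; the default here is only adequate for moderate sample sizes. For a bivariate normal pair with correlation 0.8, the true MI is about 0.51 nats, which gives a rough check on the output.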
Subjects
asymmetry association epigenetics collider mediator entropy
Types
Thesis