Shrinkage Methods for High-Dimensional Regression and Mediation Models
Boss, Jonathan
2023
Abstract
With advances in high-throughput assaying technology, environmental health scientists are increasingly interested in characterizing the joint effects of a set of exogenous environmental exposures (A) coupled with endogenous omics data on a health outcome (Y). The omics data is often treated as a high-dimensional mediator (M), representing potential intermediary pathways through which A yields Y. This allows for a deeper understanding of how exposure mixtures impact health outcomes and what endogenous biological mechanisms underlie mixture effects. In this dissertation, we develop association and mediation models that are specifically tailored to the structure of correlated environmental exposure mixtures and high-dimensional omics data. In the first project, we focus on the problem of regression coefficient estimation in multi-pollutant models where areas of high collinearity in the exposure space are contained within known covariate groupings called exposure classes. To assuage variance inflation induced by correlated exposures, we propose the group inverse-gamma gamma (GIGG) prior, a heavy-tailed prior that can trade off between local and group shrinkage in a data-adaptive fashion. Compared to a benchmark shrinkage method like horseshoe regression, GIGG regression reduces mean-squared error by at least 30% across a range of correlation structures and within-group signal densities. We apply GIGG regression to data from the National Health and Nutrition Examination Survey, identifying a toxic effect of metal mixtures on gamma-glutamyl transferase. For a widely studied environmental exposure, there is likely literature establishing statistical and biological significance of the total exposure effect (TE), defined as the effect of A on Y given a set of confounders C. In the second project, we show that leveraging external summary-level information on the TE can improve estimation efficiency of the mediation effects for linear mediation models.
Moreover, the efficiency gain depends on the partial r-squared between the (Y|M,A,C) and (Y|A,C) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We then robustify our base estimation procedure (Mediation with External Summary Statistic information or MESSI) to incongenial external information. In the highly congenial simulation scenarios, we observe relative efficiency gains for mediation effect estimation of up to 40%. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats (PROTECT), where Cytochrome p450 lipid metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External information on the TE comes from a recently published pooled analysis of 16 studies. The third project considers the problem of estimating mediation effects with respect to exposure mixtures. We develop a method called the mediation mixture map (MedMix), which combines ideas from mediation analysis and latent factor modeling to simultaneously estimate mediation effects corresponding to changes in individual exposures and latent sources of exposure variation. In some simulation settings, MedMix leads to a substantial reduction in root mean-squared error for estimating the mixture mediation effect (approximately 30%) and better quantifies model uncertainty compared to a naïve two-step estimator. We apply MedMix to PROTECT and identify a common source of variation corresponding to mono(carboxynonyl) (MCNP), mono(carboxyoctyl) (MCOP), and mono(3-carboxypropyl) (MCPP) phthalate exposure that is associated with shorter gestational age at delivery (1.13 day decrease per interquartile range increase in the latent mixture; 95% Credible Interval (CI): 0.01, 2.23) and smaller head circumference z-score (0.15 standard deviations smaller head circumference per interquartile range increase in the latent mixture; 95% CI: 0.03, 0.28).
Subjects
Exposure Mixtures; Mediation Analysis; Shrinkage Estimators; Data Integration; Latent Factor Analysis; Grouped Regressors
Types
Thesis