
Novel Applications and Extensions for Bayesian Additive Regression Trees (BART) in Prediction, Imputation, and Causal Inference

dc.contributor.author | Tan, Yaoyuan Vincent
dc.date.accessioned | 2019-02-07T17:54:50Z
dc.date.available | NO_RESTRICTION
dc.date.available | 2019-02-07T17:54:50Z
dc.date.issued | 2018
dc.date.submitted |
dc.identifier.uri | https://hdl.handle.net/2027.42/147594
dc.description.abstract | Bayesian additive regression trees (BART), proposed by Chipman et al. (2010), is a method that can handle non-linear main effects and multiple-way interaction effects for independent continuous or binary outcomes. It has been applied successfully in areas such as causal inference, economics, environmental sciences, and genomics. However, extensions of BART, and applications of those extensions, remain limited. This thesis discusses three novel applications and extensions of BART. We first discuss how BART can be extended to clustered outcomes by adding a random intercept. This work was motivated by the need to accurately predict driver behavior from observable speed and location information, with application to communicating key human-driver intentions to nearby vehicles in traffic. Although our extension can be considered a special case of spatial BART (Zhang et al., 2007), our approach differs by providing a relatively simple algorithm that can be applied to clustered binary outcomes. We next focus on the use of BART in missing data settings. Doubly robust (DR) methods allow consistent estimation of population means when either the non-response propensity model or the outcome mean model is correctly specified. Kang and Schafer (2007) showed that DR methods produce biased and inefficient estimates when both the propensity and mean models are misspecified. We consider using BART to model the means and/or propensities, yielding a "robust-squared" estimator that reduces bias and improves efficiency. Using simulations, we demonstrate this result for two commonly used DR methods: augmented inverse probability weighting (AIPWT; Robins et al., 1994) and penalized spline of propensity prediction (PSPP; Zhang and Little, 2009). We applied our proposed model to two national crash datasets to impute missing change in velocity (delta-v) values and missing blood alcohol concentration (BAC) levels, respectively.
Our final effort considers how a negative wealth shock (a sudden large decline in wealth) affects the cognitive outcomes of late-middle-aged US adults, using the Health and Retirement Study, a longitudinal study of US adults enrolled at age 50 or older and surveyed biennially since 1992. Our analysis faced three issues: lack of randomization, confounding by indication, and censoring of the cognitive outcome by a substantial number of deaths among our subjects. Marginal structural models (MSMs), a commonly used method for handling censoring by death, are arguably inappropriate because they upweight subjects who are more likely to die, creating a pseudo-population that resembles one in which death is absent. We propose to compare the effect of a negative wealth shock only among subjects who would survive under both treatment regimes, a special case of principal stratification (Frangakis and Rubin, 2002). Because counterfactual survival status is unobserved, we imputed it and restricted the analysis to subjects who were observed and predicted to survive under both treatment regimes. We used a modified version of penalized spline of propensity methods in treatment comparisons (PENCOMP; Zhou et al., 2018) to obtain robust imputations of the counterfactual cognitive outcomes. Finally, we consider several possible extensions of these efforts for future work.
dc.language.iso | en_US
dc.subject | Bayesian additive regression trees
dc.subject | Prediction
dc.subject | Imputation
dc.subject | Causal inference
dc.title | Novel Applications and Extensions for Bayesian Additive Regression Trees (BART) in Prediction, Imputation, and Causal Inference
dc.type | Thesis | en_US
dc.description.thesisdegreename | PhD | en_US
dc.description.thesisdegreediscipline | Biostatistics
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember | Elliott, Michael R
dc.contributor.committeemember | Shedden, Kerby A
dc.contributor.committeemember | Flannagan, Carol Ann
dc.contributor.committeemember | Kang, Jian
dc.contributor.committeemember | Sanchez, Brisa N
dc.subject.hlbsecondlevel | Statistics and Numeric Data
dc.subject.hlbtoplevel | Science
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/147594/1/vincetan_1.pdf
dc.identifier.orcid | 0000-0001-5950-9846
dc.identifier.name-orcid | Tan, Yaoyuan Vincent; 0000-0001-5950-9846 | en_US
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's)
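The abstract describes extending BART to clustered binary outcomes by adding a random intercept. The display below sketches one standard way to write such a model. It assumes the probit link that Chipman et al. (2010) use for binary outcomes. The notation is introduced here for illustration and may differ from the thesis: i indexes clusters (for example, repeated observations from the same driver), k indexes observations within a cluster, there are m trees, and tau^2 is the random-intercept variance.

```latex
\begin{aligned}
y_{ik}^{*} &= \sum_{j=1}^{m} g(\mathbf{x}_{ik}; T_j, M_j) + a_i + \varepsilon_{ik},
  \qquad \varepsilon_{ik} \sim N(0, 1), \\
a_i &\sim N(0, \tau^2), \qquad y_{ik} = \mathbb{1}\{ y_{ik}^{*} > 0 \}, \\
\Pr(y_{ik} = 1 \mid \mathbf{x}_{ik}, a_i)
  &= \Phi\Big( \sum_{j=1}^{m} g(\mathbf{x}_{ik}; T_j, M_j) + a_i \Big).
\end{aligned}
```

Here g(x; T_j, M_j) is the j-th regression tree, with structure T_j and leaf parameters M_j. Setting a_i = 0 for every cluster recovers standard probit BART, and the latent y* is the usual data-augmentation device for sampling the probit model.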

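The abstract notes that doubly robust estimators such as AIPWT stay consistent if either the response-propensity model or the outcome-mean model is correct. The minimal Python sketch below shows the standard AIPW estimator of a population mean with missing outcomes. It assumes the fitted propensities pi_i and fitted means m_i have already been produced by some model, for example BART as the thesis proposes. The function name `aipw_mean` and its arguments are hypothetical, and this is not the thesis's own code.

```python
def aipw_mean(responded, y, propensity, mean_pred):
    """AIPW estimate of a population mean when some outcomes are missing.

    responded  -- 0/1 response indicators R_i
    y          -- outcomes; only read where R_i == 1 (use None for nonrespondents)
    propensity -- fitted response probabilities pi_i = P(R_i = 1 | X_i), in (0, 1]
    mean_pred  -- fitted outcome means m_i = E[Y_i | X_i] for every unit
    """
    n = len(responded)
    if n == 0 or not (len(y) == len(propensity) == len(mean_pred) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, yi, p, m in zip(responded, y, propensity, mean_pred):
        if not 0.0 < p <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        # Start from the outcome-model prediction for every unit ...
        total += m
        # ... then add the inverse-probability-weighted residual for respondents,
        # which corrects the prediction when the outcome model is misspecified.
        if r == 1:
            total += (yi - m) / p
    return total / n


# Toy inputs: in practice pi_i and m_i would come from fitted models (e.g., BART).
print(aipw_mean([1, 1, 0, 0], [2.0, 4.0, None, None], [0.5] * 4, [3.0] * 4))  # prints 3.0
```

When every unit responds with propensity 1, the residual term cancels any error in m_i and the estimate reduces to the sample mean of y. This is a small illustration of why a correct propensity model can rescue a wrong mean model.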

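The abstract restricts the wealth-shock comparison to subjects who survive under both treatment regimes, imputing the unobserved counterfactual survival status. The Python sketch below illustrates only that restriction step for a single imputation. It assumes each subject already has a model-predicted probability of surviving under the regime they did not receive. The `Subject` fields and `impute_survivors_under_both` are hypothetical names, and the thesis's actual survival models and PENCOMP outcome imputation are not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    sid: int
    treated: int            # observed regime Z_i (1 = negative wealth shock, 0 = none); informational
    survived: bool          # observed survival under the regime actually received
    p_survive_other: float  # assumed model-predicted P(survive | X_i, regime 1 - Z_i)


def impute_survivors_under_both(subjects, rng):
    """One imputation of the 'survive under both regimes' stratum; returns subject ids."""
    keep = []
    for s in subjects:
        if not s.survived:
            # Died under the observed regime, so cannot survive under both.
            continue
        # Counterfactual survival is unobserved: impute it with a Bernoulli draw.
        if rng.random() < s.p_survive_other:
            keep.append(s.sid)
    return keep


cohort = [
    Subject(1, 1, True, 1.0),   # survived; certain to survive the other regime
    Subject(2, 0, False, 0.9),  # died under the observed regime
    Subject(3, 0, True, 0.0),   # survived; predicted to die under the other regime
    Subject(4, 1, True, 0.6),   # survived; counterfactual survival is uncertain
]
print(impute_survivors_under_both(cohort, random.Random(2018)))
```

Running the function with several independent random generators would give several imputed strata, in the spirit of multiple imputation. Subject 4's membership varies across draws, while subjects 1, 2, and 3 are determined by their observed survival and extreme predicted probabilities.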