Causal Inference Methods for Comparing Multiple Treatments using Data from Large Insurance Claims Databases

dc.contributor.author: Yu, Youfei
dc.date.accessioned: 2022-05-25T15:26:36Z
dc.date.available: 2022-05-25T15:26:36Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/172688
dc.description.abstract: Large healthcare databases used primarily for billing and payments, such as electronic health records and insurance claims data, have been increasingly used to conduct comparative effectiveness research that characterizes multiple treatment/intervention strategies for a particular clinical condition. Estimation of average treatment effects (ATE) using such observational data is prone to bias due to confounders related to both treatment and outcome. Another potential source of bias is right-censoring, which occurs when patients drop out of the system or the reporting period ends before the occurrence of the event of interest. Our study is motivated by an analysis embedded within the OptumInsight Clinformatics Data Mart, a private health insurance claims database. Interest is in assessing the adverse effects of four common therapies for metastatic castration-resistant prostate cancer, with the outcome being hospitalization and/or admission to the emergency room within a short time window of treatment initiation. In Chapter I, we consider observational data with two or more treatments. Propensity score methods are routinely used to correct for confounding biases. A large fraction of those methods in the current literature consider the case of either two treatments or a continuous outcome. There has been extensive literature on multiple treatments or on binary outcomes, but interest often lies in the intersection, for which the literature is still evolving. The contribution of this chapter is to focus on this intersection and compare existing methods, some of which are fairly recent. We assess the relative performance of these methods through a set of simulation studies and provide recommendations for practitioners. In Chapter II, we propose a method that directly models the binary outcome using logistic regression, with confounding and censoring properly accounted for by weighting. We call the method the inverse probability weighted regression-based estimator that accounts for censoring, or CIPWR. CIPWR estimates the ATE by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property: estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite-sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to the cohort of prostate cancer patients from the insurance claims database. In Chapter III, we consider a setting where a massive collection of candidate covariates potentially related to both treatment and outcome is available. In addition, the treatment-generating model possibly involves nonlinearity and/or nonadditivity. In this setting, a key challenge is to identify, from a high-dimensional set of measured covariates, the variables to include in the propensity score model in order to remove the bias. We examine an ensemble of data-driven methods that select the variables for inclusion in the treatment model, including regularized regression and modern machine learning tools. We allow the outcome-covariate associations to contribute to the variable selection process, and show through simulation studies that leveraging information on the outcome-covariate relationship when modeling the propensity scores can improve the statistical efficiency of propensity score-based methods, such as inverse probability weighting, and their robustness against model misspecification. The improvement in the precision of the treatment effect estimates is also observed in our application to the prostate cancer data.
dc.language.iso: en_US
dc.subject: causal inference
dc.subject: observational studies
dc.subject: comparative effectiveness research
dc.subject: claims data
dc.subject: multiple treatment
dc.subject: right-censoring
dc.title: Causal Inference Methods for Comparing Multiple Treatments using Data from Large Insurance Claims Databases
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Biostatistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Mukherjee, Bhramar
dc.contributor.committeemember: Zhang, Min
dc.contributor.committeemember: Ryan, Andrew Michael
dc.contributor.committeemember: Wu, Zhenke
dc.subject.hlbsecondlevel: Public Health
dc.subject.hlbtoplevel: Health Sciences
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/172688/1/youfeiyu_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/4717
dc.identifier.orcid: 0000-0002-4986-848X
dc.identifier.name-orcid: Yu, Youfei; 0000-0002-4986-848X [en_US]
dc.working.doi: 10.7302/4717 [en]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
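
Illustrative note on the abstract: the Chapter II idea (fit a logistic outcome model through a weighted score function, then average its predictions over the whole cohort under each treatment) can be sketched roughly as below. This is not the thesis's implementation. The function names, the record layout, the single-covariate design, and the use of already-known treatment and censoring probabilities are all assumptions made for illustration; in practice those probabilities would themselves be estimated, and the thesis's estimator and its inference procedure may differ in detail.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def fit_weighted_logistic(X, y, w, iters=50):
    """Newton's method for the weighted score equation
    sum_i w_i * x_i * (y_i - expit(x_i' beta)) = 0."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        score = [0.0] * d
        info = [[0.0] * d for _ in range(d)]
        for xi, yi, wi in zip(X, y, w):
            p = expit(sum(b * v for b, v in zip(beta, xi)))
            for j in range(d):
                score[j] += wi * xi[j] * (yi - p)
                for k in range(d):
                    info[j][k] += wi * p * (1.0 - p) * xi[j] * xi[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def design(a, x, n_trt):
    """Hypothetical design row: intercept, treatment dummies (arm 0 is the
    reference), and one covariate."""
    return [1.0] + [1.0 if a == t else 0.0 for t in range(1, n_trt)] + [x]

def cipwr_style_means(records, n_trt):
    """records: tuples (treatment, covariate, outcome, observed, p_treat, p_obs).
    p_treat is the probability of the treatment actually received and p_obs the
    probability of remaining uncensored; both are assumed known here."""
    obs = [r for r in records if r[3]]  # only uncensored subjects enter the fit
    X = [design(a, x, n_trt) for a, x, *_ in obs]
    y = [r[2] for r in obs]
    w = [1.0 / (r[4] * r[5]) for r in obs]  # inverse probability weights
    beta = fit_weighted_logistic(X, y, w)
    means = []
    for t in range(n_trt):
        # Predict for every subject (censored or not) as if treated with arm t.
        preds = [expit(sum(b * v for b, v in zip(beta, design(t, r[1], n_trt))))
                 for r in records]
        means.append(sum(preds) / len(preds))
    return means
```

A pairwise ATE under this sketch would be the difference between two entries of the list returned by `cipwr_style_means`, for example `means[1] - means[0]`.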