Design-Based Methods for the Analysis of Modern Randomized Experiments

Wu, James

2021

Abstract

Randomized experiments are increasingly prevalent across a variety of fields, particularly in the social sciences and medicine. This is due in part to their reputation as the "gold standard" for establishing causal relationships. The proliferation of randomized experiments has brought a variety of challenges at a time when large data sets are becoming more common. For some experiments, a large number of pretreatment covariates are available for each participant. It is common to adjust for small imbalances in these baseline covariates when analyzing the results of a randomized experiment. Traditional covariate adjustment methods such as linear regression can perform poorly or fail entirely when the number of covariates is large. This can be addressed by first performing model selection, but doing so may raise concerns about data snooping and the validity of post-selection inferences. Several authors have suggested specifying the statistical analysis in advance to address this issue. However, it may not be clear ahead of time which covariates to use for making adjustments, or whether covariate adjustment will even be helpful. To address this concern, we propose a flexible covariate adjustment method, the LOOP ("Leave-One-Out Potential outcomes") estimator. This method allows for automatic variable selection, so that we do not need to know ahead of time which variables to use. In addition, the method is unbiased under the Neyman-Rubin model and generally performs at least as well as the unadjusted estimator. This alleviates concerns that the adjustment could harm the performance of the treatment effect estimate.

Covariate imbalance can also be addressed using study design. In paired experiments, participants are grouped into pairs with similar characteristics, and one observation from each pair is randomly assigned to treatment. While this study design is often successful in balancing the treatment and control groups, it may still be possible to improve precision using covariate adjustment. We build on the LOOP estimator and propose a design-based covariate adjustment method for paired experiments. This method addresses a unique trade-off that exists for paired experiments, where it can be unclear to what extent the analysis should account for the paired structure. By addressing this trade-off, the method has the potential to improve over existing methods.

Modern randomized experiments may be accompanied by a large amount of auxiliary data, such as related observational data. Sample sizes of randomized experiments are often limited due to practical constraints, whereas sample sizes for the auxiliary data can be large. We propose a covariate adjustment method that allows us to use observational data sets to make adjustments to the experimental data without bias from confounding variables leaking into our analysis. Our method also adjusts for the covariates within the randomized experiment itself, and automatically interpolates between the adjustment made using the experimental covariates and the adjustment made using the observational data set.

Finally, we propose a method for high-dimensional classification. In this method, the predictors in a data set compete in a "tournament" until they have been combined into a single predictor. From a computational perspective, this method is a natural fit for use within the LOOP estimator when the outcome is binary; however, it can also be used more generally. The method shares several features with the covariate adjustment methods, such as the use of a leave-one-out procedure to improve performance and interpolation between competing predictors.
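
The leave-one-out idea behind the LOOP estimator can be illustrated with a minimal Python sketch. It is not the implementation from the thesis. It assumes Bernoulli assignment with a known treatment probability p, and it imputes each participant's potential outcomes with ordinary least squares fit on the other participants only. The abstract does not tie the method to a particular imputation model, so the OLS step is purely an illustrative choice, and the names `loop_estimate` and `_ols_predict` are hypothetical. Because the imputations for participant i never use participant i's own outcome or assignment, the inverse-probability-weighted residuals average to an unbiased estimate of the average treatment effect under these assumptions.

```python
import numpy as np


def _ols_predict(X_fit, y_fit, x_new):
    """Fit least squares on (X_fit, y_fit) and predict the outcome at x_new."""
    beta = np.linalg.lstsq(X_fit, y_fit, rcond=None)[0]
    return float(x_new @ beta)


def loop_estimate(y, t, X, p=0.5):
    """Leave-one-out covariate-adjusted estimate of the average treatment effect.

    Assumptions (illustrative, not taken from the thesis):
      - each unit is assigned to treatment independently with probability p;
      - potential outcomes are imputed by OLS, fit separately on treated
        and control units, always excluding the unit being imputed.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    n = len(y)
    # Add an intercept column to the covariate matrix.
    X = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])

    tau_i = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave unit i out of both fits
        treated = keep & (t == 1)
        control = keep & (t == 0)
        t_hat = _ols_predict(X[treated], y[treated], X[i])  # imputed treated outcome
        c_hat = _ols_predict(X[control], y[control], X[i])  # imputed control outcome
        # Weighted combination of the two imputed potential outcomes.
        m_hat = (1 - p) * t_hat + p * c_hat
        # Inverse-probability weight with the sign of the observed arm.
        u = 1 / p if t[i] == 1 else -1 / (1 - p)
        tau_i[i] = (y[i] - m_hat) * u
    return float(tau_i.mean())
```

Calling `loop_estimate(y, t, X, p)` with an outcome vector, a 0/1 assignment vector, and a two-dimensional covariate array returns a single number. Any other leave-one-out prediction rule could replace the OLS step, provided the prediction for unit i does not depend on unit i's own assignment or outcome.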

Deep Blue DOI

https://dx.doi.org/10.7302/2914

Subjects

Causal Inference

Randomized Experiments

Covariate Adjustment

High-Dimensional Classification

Types

Thesis

Handle

https://hdl.handle.net/2027.42/169869

Collections

Dissertations and Theses (Ph.D. and Master's)