Using LASSO to Calibrate Non-probability Samples using Probability Samples.

Chen, Kuang-Tsung

2016

Abstract: Amidst declining response rates and rapidly increasing costs of probability-based sampling, the resurgence of more cost-effective non-probability sampling has prompted survey researchers to explore different adjustment methods for non-probability samples. The current approach attempts to create one single set of survey weights to correct all imbalances within a non-probability sample. One scheme is to generate estimated selection weights by combining the non-probability sample with a large probability-sampling-based dataset that contains all variables related to a respondent's propensity to be in the non-probability sample. In practice, obtaining an appropriate probability sample is costly, and usually there is no way to determine the correct probability of selection for the non-probability sample, or even whether all the variables needed to do so are available in the non-probability data. An alternative approach is to adjust the non-probability sample so that the weighted sample totals of a set of variables, known as calibration variables, equal their Census benchmark totals. Although this method does not require specialized probability-sampling-based data, the resulting calibrated weights can only correct the imbalance with respect to the limited number of Census benchmark variables, which is insufficient for adjusting all errors of a non-probability sample. To date, no method has been shown to be effective in helping researchers make unbiased inference from non-probability samples.
This dissertation addresses the growing demand for making proper inference from non-probability samples. Instead of generating a single set of weights to fix all errors in a non-probability sample, we focus on constructing weights to enable unbiased inference for a specific outcome of interest. We introduce the Least Absolute Shrinkage and Selection Operator, LASSO, to the framework of model-assisted calibration. The proposed method, LASSO calibration, determines the set of variables with the strongest relation to the outcome variable, then calibrates to the expected outcome in a probability benchmark sample. The estimator of the population total based on LASSO calibrated weights can be unbiased, regardless of how the samples are generated. The theoretical framework is developed and evaluated through simulations. An application of LASSO calibration to a large-scale internet-based non-probability sample shows that the proposed method can yield more accurate and precise inference than existing methods.
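The idea of LASSO calibration described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the data are simulated, the function names (`soft_threshold`, `lasso_cd`, `predict`, `linear_calibrate`) are hypothetical, the penalty `lam` is fixed rather than chosen by cross-validation, and a simple linear (GREG-type) calibration step is assumed. The sketch fits a LASSO model for the outcome on the non-probability sample, predicts the outcome in a probability benchmark sample, and then adjusts the non-probability weights so that the weighted total of predicted outcomes matches the benchmark estimate.

```python
import random

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    intercept = sum(y) / n
    beta = [0.0] * p
    resid = [yi - intercept for yi in y]  # y - b0 - X b, kept up to date
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            zj = sum(x * x for x in col) / n
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / zj
            resid = [r - x * (new - beta[j]) for x, r in zip(col, resid)]
            beta[j] = new
        shift = sum(resid) / n  # re-centre the unpenalised intercept
        intercept += shift
        resid = [r - shift for r in resid]
    return intercept, beta

def predict(intercept, beta, X):
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]

def linear_calibrate(d, yhat, total_n, total_yhat):
    """Weights w_i = d_i*(1 + l0 + l1*yhat_i) with sum(w) = total_n and
    sum(w*yhat) = total_yhat (solves the 2x2 calibration equations)."""
    a = sum(d)
    b = sum(di * yh for di, yh in zip(d, yhat))
    c = sum(di * yh * yh for di, yh in zip(d, yhat))
    r0, r1 = total_n - a, total_yhat - b
    det = a * c - b * b
    l0 = (c * r0 - b * r1) / det
    l1 = (a * r1 - b * r0) / det
    return [di * (1.0 + l0 + l1 * yh) for di, yh in zip(d, yhat)]

# --- Simulated data (assumption: purely illustrative) ---
random.seed(0)
N = 10000  # assumed population size

def draw_covariates(n, shift):
    # Only the first covariate drives the outcome; the other two are noise.
    return [[random.gauss(shift, 1), random.gauss(0, 1), random.gauss(0, 1)]
            for _ in range(n)]

X_np = draw_covariates(200, 0.5)   # non-probability sample, skewed on x1
y_np = [2.0 + 3.0 * row[0] + random.gauss(0, 0.5) for row in X_np]
X_ref = draw_covariates(300, 0.0)  # probability benchmark sample (no outcome)
d_ref = [N / len(X_ref)] * len(X_ref)  # its design weights

# 1. Fit the LASSO outcome model on the non-probability sample.
intercept, beta = lasso_cd(X_np, y_np, lam=0.1)
# 2. Predict the outcome in both samples.
yhat_np = predict(intercept, beta, X_np)
yhat_ref = predict(intercept, beta, X_ref)
# 3. Benchmark total of predicted outcomes from the probability sample.
total_yhat = sum(d * yh for d, yh in zip(d_ref, yhat_ref))
# 4. Calibrate the non-probability weights to that benchmark.
d_np = [N / len(X_np)] * len(X_np)
w = linear_calibrate(d_np, yhat_np, N, total_yhat)

naive_mean = sum(y_np) / len(y_np)
calibrated_mean = sum(wi * yi for wi, yi in zip(w, y_np)) / N
print("naive mean:", naive_mean, "calibrated mean:", calibrated_mean)
```

In this simulation the population mean of the outcome is 2 by construction, so the two printed values can be compared against it. Linear calibration is used only because it has a closed form; it can produce negative weights, and other distance functions (for example raking) may be preferable in practice.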