
Robust and Efficient Bayesian Inference for Large-scale Non-probability Samples

dc.contributor.author: Rafei, Ali
dc.date.accessioned: 2021-09-24T19:07:26Z
dc.date.available: 2021-09-24T19:07:26Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/2027.42/169715
dc.description.abstract: The steady decline of response rates in probability surveys, in parallel with the fast emergence of large-scale unstructured data (“Big Data”), has led to a growing interest in the use of such data for finite population inference. However, the non-probabilistic nature of their data-generating process makes big-data-based findings prone to selection bias. When the sample is unbalanced with respect to the population composition, the larger data volume amplifies the relative contribution of selection bias to total error. Existing robust approaches assume that the models governing the population structure or selection mechanism have been correctly specified. Such methods are not well developed for outcomes that are not normally distributed and may perform poorly when there is evidence of outlying weights. In addition, their variance estimators often lack a unified framework and rely on asymptotic theory that may not perform well in small samples. This dissertation proposes novel Bayesian approaches for finite population inference based on a non-probability sample where a parallel probability sample is available as an external benchmark. Bayesian inference satisfies the likelihood principle and provides a unified framework for quantifying the uncertainty of the adjusted estimates by simulating the posterior predictive distribution of the unknown parameter of interest in the population. The main objective of this thesis is to draw robust inference by weakening the modeling assumptions, because the true structure of the underlying models is always unknown to the analyst. This is achieved either by combining different classes of adjustment methods, i.e., quasi-randomization and prediction modeling, or by using flexible non-parametric models, including Bayesian Additive Regression Trees (BART) and Gaussian Process (GP) regression. More specifically, I modify the idea of augmented inverse propensity weighting so that BART can be used to predict both the propensity scores and the outcome variables. This offers additional protection against model misspecification beyond double robustness. To eliminate the need for design-based estimators, I take one step further and develop a fully model-based approach in which the outcome is imputed for all non-sampled units of the population via a partially linear GP regression model. It is demonstrated that the GP behaves as an optimal kernel matching tool based on the estimated propensity scores. To retain double robustness with good repeated sampling properties, I estimate the outcome and propensity scores jointly under a unified Bayesian framework. Further developments are suggested for situations where the reference sample has a complex design, and particular attention is paid to the computational scalability of the proposed methods when the population or the non-probability sample is large. Throughout the thesis, I assess the repeated sampling properties of the proposed methods in simulation studies and apply them to real-world non-probability samples.
dc.language.iso: en_US
dc.subject: doubly robust
dc.subject: pseudo-weighting
dc.subject: prediction modeling
dc.subject: Bayesian Additive Regression Trees
dc.subject: Gaussian Process Regression
dc.title: Robust and Efficient Bayesian Inference for Large-scale Non-probability Samples
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Survey and Data Science
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Elliott, Michael R
dc.contributor.committeemember: Boonstra, Phil
dc.contributor.committeemember: Flannagan, Carol Ann
dc.contributor.committeemember: Little, Roderick J
dc.contributor.committeemember: West, Brady Thomas
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/169715/1/arafei_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/2760
dc.identifier.orcid: 0000-0002-1436-5671
dc.identifier.name-orcid: Rafei, Ali; 0000-0002-1436-5671
dc.working.doi: 10.7302/2760
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
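Note: the abstract refers to augmented inverse propensity weighting and double robustness. As a point of reference only, the following is a minimal sketch of the standard doubly robust (AIPW-type) estimator of a finite population mean in the non-probability-sample setting, assuming a non-probability sample S_A, a reference probability sample S_R with design weights w_j, an estimated propensity of selection into S_A, and an estimated outcome regression; these symbols are illustrative, and the dissertation's specific BART- and GP-based formulations are not reproduced here.

% Minimal sketch with illustrative notation (not the dissertation's own):
% S_A: non-probability sample; S_R: reference probability sample with
% design weights w_j; \hat{\pi}: estimated propensity of selection into S_A;
% \hat{m}: estimated outcome regression.
\[
  \hat{\bar{Y}}_{\mathrm{DR}}
  = \frac{1}{\hat{N}}
    \left[
      \sum_{i \in S_A} \frac{y_i - \hat{m}(x_i)}{\hat{\pi}(x_i)}
      + \sum_{j \in S_R} w_j \, \hat{m}(x_j)
    \right],
  \qquad
  \hat{N} = \sum_{j \in S_R} w_j .
\]

This estimator is consistent if either the propensity model or the outcome regression is correctly specified, which is the double robustness property the dissertation aims to preserve while replacing both parametric components with flexible Bayesian fits.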

