Statistical Challenges in Combining Information from Big and Small Data Sources
dc.contributor.author | Raghunathan, Trivellore | |
dc.date.accessioned | 2016-05-31T13:18:27Z | |
dc.date.available | 2016-05-31T13:18:27Z | |
dc.date.issued | 2015-11-19 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/120417 | |
dc.description.abstract | Social media, electronic health records, credit card transactional and administrative data, web scraping, and numerous other ways of collecting information have changed the landscape for those interested in addressing policy-relevant research questions. At the same time, traditional sources of data, such as large-scale surveys, that have been a stable source for policy-relevant research have suffered setbacks due to high nonresponse and increasing data collection costs. The non-survey data usually contain detailed information on certain behaviors for a large number of individuals (such as all credit card transactions) but very little background information on them (such as important covariates needed to address the policy-relevant question). The survey data, on the other hand, contain detailed information on covariates but less detailed information on the behaviors. Both data sources may be imperfect for the target population of interest. This paper develops and evaluates a framework for linking information from multiple imperfect data sources, along with the Census data, to draw statistical inference. An explicit modeling framework uses selection into the big data, the sampling and nonresponse mechanisms in the survey data, the distribution of the key variables of interest, and certain marginal distributions from the Census data as building blocks to draw inference about the population quantity of interest. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Bayesian Analysis, Combining Information, Multiple Imputation, Missing data, Sample Surveys | en_US |
dc.title | Statistical Challenges in Combining Information from Big and Small Data Sources | en_US |
dc.type | Article | en_US |
dc.subject.hlbsecondlevel | Social Sciences (General) | |
dc.subject.hlbtoplevel | Social Sciences | |
dc.contributor.affiliationumcampus | Ann Arbor | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/120417/1/NAS-Paper.pdf | |
dc.description.filedescription | Description of NAS-Paper.pdf : Main Article | |
dc.owningcollname | Survey Research Center (ISR) | |