Synthetic Data for Small Area Estimation.

dc.contributor.author | Sakshaug, Joseph Walter | en_US
dc.date.accessioned | 2012-01-26T19:59:29Z
dc.date.available | NO_RESTRICTION | en_US
dc.date.available | 2012-01-26T19:59:29Z
dc.date.issued | 2011 | en_US
dc.date.submitted | | en_US
dc.identifier.uri | https://hdl.handle.net/2027.42/89610
dc.description.abstract | Small area estimates provide a critical source of information used by a variety of stakeholders to study human conditions and behavior at the local level. Statistical agencies regularly collect survey microdata from small geographic areas but are prevented from identifying these areas in public-use microdata sets due to disclosure concerns. Alternative data dissemination methods include releasing summary tables for small areas and accessing restricted identifiers via Research Data Centers. This dissertation proposes a new method of disseminating public-use microdata that contains more geographical details than are currently being released. The basic idea is to replace the observed survey values with imputed, or synthetic, values. Data confidentiality is enhanced because no actual values are released. This dissertation proposes three statistical methods for generating synthetic data for small geographic areas. The first method utilizes a fully-parametric hierarchical Bayesian model that is used to generate synthetic microdata from the posterior predictive distribution. The second method consists of a nonparametric procedure for generating synthetic data for continuous non-normal distributions. The third method accounts for complex sample design features and permits the generation of synthetic data for both sampled and nonsampled small areas. These three methods are demonstrated and evaluated using a mix of public-use and restricted microdata from the American Community Survey and National Health Interview Survey. Each of the methods is evaluated using empirical, simulation, and cross-validation studies. The analytic validity of the methods is assessed by comparing the small area estimates obtained from the synthetic data with those obtained from the observed data. | en_US
dc.language.iso | en_US | en_US
dc.subject | Statistical Disclosure Control, Synthetic Data, Small Area Estimation, Hierarchical Model, Sequential Regression Multiple Imputation | en_US
dc.title | Synthetic Data for Small Area Estimation. | en_US
dc.type | Thesis | en_US
dc.description.thesisdegreename | PhD | en_US
dc.description.thesisdegreediscipline | Survey Methodology | en_US
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US
dc.contributor.committeemember | Raghunathan, Trivellore E. | en_US
dc.contributor.committeemember | Lepkowski, James M. | en_US
dc.contributor.committeemember | Little, Roderick J. | en_US
dc.contributor.committeemember | Valliant, Richard L. | en_US
dc.subject.hlbsecondlevel | Statistics and Numeric Data | en_US
dc.subject.hlbtoplevel | Social Sciences | en_US
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/89610/1/joesaks_1.pdf
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's)

