Show simple item record

Multiple Imputation Methods for Statistical Disclosure Control.

dc.contributor.author: An, Di [en_US]
dc.date.accessioned: 2008-05-08T18:59:38Z
dc.date.available: NO_RESTRICTION [en_US]
dc.date.available: 2008-05-08T18:59:38Z
dc.date.issued: 2008 [en_US]
dc.date.submitted: [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/58398
dc.description.abstract: Statistical disclosure control (SDC) is an important consideration in the release of public use data sets. Statistical agencies seek SDC methods that limit the risk of identifying respondents while preserving the original information in the data. This dissertation concerns disclosure risk caused by extreme values of variables such as income or age. Top-coding is a simple SDC procedure for this situation, but it limits analysis for the data user and may result in distorted inference. We propose two alternatives to top-coding for SDC: a non-parametric hot-deck procedure and a parametric Bayesian method. Both methods are based on multiple imputation (MI). In the first part of the dissertation, we describe our SDC methods and illustrate their performance for inference about the mean of a variable subject to SDC, by simulations and by application to data from the Chinese income project. We compare estimates from our methods with those calculated from the original data and from the top-coding method. Results show that our MI methods yield better inferences from the publicly released data than top-coding does. In the second part of the dissertation, we study the impact of SDC methods on linear regression where the outcome is subject to top-coding, and extend the previous methods to condition on the observed covariates. We propose stratified and regression-based extensions of our MI methods and show in simulation studies that these methods yield estimates of regression coefficients close to those obtained before deletion. In the third part of the dissertation, we consider a specific application concerning disclosure risk caused by some participants attaining high ages because of prolonged participation in a longitudinal study, and develop nonparametric, stratified MI methods. We apply these methods in survival analysis using Cox's proportional hazards model. Simulation studies show that these methods work well in preserving the relationship between hazard and covariates. We illustrate the methods on data from the Charleston Heart Study. [en_US]
dc.format.extent: 1431398 bytes
dc.format.extent: 1373 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: text/plain
dc.language.iso: en_US [en_US]
dc.subject: Confidentiality [en_US]
dc.subject: Disclosure Protection [en_US]
dc.subject: Multiple Imputation [en_US]
dc.subject: Longitudinal Data [en_US]
dc.subject: Survival Analysis [en_US]
dc.title: Multiple Imputation Methods for Statistical Disclosure Control. [en_US]
dc.type: Thesis [en_US]
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Biostatistics [en_US]
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies [en_US]
dc.contributor.committeemember: Little, Roderick J. [en_US]
dc.contributor.committeemember: Elliott, Michael R. [en_US]
dc.contributor.committeemember: Gutmann, Myron P. [en_US]
dc.contributor.committeemember: Raghunathan, Trivellore E. [en_US]
dc.subject.hlbtoplevel: Science [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/58398/1/dianch_1.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

