The statistical analysis of index variables containing missing data.
Johanns, Jewel Rose
1999
Abstract
Consider a data set with several polytomous variables that measure the same underlying trait. Assume that some of these variables contain missing data. Suppose interest is in the regression of a response on the sum, or index, of these variables. We cannot analyze the data directly because the index contains missing values. Simple methods of handling the missing data problem include complete-case analysis, which uses only cases with complete data, and conditional mean imputation, which fills in the missing data using best linear predictions. These methods can lead to biased estimates of regression coefficients and underestimation of the standard error. I addressed this missing data problem using item response theory (IRT) from the educational testing literature, which models the probability of a subject answering a test item correctly given the subject's latent ability. A particular model, the partial credit model (PCM), was used to model the ordinal rating-scale data. The partial credit model gives the probability of the categories for each rating variable, given the latent trait. The PCM contains a separate difficulty parameter for each variable and common threshold parameters over all variables. The threshold parameters separate response categories on the latent continuum. The response in the regression model and the latent trait variable were assumed to have a bivariate normal distribution. The marginal distribution of the latent trait variable was assumed to be standard normal. The PCM was used to develop a multiple imputation procedure for addressing the missing data problem. Multiple imputation is a method in which several draws for each missing value are obtained from the predictive distribution of the missing values.
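As an illustration of the category probabilities described above, the following is a minimal Python sketch of a partial credit model with one difficulty parameter per variable and threshold parameters shared across variables. The abstract does not state the exact parameterization used in the thesis, so the form below (category logits built from cumulative sums of trait minus difficulty minus threshold) is an assumption, and the function name `pcm_probs` and the numeric values are hypothetical.

```python
import numpy as np

def pcm_probs(theta, difficulty, thresholds):
    """Category probabilities for one rating variable with categories 0..K.

    Assumed form: P(X = k | theta) is proportional to
    exp(sum_{h=1..k} (theta - difficulty - thresholds[h-1])),
    with the empty sum for k = 0 equal to 0.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # One "step" term per threshold separating adjacent categories.
    steps = theta - difficulty - thresholds
    # Category 0 has logit 0; category k has the sum of the first k steps.
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    # Subtract the maximum before exponentiating for numerical stability.
    logits -= logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical values: a four-category variable with symmetric thresholds.
probs = pcm_probs(theta=0.0, difficulty=0.0, thresholds=[-1.0, 0.0, 1.0])
print(probs)
```

Under this assumed form, raising `theta` shifts probability toward higher categories, and raising `difficulty` shifts it toward lower ones.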
Estimates of the regression coefficient from each of the filled-in datasets are then combined to obtain a consistent estimate of the regression coefficient, propagate the imputation error, and improve the precision of the resulting estimate. A maximum likelihood method using an ECM algorithm with the PCM was first developed. A Bayesian method, using a Gibbs sampling algorithm that incorporated griddy Gibbs sampling and rejection sampling, was then developed. Multiple imputation was utilized for the Bayesian method. A simulation study was used to compare the two model-based methods with each other and with existing methods for this type of missing data problem. The results of the study indicate that the multiple imputation Gibbs sampling algorithm based on the PCM was superior to the other methods in terms of bias, RMSE, and percent coverage of confidence intervals. Conditional mean imputation was nearly as good; however, regression coefficients were biased for high correlations, and the uncertainty in the imputation process is not accounted for with conditional mean imputation.
Subjects
Analysis Containing Gibbs Sampler Index Variables Missing Data Partial Credit Method Statistical
Types
Thesis