Shrinkage Methods Utilizing Auxiliary Information to Improve High-Dimensional Prediction Models.

dc.contributor.author: Boonstra, Philip S. [en_US]
dc.date.accessioned: 2013-02-04T18:04:25Z
dc.date.available: NO_RESTRICTION [en_US]
dc.date.available: 2013-02-04T18:04:25Z
dc.date.issued: 2012 [en_US]
dc.date.submitted: 2012 [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/95993
dc.description.abstract: With advancements in genomic technologies, it is common to have two high-dimensional datasets, each measuring one underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers that best measure the underlying biological process. This same process is also measured by W, coming from prior technology but correlated with X. We have (Y,X,W) on a moderately-sized sample and (Y,W) on a larger sample. We utilize the data on W to boost prediction of Y by X when p is large. Our work is motivated by a dataset containing gene-expression measurements from both quantitative real-time polymerase chain reaction and microarray technologies. First, we propose a class of targeted ridge (TR) estimators that shrink the regression coefficients of Y on X toward targets derived using the larger dataset and give two specific TR estimators. A hybrid estimator combines multiple TR estimators, data-adaptively balancing efficiency and robustness. Next, we view the problem from a Bayesian perspective. Hyperparameters control the shrinkage of the model parameters, giving flexibility in terms of what to shrink and to what extent. All unknown quantities – the missing X’s from the larger sample, the model parameters, and the shrinkage parameters – are iteratively sampled. Alternatively, we show how Empirical Bayes methods which maximize marginal likelihoods can estimate the shrinkage parameters. Finally, we consider estimating the tuning parameter of a ridge regression, particularly when the sample size is small relative to the number of predictors. A proposed corrected generalized cross-validation criterion is not subject to overfitting but remains asymptotically optimal. We also define a hyperpenalty that shrinks the tuning parameter itself, protecting against over- or underfitting. Maximizing the "hyperpenalized" likelihood can yield smaller prediction error than many common alternatives. Embedding the hyperpenalty into the penalized EM algorithm yields a hyperpenalized EM algorithm, which may be applied to the original missing data prediction problem. All of the approaches are compared via simulation studies and applied to the motivating gene-expression dataset. This dissertation therefore contributes to the literature on missing data and measurement error methods as they relate to prediction in high-dimensional models. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Genomics [en_US]
dc.subject: Missing Data [en_US]
dc.subject: Measurement Error [en_US]
dc.subject: Ridge Regression [en_US]
dc.subject: Penalized Likelihood [en_US]
dc.subject: EM Algorithm [en_US]
dc.title: Shrinkage Methods Utilizing Auxiliary Information to Improve High-Dimensional Prediction Models. [en_US]
dc.type: Thesis [en_US]
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Biostatistics [en_US]
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies [en_US]
dc.contributor.committeemember: Mukherjee, Bhramar [en_US]
dc.contributor.committeemember: Taylor, Jeremy M. [en_US]
dc.contributor.committeemember: Nguyen, Long [en_US]
dc.contributor.committeemember: Raghunathan, Trivellore E. [en_US]
dc.subject.hlbsecondlevel: Statistics and Numeric Data [en_US]
dc.subject.hlbtoplevel: Science [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/95993/1/philb_1.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

