Show simple item record

Statistical Inference for Diverging Number of Parameters beyond Linear Regression

dc.contributor.authorXia, Lu
dc.date.accessioned2020-10-04T23:23:20Z
dc.date.availableNO_RESTRICTION
dc.date.available2020-10-04T23:23:20Z
dc.date.issued2020
dc.identifier.urihttps://hdl.handle.net/2027.42/162934
dc.description.abstractIn the big data era, regression models with a large number of covariates have emerged as a common tool to tackle problems arising from business, engineering, genomics, neuroimaging, and epidemiological studies. Drawing statistical inference for these models has sparked much interest over the past few years. Albeit successful for high dimensional linear models, high dimensional inference approaches beyond linear regression are limited and present unsatisfactory performance, theoretically or numerically. In this dissertation, we focus on de-biased lasso, which has been one of the most popular methods for high dimensional inferences. We propose procedures that provide better bias correction and confidence interval coverage, and draw reliable inference for regression parameters in the "large n, diverging p" scenario. In general, we caution against applying de-biased lasso and its variants to models beyond linear regression when parameters outnumber the sample size. Following an overview outlined in Chapter I, we focus on the generalized linear models (GLMs) in Chapter II. Extensive numerical simulations indicate that de-biased lasso may not adequately remove biases for high dimensional GLMs, and thus yield unreliable confidence intervals. We have further found that several key assumptions, especially the sparsity condition on the inverse Hessian matrix, may not hold for GLMs. In a "large n, diverging p" scenario, we consider an alternative de-biased lasso approach that inverts the Hessian matrix of the concerned model without requiring matrix sparsity, and establish the asymptotic distributions of linear combinations of the estimates. Simulations evidence that our proposed de-biased estimator performs better in bias correction and confidence interval coverage for a wide range of p/n ratios. We apply our method to the Boston Lung Cancer Study, an epidemiology study on the mechanisms underlying lung cancer, and investigate the joint effects of genetic variants on overall lung cancer risks. In Chapter III, we draw inference based on the Cox proportional hazards model with a diverging number of covariates. As the existing methods assume sparsity on the inverse of the Fisher information matrix, which may not hold for Cox models, they typically generate biased estimates and under-covered confidence intervals. We modify de-biased lasso by using quadratic programming to approximate the inverse of the information matrix, without posing matrix sparsity assumptions. We establish the asymptotic theory for the estimated regression coefficients when the covariate dimension diverges with the sample size. With extensive simulations, our proposed method provides consistent estimates and confidence intervals with improved coverage probabilities. We apply the proposed method to assess the effects of genetic markers on overall survival of non-small cell lung cancer patients in the aforementioned Boston Lung Cancer Study. Stratified Cox proportional hazards model, with extensive applications in large scale cohort studies, are useful when some covariates violate the proportional hazards assumption or data are stratified based on factors, such as transplant centers. In Chapter IV, we extend the de-biased lasso approach proposed in Chapter III to draw inference for the stratified Cox model with potentially many covariates. We provide asymptotic results useful for inference on linear combinations of the regression parameters, and demonstrate its utility via simulation studies. We apply the method to analyze the national kidney transplantation data stratified by transplant center, and assess the effects of many factors on graft survival.
dc.language.isoen_US
dc.subjectDe-biased lasso
dc.subjectDiverging dimension
dc.subjectStatistical inference
dc.subjectConfidence intervals
dc.subjectGeneralized linear models
dc.subjectCox proportional hazards model
dc.titleStatistical Inference for Diverging Number of Parameters beyond Linear Regression
dc.typeThesis
dc.description.thesisdegreenamePhDen_US
dc.description.thesisdegreedisciplineBiostatistics
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberLi, Yi
dc.contributor.committeememberNan, Bin
dc.contributor.committeememberBanerjee, Moulinath
dc.contributor.committeememberMukherjee, Bhramar
dc.subject.hlbsecondlevelStatistics and Numeric Data
dc.subject.hlbtoplevelScience
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/162934/1/luxia_1.pdfen_US
dc.identifier.orcid0000-0003-1561-6871
dc.identifier.name-orcidXia, Lu; 0000-0003-1561-6871en_US
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


Files in this item

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.