Statistical Inference for Diverging Number of Parameters beyond Linear Regression

Xia, Lu

2020

View/Open

luxia_1.pdf (19.4MB, PDF)

Abstract

In the big data era, regression models with a large number of covariates have emerged as a common tool to tackle problems arising from business, engineering, genomics, neuroimaging, and epidemiological studies. Drawing statistical inference for these models has sparked much interest over the past few years. Although successful for high dimensional linear models, inference approaches for high dimensional models beyond linear regression are limited and perform unsatisfactorily, theoretically or numerically. In this dissertation, we focus on de-biased lasso, which has been one of the most popular methods for high dimensional inference. We propose procedures that provide better bias correction and confidence interval coverage, and draw reliable inference for regression parameters in the "large n, diverging p" scenario. In general, we caution against applying de-biased lasso and its variants to models beyond linear regression when parameters outnumber the sample size.

Following an overview outlined in Chapter I, we focus on the generalized linear models (GLMs) in Chapter II. Extensive numerical simulations indicate that de-biased lasso may not adequately remove biases for high dimensional GLMs, and thus yields unreliable confidence intervals. We have further found that several key assumptions, especially the sparsity condition on the inverse Hessian matrix, may not hold for GLMs. In a "large n, diverging p" scenario, we consider an alternative de-biased lasso approach that inverts the Hessian matrix of the concerned model without requiring matrix sparsity, and establish the asymptotic distributions of linear combinations of the estimates. Simulations show that our proposed de-biased estimator performs better in bias correction and confidence interval coverage for a wide range of p/n ratios. We apply our method to the Boston Lung Cancer Study, an epidemiology study on the mechanisms underlying lung cancer, and investigate the joint effects of genetic variants on overall lung cancer risks.

In Chapter III, we draw inference based on the Cox proportional hazards model with a diverging number of covariates. As the existing methods assume sparsity on the inverse of the Fisher information matrix, which may not hold for Cox models, they typically generate biased estimates and confidence intervals that under-cover. We modify de-biased lasso by using quadratic programming to approximate the inverse of the information matrix, without imposing matrix sparsity assumptions. We establish the asymptotic theory for the estimated regression coefficients when the covariate dimension diverges with the sample size. In extensive simulations, our proposed method provides consistent estimates and confidence intervals with improved coverage probabilities. We apply the proposed method to assess the effects of genetic markers on overall survival of non-small cell lung cancer patients in the aforementioned Boston Lung Cancer Study.

Stratified Cox proportional hazards models, with extensive applications in large scale cohort studies, are useful when some covariates violate the proportional hazards assumption or data are stratified based on factors, such as transplant centers. In Chapter IV, we extend the de-biased lasso approach proposed in Chapter III to draw inference for the stratified Cox model with potentially many covariates. We provide asymptotic results useful for inference on linear combinations of the regression parameters, and demonstrate their utility via simulation studies. We apply the method to analyze the national kidney transplantation data stratified by transplant center, and assess the effects of many factors on graft survival.
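The Chapter II idea of de-biasing with a directly inverted Hessian can be illustrated with a small sketch for logistic regression. This is not the dissertation's implementation: the function names (`lasso_logistic`, `debiased_lasso_logistic`, `confidence_interval`), the proximal-gradient lasso solver, the fixed iteration count, and the hard-coded 95% normal quantile are all assumptions made for illustration. The sketch assumes p < n so that the sample Hessian is invertible, and it uses the textbook one-step form b = beta_lasso - Theta * score with Theta taken as the inverse Hessian.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lasso_logistic(X, y, lam, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient (illustrative solver)."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def debiased_lasso_logistic(X, y, lam):
    """One-step bias correction of the lasso using the inverted sample Hessian.

    Assumes p < n so the Hessian can be inverted without a sparsity assumption.
    Returns the de-biased estimate and the inverse Hessian.
    """
    n, _ = X.shape
    beta = lasso_logistic(X, y, lam)
    mu = sigmoid(X @ beta)
    score = X.T @ (mu - y) / n                          # gradient of the negative log-likelihood
    hessian = (X * (mu * (1.0 - mu))[:, None]).T @ X / n
    theta = np.linalg.inv(hessian)                      # direct inversion
    b = beta - theta @ score
    return b, theta


def confidence_interval(a, b, theta, n, z=1.96):
    """Approximate 95% normal interval for the linear combination a^T beta."""
    est = a @ b
    se = np.sqrt(a @ theta @ a / n)
    return est - z * se, est + z * se


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 500, 10
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:2] = [1.0, -1.0]
    y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
    b, theta = debiased_lasso_logistic(X, y, lam=0.05)
    a = np.zeros(p)
    a[0] = 1.0
    print(confidence_interval(a, b, theta, n))
```

With the penalty set to zero the lasso fit is the maximum likelihood estimate, the score vanishes, and the correction leaves the estimate essentially unchanged, which is a quick sanity check on the sketch.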

Subjects

De-biased lasso

Diverging dimension

Statistical inference

Confidence intervals

Generalized linear models

Cox proportional hazards model

Types

Thesis

Handle

https://hdl.handle.net/2027.42/162934

Collections

Dissertations and Theses (Ph.D. and Master's)
