L0 Constraint Optimization, Homogeneity Fusion, and Mediation Analyses
Wang, Wen
2020
Abstract
The focus of this dissertation is to develop a framework of L0 regularized statistical procedures to identify subgroups among regression coefficients and to estimate subgroup-specific parameters. The proposed constrained discrete optimization methodology estimates group labels by solving mixed integer programming (MIP) problems via efficient algorithms. I develop key large-sample theories for the proposed methods, including subgroup selection consistency and estimation consistency, using some new non-asymptotic bounds. R software for the proposed methods is also made available to the public.
In the first project, presented in Chapter II, I consider a high-dimensional regression setting. The primary objective is to develop a dimension reduction method that simultaneously identifies homogeneous subgroups among regression coefficients and performs sparse feature selection. The resulting estimates of the regression coefficients in each subgroup share the same value. To encourage sparsity, a large subgroup of coefficients is allowed to be estimated as exactly zero. To achieve this objective, I propose a new L0 constrained optimization method, which is formulated as an MIP problem. To implement this MIP method, I develop a novel warm-start algorithm that uses both a discrete first-order method and a segment neighborhood method, and I establish its convergence properties. This new approach is able to solve the MIP problem with satisfactory accuracy in a short time. To attain global optimality of the MIP method, I reformulate the constrained optimization problem as another MIP problem that can then be solved efficiently by Kelley's cutting plane method. I establish a sufficient condition for consistent estimation of group labels and prove that, up to a constant factor, it is also the necessary condition under which any method attains consistency of subgroup clustering. Surprisingly, to achieve clustering consistency, the sample size needs to grow only at the same rate as the sum of the logarithm of the number of regression coefficients and the logarithm of the true number of subgroups. A real data analysis is used to illustrate the performance of the proposed method and algorithms.
In the second project, presented in Chapter III, I consider a structural equation model and aim to estimate model parameters in causal mediation pathways in the presence of high-dimensional potential mediators. I develop statistical procedures to select sparse important mediators and to identify sparse causal pathways simultaneously. To address the technical challenge, I propose a new L0 constrained optimization method, which leads to an MIP formulation. To solve this MIP problem, I develop a new warm-start algorithm using the discrete first-order method and establish its convergence properties. This new algorithm is able to quickly attain a near-optimal solution. To achieve global optimality, I reformulate the MIP problem so that it can be solved efficiently using Kelley's cutting plane method. I present a sufficient condition for selection consistency of causal pathways under the proposed method and prove that, up to a constant factor, it is also the necessary condition under which any method can achieve causal pathway selection consistency. Simulation studies and real-world data analyses are used to demonstrate the performance of the proposed method and algorithms.
Subjects
constrained maximum likelihood; homogeneity fusion; exploratory mediation analysis; non-convex optimization; L0 constraint
Types
Thesis