L0 Constraint Optimization, Homogeneity Fusion, and Mediation Analyses

dc.contributor.author: Wang, Wen
dc.date.accessioned: 2020-10-04T23:20:52Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2020-10-04T23:20:52Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.identifier.uri: https://hdl.handle.net/2027.42/162877
dc.description.abstract: The focus of this dissertation is to develop a framework of L0 regularized statistical procedures to identify subgroups among regression coefficients and to estimate subgroup-specific parameters. The proposed constrained discrete optimization methodology estimates group labels by solving mixed integer programming (MIP) problems via efficient algorithms. I develop key large-sample theories for the proposed methods, including subgroup selection consistency and estimation consistency, using some new non-asymptotic bounds. R software implementing the proposed methods is also made publicly available.
In the first project, presented in Chapter II, I consider a high-dimensional regression setting. The primary objective is to develop a dimension reduction method that simultaneously identifies homogeneous subgroups among regression coefficients and performs sparse feature selection. The resulting estimates of the regression coefficients in each subgroup share the same value. To encourage sparsity, a large subgroup of coefficients is allowed to be estimated as exactly zero. To achieve this objective, I propose a new L0 constrained optimization method, which is formulated as an MIP problem. To implement this MIP method, I develop a novel algorithm with a warm start via both a discrete first-order method and a segment neighborhood method, and establish its convergence properties. This new approach is able to solve the MIP problem with satisfactory accuracy in a short time. To attain global optimality of the MIP method, I reformulate the constrained optimization problem as another MIP problem that can then be solved efficiently by Kelley's cutting plane method. I establish a sufficient condition for consistent estimation of group labels and prove that, up to a constant factor, it is also the necessary condition under which any method attains consistency of subgroup clustering. Surprisingly, to achieve clustering consistency, the sample size only needs to grow at the same rate as the sum of the logarithm of the number of regression coefficients and the logarithm of the true number of subgroups. A real data analysis is used to illustrate the performance of the proposed method and algorithms.
In the second project, presented in Chapter III, I consider a structural equation model and aim to estimate model parameters in causal mediation pathways in the presence of high-dimensional potential mediators. I develop statistical procedures to select a sparse set of important mediators and to identify sparse causal pathways simultaneously. To address the technical challenge, I propose a new L0 constrained optimization method, which leads to an MIP formulation. To solve this MIP problem, I develop a new warm start algorithm using the discrete first-order method and establish its convergence properties. This new algorithm is able to quickly attain a near-optimal solution. To achieve global optimality, I reformulate the MIP problem so that it can be solved efficiently using Kelley's cutting plane method. I present a sufficient condition for the selection consistency of causal pathways under the proposed method and prove that, up to a constant factor, it is also the necessary condition under which any method can achieve causal pathway selection consistency. Simulation studies and real-world data analyses are used to demonstrate the performance of the proposed method and algorithms.
dc.language.iso: en_US
dc.subject: constrained maximum likelihood
dc.subject: homogeneity fusion
dc.subject: exploratory mediation analysis
dc.subject: non-convex optimization
dc.subject: L0 constraint
dc.title: L0 Constraint Optimization, Homogeneity Fusion, and Mediation Analyses
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Biostatistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Song, Peter Xuekun
dc.contributor.committeemember: Xu, Gongjun
dc.contributor.committeemember: Kalbfleisch, John D
dc.contributor.committeemember: Zhu, Ziwei
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/162877/1/wangwen_1.pdf (en_US)
dc.identifier.orcid: 0000-0001-9509-7383
dc.identifier.name-orcid: Wang, Wen; 0000-0001-9509-7383 (en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

