Some Advances on Modeling High-Dimensional Data with Complex Structures

Qian, Cheng

2017

Abstract

Recent advances in technology have created an abundance of high-dimensional data and made its analysis possible. These data require new, computationally efficient methodology and new kinds of asymptotic analysis. This thesis consists of four projects that deal with high-dimensional data with complex structures. The first project focuses on the graph estimation problem for Gaussian graphical models. Graphical models are commonly used to represent conditional independence between random variables, and learning the conditional independence structure from data has attracted much attention in recent years. However, almost all commonly used graph learning methods rely on the assumption that the observations share the same mean vector. In the first project, we extend the Gaussian graphical model to the setting where the observations are connected by a network and the mean vector can differ across observations. We propose an efficient estimation method for the model, and under the assumption of network cohesion, we show, both theoretically and through numerical studies, that our method can accurately estimate the inverse covariance matrix as well as the corresponding graph structure. To further demonstrate the effectiveness of the proposed method, we also analyze a coauthorship network data set of statisticians to learn term dependencies from statistics publications. The second project addresses the directed acyclic graph (DAG) estimation problem. Estimating the DAG structure is often challenging because the computational complexity scales exponentially in the graph size when the total ordering of the DAG is unknown. To reduce the computational cost, and also to improve estimation accuracy via the bias-variance trade-off, we propose a two-step approach for estimating the DAG when data are generated from a linear structural equation model. In the first step, we infer the moral graph of the DAG via estimation of the inverse covariance matrix, which reduces the parameter space that one would search for the DAG. In the second step, we apply a penalized likelihood method for estimating the DAG restricted to the reduced space. Numerical studies indicate that the proposed method compares favorably with the one-step method in terms of both computational cost and estimation accuracy. The third and fourth projects investigate supervised learning problems. In the third project, we study the cointegration problem for multivariate time series data and propose a method for identifying cointegrating vectors with simultaneous group and elementwise sparsity. Such a sparsity structure enables the elimination of certain coordinates of the original multivariate series from all cointegrated series, leading to parsimonious and potentially more interpretable cointegrating vectors. We formulate an optimization problem based on the profile likelihood and propose an iterative algorithm for solving it. The proposed method has been evaluated on synthetic data and also applied to two real-world data examples involving daily prices of financial sector stocks and monthly treasury yields of different maturities. In the fourth project, we focus on the learning-to-rank problem with sparse feature selection. In particular, we extend the rank support vector machine method to the sparse setting by applying the lasso and elastic-net penalties. We also employ the bundle method and the order statistic tree data structure to reduce the computational complexity. Numerical results indicate that the proposed method works well in both simulation studies and a real-world stock selection problem.
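
The first step of the two-step DAG approach rests on a standard fact about linear structural equation models: the nonzero off-diagonal entries of the inverse covariance matrix generically coincide with the edges of the moral graph. The sketch below illustrates only that fact, and it is not the thesis's estimator. It uses the population inverse covariance of a hypothetical 4-node DAG rather than an estimate from data, and the coefficient values, unit noise variances, and the names `precision_from_sem` and `moral_graph` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 4-node DAG: 0 -> 2, 1 -> 2, 2 -> 3.
# B[i, j] is the (illustrative) coefficient of X_i in the equation for X_j.
B = np.zeros((4, 4))
B[0, 2], B[1, 2], B[2, 3] = 0.8, -0.5, 0.7
noise_var = np.ones(4)  # assumed unit noise variances


def precision_from_sem(B, noise_var):
    """Population inverse covariance of X = B^T X + e, Var(e) = diag(noise_var)."""
    I = np.eye(B.shape[0])
    return (I - B) @ np.diag(1.0 / noise_var) @ (I - B).T


def moral_graph(B):
    """Undirected skeleton plus edges between parents that share a child."""
    A = (B != 0).astype(int)
    # A + A.T: skeleton; A @ A.T: pairs (i, k) with a common child.
    M = (A + A.T + A @ A.T) > 0
    np.fill_diagonal(M, False)
    return M


omega = precision_from_sem(B, noise_var)
support = np.abs(omega) > 1e-10  # off-diagonal support of the precision matrix
np.fill_diagonal(support, False)
M = moral_graph(B)

# A second step would search for directed edges only inside this mask.
candidate_edges = support
print("support equals moral graph:", np.array_equal(support, M))
print(int(candidate_edges.sum()) // 2, "of", 4 * 3 // 2, "node pairs remain")
```

In this toy example the pairs (0, 3) and (1, 3) drop out of the search space, while the pair (0, 1) is kept because nodes 0 and 1 share the child 2. With real data the support would have to be estimated, for instance with a sparse inverse covariance method, and the abstract does not specify which estimator or penalty is used.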

Subjects

High-Dimensional

Types

Thesis

Handle

https://hdl.handle.net/2027.42/140828

Collections

Dissertations and Theses (Ph.D. and Master's)
