Topics in High-Dimensional Unsupervised Learning.

Guo, Jian

2011

Abstract

The first part of the dissertation introduces several new methods for estimating the structure of graphical models. First, we consider graphical models with discrete variables, both nominal and ordinal. For nominal variables, we establish the asymptotic properties of the joint neighborhood selection method proposed by Hoefling and Tibshirani (2009) and Wang et al. (2009), which is used to fit high-dimensional graphical models with binary random variables. We show that this method is consistent in both parameter estimation and structure estimation, and we extend it to general nominal variables. For ordinal variables, we introduce a new graphical model that assumes the ordinal variables are generated by discretizing the marginal distributions of a latent multivariate Gaussian distribution, with the relationships among the ordinal variables described by the underlying Gaussian graphical model. We develop an EM-like algorithm to estimate the underlying latent network and apply mean field theory to improve computational efficiency. We also consider the problem of jointly estimating multiple graphical models that share the same variables but come from different categories. Compared with estimating each category separately, the proposed joint estimation method significantly improves performance when the graphical models in different categories have some similarities. We develop joint estimation methods for both Gaussian graphical models and graphical models for categorical variables.

In the second part of the dissertation, we develop two methods to improve the interpretability of high-dimensional unsupervised learning methods. First, we introduce a pairwise variable selection method for high-dimensional model-based clustering. Unlike existing variable selection methods for clustering problems, the proposed method not only selects the informative variables but also identifies which pairs of clusters are separable by each informative variable. Second, we propose a new method to identify both sparse structures and “block” structures in the factor loadings in principal component analysis. This is achieved by forcing highly correlated variables to have identical factor loadings via a regularization penalty.
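
As an illustration of the ordinal-variable model described in the abstract, the following minimal sketch generates ordinal data by discretizing a latent Gaussian. It is not taken from the dissertation: the function names, the threshold values, the correlation of 0.8, and the restriction to two variables are all assumptions made for the example, and the EM-like estimation step is not shown.

```python
import bisect
import math
import random


def discretize(z, thresholds):
    """Map a latent value z to an ordinal level in 0..len(thresholds)."""
    # bisect_right returns how many (sorted) thresholds are <= z.
    return bisect.bisect_right(thresholds, z)


def sample_ordinal_pair(n, rho, thresholds_x, thresholds_y, seed=0):
    """Draw n ordinal pairs from a latent bivariate Gaussian with correlation rho.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 has unit variance and correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((discretize(z1, thresholds_x), discretize(z2, thresholds_y)))
    return pairs


# Assumed thresholds: three levels for the first variable, four for the second.
data = sample_ordinal_pair(
    1000, rho=0.8, thresholds_x=[-0.5, 0.5], thresholds_y=[-1.0, 0.0, 1.0]
)
```

Only the ordinal levels in `data` would be observed in practice; the latent values `z1` and `z2` stay hidden, which is why an estimation procedure such as the EM-like algorithm mentioned in the abstract is needed to recover the underlying network.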

Subjects

Graphical Model

High-dimensional Data Analysis

Network Analysis

Unsupervised Learning

Types

Thesis

Handle

https://hdl.handle.net/2027.42/86374

Collections

Dissertations and Theses (Ph.D. and Master's)
