Statistical Analysis for Network Data using Matrix Variate Models and Latent Space Models
Zhang, Xuefei
2020
Abstract
Network data capture the connectivity relationships among individuals and are ubiquitous in many scientific and engineering fields. This thesis focuses on developing statistical learning methodologies and novel statistical models for network data arising in the modern big data era. Classical supervised learning methods usually assume that the training data points are independent samples. However, when individuals are connected by a network and interact in complex ways, the classical independence assumption may not hold. In such a scenario, incorporating the network information in modeling is expected to improve prediction performance, as it provides additional information about relationships among individuals.
We first focus on predicting a continuous response variable using both covariates and network information. Specifically, we propose a matrix variate model that allows two-way dependence, among data points and among variables, to model the distribution of variables associated with nodes in a network. Under such a model, the derived distribution of each response depends on the covariates of all the data points in the network in a principled way. We develop efficient algorithms for parameter estimation and show consistency of the estimators under mild conditions. Further, we extend the proposed framework to handle the classification problem.
The dimension of variables associated with nodes can be high in many modern data applications, and such node variables usually provide important information for understanding network structure. In the second project, we consider the problem of modeling network data with node variables. The classical network latent space model assumes that edge formation in a network depends on nodal latent variables as well as the observed node variables; however, it has several limitations in handling high-dimensional node variables. We propose an alternative model, named the joint latent space model, in which the latent variables not only explain the network structure but are also informative for the multivariate node variables. We establish theoretical properties of the estimators and provide insights into how incorporating high-dimensional node variables can improve the estimation accuracy of the latent positions. We demonstrate the improvement in latent variable estimation and in associated downstream tasks through simulation studies and an application to a Facebook data example.
Lastly, we extend statistical modeling from a single network to multiple networks. Entities often interact with each other through multiple types of relations, which can be represented as multilayer networks. Multilayer networks among the same set of nodes usually share common structures, while each layer can also possess its own distinct node-connecting behaviors. To capture such characteristics, we propose a flexible latent space model in which we embed each node with a latent vector shared among layers and a layer-specific effect for each layer, and let both elements, together with a layer-specific connectivity matrix, determine edge formation. We establish theoretical properties of the maximum likelihood estimators and show that the upper bound of the common latent structure's estimation error is inversely proportional to the number of layers under mild conditions. The superior performance of the proposed model is demonstrated through simulation studies and applications to two real-world data examples.
Subjects
statistical network analysis; statistical learning; matrix variate models; latent space models
Types
Thesis