Statistical Methods for Networks with Node Covariates

Liu, Yumu

2020

Abstract

Network data, which represent relations or interactions between individual entities, together with node covariate information, arise in many scientific and engineering fields such as biology and social science. This dissertation focuses on developing statistical models and theory that utilize information from both the network structure and node covariates to improve statistical learning tasks, such as community detection and missing value imputation. The first project studies the problem of community detection for degree-heterogeneous networks with covariates, where we aim to cluster the nodes into groups that share similar patterns in link connectivity and/or covariate distributions. We incorporate node covariates via a flexible degree-corrected block model by allowing the community memberships to depend on node covariates, while the link probabilities are determined by both node community memberships and degree parameters. We develop two algorithms for estimating the proposed model, one using variational inference and the other based on pseudo-likelihood. Simulation studies indicate that the proposed model can achieve better community detection results than methods that only utilize the network information. Further, we show that under mild conditions, the community memberships and the covariate parameters can be estimated consistently. The second project considers the problem of missing value imputation when individuals are linked through a network. We assume the edges in the network are related to the distances between the covariates of the individuals through a latent space network model. We propose an iterative imputation algorithm that is flexible and utilizes both the correlation among node variables and the connectivity between observations given by the network. We relate the proposed method to a Bayesian model and discuss the convergence of the imputation distribution when the specified conditional models for imputation are compatible with the true underlying model of the covariates. We also use simulation studies and a data example to illustrate empirically that the imputation accuracy can be improved by incorporating network information. The final contribution of this dissertation is on incorporating covariates under the edge exchangeable framework. Edge exchangeable models have attractive theoretical and practical properties which make them appropriate for modeling many sparse real-world interaction networks constructed through edge sampling mechanisms. However, as far as we know, there is no edge exchangeable network model that allows for node covariates. In the third project, we propose a model that incorporates node covariates under the edge exchangeable model framework and show that it enjoys properties such as sparsity and partial exchangeability. We further develop a maximum likelihood estimation method to estimate the model parameters and demonstrate its performance through both simulation studies and a data example.
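
To make the first project's setup concrete, the following is a minimal simulation sketch of a degree-corrected block model in which community memberships depend on node covariates. It is an illustration only, not the dissertation's specification: the multinomial-logit link for memberships, the Bernoulli edges, the parameter values, and all names (`simulate_covariate_dcbm`, `beta`, `theta`, `B`) are assumptions introduced here.

```python
import numpy as np

def simulate_covariate_dcbm(n=200, K=2, p=2, seed=0):
    """Simulate an undirected network from a covariate-dependent DCBM (illustrative)."""
    rng = np.random.default_rng(seed)

    # Node covariates (assumed standard normal for illustration).
    X = rng.normal(size=(n, p))

    # Hypothetical covariate coefficients; first community is the reference class.
    beta = rng.normal(size=(p, K))
    beta[:, 0] = 0.0

    # Assumed multinomial-logit link: membership probabilities depend on covariates.
    logits = X @ beta
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=row) for row in probs])

    # Degree parameters capture heterogeneity in node connectivity.
    theta = rng.uniform(0.5, 1.5, size=n)

    # Block connectivity matrix: denser within communities than between.
    B = np.full((K, K), 0.02) + 0.10 * np.eye(K)

    # Link probability for (i, j): theta_i * theta_j * B[z_i, z_j].
    P = np.clip(theta[:, None] * theta[None, :] * B[z][:, z], 0.0, 1.0)

    # Draw the upper triangle and symmetrize; no self-loops.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, X, z, theta

if __name__ == "__main__":
    A, X, z, theta = simulate_covariate_dcbm()
    print("nodes:", A.shape[0], "edges:", A.sum() // 2)
```

Data generated this way could serve as a test bed for comparing community detection with and without covariates, in the spirit of the simulation studies the abstract mentions.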

Subjects

Network Data

Covariates

Community Detection

Missing Imputation

Edge Exchangeable Models

Types

Thesis

Handle

https://hdl.handle.net/2027.42/163165

Collections

Dissertations and Theses (Ph.D. and Master's)
