Statistical Methods for Networks with Node Covariates
Liu, Yumu
2020
Abstract
Network data, which represent relations or interactions between individual entities, together with nodal covariate information, arise in many scientific and engineering fields such as biology and the social sciences. This dissertation develops statistical models and theory that use information from both the network structure and the node covariates to improve statistical learning tasks such as community detection and missing-value imputation. The first project studies community detection for degree-heterogeneous networks with covariates. The aim is to cluster nodes into groups that share similar patterns of link connectivity and/or covariate distribution. We incorporate node covariates through a flexible degree-corrected block model. In this model the community memberships depend on the node covariates, and the link probabilities are determined by both the community memberships and node degree parameters. We develop two algorithms for estimating the proposed model: one based on variational inference and the other on the pseudo-likelihood. Simulation studies indicate that the proposed model yields better community detection results than methods that use only the network information. Further, we show that under mild conditions the community memberships and the covariate parameters can be estimated consistently. The second project considers missing-value imputation when individuals are linked through a network. We assume that the edges in the network are related to the distances between the individuals' covariates through a latent space network model. We propose a flexible iterative imputation algorithm that uses both the correlation among node variables and the connectivity between observations given by the network.
We relate the proposed method to a Bayesian model and discuss the convergence of the imputation distribution when the conditional models specified for imputation are compatible with the true underlying model of the covariates. We also use simulation studies and a data example to show empirically that incorporating network information can improve imputation accuracy. The final contribution of this dissertation concerns incorporating covariates under the edge exchangeable framework. Edge exchangeable models have attractive theoretical and practical properties that make them appropriate for modeling many sparse real-world interaction networks constructed through edge sampling mechanisms. However, to our knowledge, no existing edge exchangeable network model allows for node covariates. In the third project, we propose a model that incorporates node covariates under the edge exchangeable framework and show that it enjoys properties such as sparsity and partial exchangeability. We further develop a maximum likelihood method to estimate the model parameters and demonstrate its performance through both simulation studies and a data example.
Subjects
Network Data; Covariates; Community Detection; Missing Imputation; Edge Exchangeable Models
Types
Thesis
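The first project's model, a degree-corrected block model in which community memberships depend on node covariates, can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's exact specification. It assumes Gaussian covariates and a multinomial-logit link from covariates to memberships. It also assumes Bernoulli edges with probability θ_i θ_j B[z_i, z_j], clipped at 1. The function name `simulate_covariate_dcbm` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_covariate_dcbm(n, beta, B, theta, rng):
    # Hypothetical helper: one illustrative parameterization, not the thesis's exact model.
    p, K = beta.shape
    X = rng.standard_normal((n, p))                   # covariates x_i ~ N(0, I_p)
    logits = X @ beta                                 # (n, K) membership scores
    logits -= logits.max(axis=1, keepdims=True)       # stabilise the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)         # P(z_i = k | x_i), multinomial logit
    z = np.array([rng.choice(K, p=row) for row in probs])
    # Edge probability combines degree parameters and block connectivity.
    P = np.minimum(1.0, np.outer(theta, theta) * B[np.ix_(z, z)])
    upper = np.triu(rng.random((n, n)) < P, k=1)      # independent draws for i < j
    A = (upper | upper.T).astype(int)                 # symmetric, zero diagonal
    return X, z, A


rng = np.random.default_rng(0)
n, p, K = 200, 3, 2
beta = rng.normal(size=(p, K))
B = np.array([[0.30, 0.05],
              [0.05, 0.30]])
theta = rng.uniform(0.5, 1.5, size=n)                 # degree heterogeneity
X, z, A = simulate_covariate_dcbm(n, beta, B, theta, rng)
print(A.sum() // 2, "edges; community sizes:", np.bincount(z, minlength=K))
```

In this parameterization the covariates X carry information about the memberships z, while θ lets node degrees vary within a community. Together these explain why combining X with the adjacency matrix A can help recover communities compared with using A alone.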