Work Description

Title: English WikiProject coeditor networks and quality assessments
  • List of English language WikiProjects scraped from Coeditor networks were generated from a database dump of revision metadata from the English language Wikipedia. Nodes represent Wikipedia editors. There is a directed edge from editor S to editor T if any page edited by S was later edited by T. Edges are unweighted. WikiProject quality assessments were scraped from logs on the English language Wikipedia. These logs are regularly generated by WP 1.0 Bot.
  • We analyzed the structure of English language WikiProject coeditor networks and compared it to the efficiency and performance of those projects. The list of WikiProjects gives an integer key, title, and unique URL for each project. The network files are indexed by the integer keys. The quality assessment logs are indexed by project title and article title.

  • Curation Notes: Readme file was updated Oct. 11, 2018 to include additional context on research, file contents, and organization (see first section of readme), and explanation of additional license in the deposit referring to the 'logbook' module.
Funding agency
  • National Science Foundation (NSF)
ORSP grant number
  • IIS-1617820
Last modified
  • 11/05/2019
  • 04/09/2018
To Cite this Work:
Platt, E. (2018). English WikiProject coeditor networks and quality assessments [Data set]. University of Michigan - Deep Blue.


Files (Count: 5; Size: 68.3 GB)

This archive contains data and code for the analysis presented in
(Platt & Romero, 2018). This study analyzed relationships between structural
properties of WikiProject co-editor networks and the performance/efficiency
of those WikiProjects. Co-editor networks were constructed from the entire
Wikipedia edit history by creating an edge between two editors if they had
edited the same article. The performance and efficiency of a WikiProject were
determined from the history of WikiProject article quality assessments.
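
As an illustration of the edge rule in the deposit description (a directed, unweighted edge from editor S to editor T if a page edited by S was later edited by T), here is a minimal Python sketch. It is not the code from wikiproject_code_git.tgz. The record layout (page_id, timestamp, editor_id), the function name coeditor_edges, and the choice to skip self-loops are assumptions made for this example.

```python
from collections import defaultdict


def coeditor_edges(revisions):
    """Return the set of directed edges (S, T) where T edited a page after S.

    `revisions` is an iterable of (page_id, timestamp, editor_id) tuples.
    This layout is hypothetical, not the schema of the Wikipedia dump.
    """
    # Group revisions by page so each page history is handled separately.
    by_page = defaultdict(list)
    for page_id, timestamp, editor_id in revisions:
        by_page[page_id].append((timestamp, editor_id))

    edges = set()  # a set keeps the edges unweighted (no duplicates)
    for history in by_page.values():
        history.sort(key=lambda rev: rev[0])  # oldest revision first
        earlier = set()  # editors already seen on this page
        for _, editor in history:
            for source in earlier:
                if source != editor:  # assumption: self-loops are skipped
                    edges.add((source, editor))
            earlier.add(editor)
    return edges


# Toy data: page 1 is edited by a, then b, then a again; page 2 by c, then a.
revisions = [(1, 10, "a"), (1, 20, "b"), (1, 30, "a"), (2, 5, "c"), (2, 6, "a")]
edges = coeditor_edges(revisions)
print(sorted(edges))  # [('a', 'b'), ('b', 'a'), ('c', 'a')]
```

Note that a pair of editors who alternate on a page ends up with edges in both directions, as with a and b in the toy data.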

In addition to the co-editor network, the project included agent-based model
simulations, which did not rely on any empirical data. The code for these
simulations is included in agent_based_model_code_git.tgz.

The code and data for the empirical WikiProject analysis are contained in
the wikiproject_code_git.tgz and wikiproject_data.tgz files respectively.
For convenience, the co-editor networks are also included by themselves in
the coeditor_networks.tgz file, using adjacency list format.
Reproducing the analysis also relies on external data sources that
have been archived elsewhere, linked below.

The data and code are organized into several directories, each containing a file with additional information.


  • agent_based_model_code_git.tgz: git repository containing code for agent-based models. This repository includes a copy of the logbook module released under the 3-clause BSD license.
  • coeditor_networks.tgz: English-language WikiProject coeditor networks in adjacency-list format. Node ids are Wikipedia editor ids. WikiProject ids are mapped to titles in wikiproject_data.tgz.
  • wikiproject_code_git.tgz: git repository containing code for empirical analysis of performance and efficiency of WikiProject co-editor networks.
  • wikiproject_data.tgz: Data sets used by empirical analysis scripts.
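
For readers who want to load one of the networks in coeditor_networks.tgz, the following Python sketch parses a generic adjacency list. The exact file layout is not specified here, so the sketch assumes a common convention: one line per node, whitespace-separated, with the source node id first, followed by its out-neighbors, and with "#" starting a comment. Check the information file inside the archive before relying on this. The function name parse_adjacency_list is hypothetical.

```python
def parse_adjacency_list(lines):
    """Parse adjacency-list lines into a dict of node id -> set of out-neighbors.

    Assumed layout (not confirmed by the deposit): "source target1 target2 ...",
    whitespace-separated, with '#' starting a comment.
    """
    adjacency = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        source, *targets = line.split()
        adjacency.setdefault(source, set()).update(targets)
        # Make sure nodes that only appear as targets are present too.
        for target in targets:
            adjacency.setdefault(target, set())
    return adjacency


# Toy input standing in for one network file.
sample = ["# toy network", "1 2 3", "2 3", "3"]
network = parse_adjacency_list(sample)
edge_count = sum(len(neighbors) for neighbors in network.values())
print(len(network), edge_count)  # 3 3
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, which avoids reading a large network file into memory at once.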

External data


  • Platt, E. L. & Romero, D. M. (2018). Network Structure, Efficiency, and Performance on WikiProjects. In ICWSM.

Data citation

  • Platt, E. L., Livneh, D., Ramanathan, K., and Romero, D. M. (2018). English WikiProject coeditor networks and quality assessments.
