Work Description

Title: English WikiProject coeditor networks and quality assessments Open Access Deposited

Methodology
  • The list of English-language WikiProjects was scraped from https://tools.wmflabs.org/enwp10/cgi-bin/pindex.fcgi. Coeditor networks were generated from a database dump of revision metadata from the English-language Wikipedia. Nodes represent Wikipedia editors. There is a directed edge from editor S to editor T if any page edited by S was later edited by T. Edges are unweighted. WikiProject quality assessments were scraped from logs on the English-language Wikipedia; these logs are generated regularly by WP 1.0 Bot.
Description
  • We analyzed the structure of English-language WikiProject coeditor networks and compared it to the efficiency and performance of those projects. The list of WikiProjects gives an integer key, title, and unique URL for each project. The network files are indexed by the integer keys. The quality assessment logs are indexed by project title and article title.

  • Curation Notes: The readme file was updated Oct. 11, 2018 to include additional context on the research, file contents, and organization (see the first section of the readme), and an explanation of the additional license in the deposit, which refers to the 'logbook' module.
Creator
Depositor
  • elplatt@umich.edu
Contact information
Discipline
Funding agency
  • National Science Foundation (NSF)
ORSP grant number
  • IIS-1617820
Keyword
Resource type
Last modified
  • 11/05/2019
Published
  • 04/09/2018
Language
DOI
  • https://doi.org/10.7302/Z2610XJB
License
To Cite this Work:
Platt, E. L. (2018). English WikiProject coeditor networks and quality assessments [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/Z2610XJB

Relationships

This work is not a member of any user collections.

Files (Count: 5; Size: 68.3 GB)

This archive contains data and code for the analysis presented in
(Platt & Romero, 2018). This study analyzed relationships between structural
properties of WikiProject co-editor networks and the performance/efficiency
of those WikiProjects. Co-editor networks were constructed from the entire
Wikipedia edit history by creating an edge between two editors if they had
edited the same article. The performance and efficiency of a WikiProject were
determined from the history of WikiProject article quality assessments.
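
As a rough illustration of this construction, the sketch below builds a directed
co-editor network from per-article revision metadata, following the edge rule in
the Methodology section (a directed, unweighted edge from S to T if a page
edited by S was later edited by T). The revisions iterable of
(article_id, editor_id, timestamp) tuples is an assumption for illustration; the
actual scripts are in wikiproject_code_git.tgz and may be organized differently.

    # Sketch only: build a directed co-editor network from revision metadata.
    # `revisions` is assumed to be an iterable of (article_id, editor_id,
    # timestamp) tuples; the deposited scripts may organize this differently.
    from collections import defaultdict
    import networkx as nx

    def build_coeditor_network(revisions):
        """Directed, unweighted edge S -> T if a page edited by S was later edited by T."""
        graph = nx.DiGraph()
        earlier_editors = defaultdict(set)  # article_id -> editors who already edited it
        for article_id, editor_id, _timestamp in sorted(revisions, key=lambda r: r[2]):
            graph.add_node(editor_id)
            for source in earlier_editors[article_id]:
                if source != editor_id:
                    graph.add_edge(source, editor_id)
            earlier_editors[article_id].add(editor_id)
        return graph

Restricting the revisions to articles tagged by a given WikiProject would yield
that project's co-editor network.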

In addition to the co-editor network, the project included agent-based model
simulations, which did not rely on any empirical data. The code for these
simulations is included in agent_based_model_code_git.tgz.

The code and data for the empirical WikiProject analysis are contained in
the wikiproject_code_git.tgz and wikiproject_data.tgz files respectively.
For convenience, the co-editor networks are also included by themselves in
the coeditor_networks.tgz file, using adjacency list format.
Reproducing the analysis also relies on external data sources that
have been archived elsewhere, linked below.
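
Because the networks in coeditor_networks.tgz are plain adjacency lists indexed
by WikiProject integer keys, one way to load a single project's network is with
networkx, as sketched below. The file-name pattern is an assumption for
illustration; the Readme.md files inside the archive describe the actual layout.

    # Sketch only: load one WikiProject co-editor network from its
    # adjacency-list file. The path below is hypothetical.
    import networkx as nx

    project_key = 42  # integer key from the WikiProject list in wikiproject_data.tgz
    graph = nx.read_adjlist("coeditor_networks/%d.adjlist" % project_key,
                            create_using=nx.DiGraph, nodetype=int)
    print(graph.number_of_nodes(), "editors,", graph.number_of_edges(), "edges")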

The data and code are organized into several directories, each containing a
Readme.md file with additional information.

Contents

  • agent_based_model_code_git.tgz: git repository containing code for agent-based models. This repository includes a copy of the logbook module released under the 3-clause BSD license.
  • coeditor_networks.tgz: English-language WikiProject coeditor networks in adjacency-list format. Node ids are Wikipedia editor ids. WikiProject ids are mapped to titles in wikiproject_data.tgz.
  • wikiproject_code_git.tgz: git repository containing code for empirical analysis of performance and efficiency of WikiProject co-editor networks.
  • wikiproject_data.tgz: Data sets used by empirical analysis scripts.
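
The archives listed above must be unpacked before any of the code can be run; a
minimal sketch using Python's standard-library tarfile module follows (the
extraction target directories are arbitrary).

    # Sketch only: unpack the deposited .tgz archives into separate directories.
    import tarfile

    archives = ["agent_based_model_code_git.tgz", "coeditor_networks.tgz",
                "wikiproject_code_git.tgz", "wikiproject_data.tgz"]
    for name in archives:
        with tarfile.open(name, "r:gz") as tar:
            tar.extractall(path=name.replace(".tgz", ""))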

External data

References

  • Platt, E. L. & Romero, D. M. (2018). Network Structure, Efficiency, and Performance on WikiProjects. In ICWSM.

Data citation

  • Platt, E. L., Livneh, D., Ramanathan, K., and Romero, D. M. (2018). English WikiProject coeditor networks and quality assessments.

Download

Individual files can be downloaded from the Files panel above. The total work file size of 68.3 GB is too large to download directly; Deep Blue Data uses Globus to make large data sets (over 3 GB) available.
