Work Description

Title: Double-anonymous Peer Review Mitigates Country Homophily and the Harms of Low Reviewer Diversity: Deidentified data and replication code (Open Access, Deposited)

Attribute Value
Methodology
  • This collection consists of 1 anonymized dataset and 2 analysis replication files:

  • 1. invited_revs_deid_df.csv: This file provides metadata on invited reviewers for submissions to 60 IOP Publishing journals from January 2018 to December 2022. Only the first round of review was analyzed; desk-rejected submissions and revisions were excluded. The following steps were taken to ensure anonymity:
    - Deidentification: Journal, manuscript ID, reviewer name, year-month of invitation, and year-month of submitted review were deidentified using randomly generated 10-character strings. Unique countries in the dataset were gathered from the union of corresponding authors', lead authors', and reviewers' countries. Countries were then deidentified using randomly generated 10-character strings, so that the codes are consistent across the three country variables (e.g., the USA has the same code in the author and reviewer country columns).
    - Noise: For each numerical variable, a random 1% of values (including missing values) was shuffled.
    - Quintiles: Team size, the only non-binary numerical variable, was logged and demeaned in the main paper analysis. This variable was also converted to quintiles for deidentification.

  • 2. deid_replication_main_figs.ipynb: This file replicates figures and analyses from the paper, including estimates of differential access to same-country reviewers and the back-of-the-envelope calculation. The recommendation has been converted to a binary variable ("Positivity") where 1=Accept or R/R and 0=Reject.
    - Note: The top 3 submitting countries (USA, China, India) were reidentified for the country aggregate analysis. Since these are the most common countries in the dataset, this information should not be sufficient for identifying individual reviewers or submissions.

  • 3. deid_replication_regs.R: This file replicates analyses from the paper estimating the effects of same-country reviewer status on agreeing to review and reviewing positively. The recommendation has been converted to a binary variable ("Positivity") where 1=Accept or R/R and 0=Reject.
Description
  • The data sources and methods used to process the raw data are described in the paper forthcoming in Science and the associated Supplementary Information. A preprint of an earlier version of this paper is available here: https://osf.io/preprints/socarxiv/754e3. These data are anonymized (see Methodology for details). Consequently, running the same code on these data yields results that are qualitatively similar, but not *identical*, to those in the paper.
Creator
Creator ORCID
Depositor
  • jamesmzd@umich.edu
Depositor creator
  • true
Contact information
Discipline
Funding agency
  • Other Funding Agency
Other Funding agency
  • Schmidt Futures

  • Science for Progress Initiative
ORSP grant number
  • HUM00194927
Keyword
Citations to related material
  • J. M. Z. Dumlao, M. Teplitskiy, Science, forthcoming.
  • Zumel Dumlao, J. M. and M. Teplitskiy. 2023. “The Effect of Reviewer Geographical Diversity on Evaluations Is Reduced by Anonymizing Submissions”. Retrieved (osf.io/preprints/socarxiv/754e3).
Resource type
Last modified
  • 10/11/2024
Published
  • 10/11/2024
Language
DOI
  • https://doi.org/10.7302/9kxh-6e41
License
To Cite this Work:
Dumlao, J. M. Z., Teplitskiy, M. (2024). Double-anonymous Peer Review Mitigates Country Homophily and the Harms of Low Reviewer Diversity: Deidentified data and replication code [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/9kxh-6e41


Files (Count: 4; Size: 106 MB)

The data sources and methods used to process the raw data are described in the paper forthcoming in Science and the associated Supplementary Information.
A preprint of an earlier version of this paper is available here: https://osf.io/preprints/socarxiv/754e3

This collection consists of 1 anonymized dataset and 2 analysis replication files:

1. invited_revs_deid_df.csv
This file provides metadata on invited reviewers for submissions to 60 IOP Publishing journals from January 2018 to December 2022. Only the first round of review was analyzed; desk-rejected submissions and revisions were excluded. The following steps were taken to ensure anonymity:
- Deidentification: Journal, manuscript ID, reviewer name, year-month of invitation, and year-month of submitted review were deidentified using randomly generated 10-character strings. Unique countries in the dataset were gathered from the union of corresponding authors', lead authors', and reviewers' countries. Countries were then deidentified using randomly generated 10-character strings, so that the codes are consistent across the three country variables (e.g., the USA has the same code in the author and reviewer country columns).
- Noise: For each numerical variable, a random 1% of values (including missing values) was shuffled.
- Quintiles: Team size, the only non-binary numerical variable, was logged and demeaned in the main paper analysis. This variable was also converted to quintiles for deidentification.
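
The three anonymization steps above can be illustrated with a minimal Python sketch. It is not the depositors' code: the function names, the alphanumeric alphabet, the rank-based quintile rule, and the toy inputs are all assumptions made for illustration, and the quintile helper assumes no missing values.

```python
import random
import string


def make_codebook(values, rng, length=10):
    """Map each distinct value to a unique random string of `length` characters."""
    alphabet = string.ascii_letters + string.digits  # assumed alphabet
    codebook = {}
    used = set()
    for value in sorted(set(values)):
        code = "".join(rng.choice(alphabet) for _ in range(length))
        while code in used:  # regenerate on the (unlikely) collision
            code = "".join(rng.choice(alphabet) for _ in range(length))
        used.add(code)
        codebook[value] = code
    return codebook


def shuffle_fraction(column, rng, fraction=0.01):
    """Permute a random `fraction` of positions in `column` among themselves.

    Missing values (None) are treated like any other value.
    """
    n_pick = min(len(column), max(2, round(len(column) * fraction)))
    positions = rng.sample(range(len(column)), n_pick)
    picked = [column[i] for i in positions]
    rng.shuffle(picked)
    result = list(column)
    for position, value in zip(positions, picked):
        result[position] = value
    return result


def quintile_bins(values):
    """Assign each value a quintile label 1..5 by rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * 5 // len(values) + 1
    return bins


# Toy inputs, made up for illustration only.
rng = random.Random(0)
author_countries = ["USA", "China", "India", "USA"]
reviewer_countries = ["China", "USA", "Brazil", "USA"]

# One codebook built from the union keeps codes consistent across columns.
codebook = make_codebook(author_countries + reviewer_countries, rng)
auth_deid = [codebook[c] for c in author_countries]
rev_deid = [codebook[c] for c in reviewer_countries]
```

Building a single codebook from the union of all country columns is what keeps a same-country comparison meaningful after deidentification: equal raw values always map to equal codes.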

2. deid_replication_main_figs.ipynb
This file replicates figures and analyses from the paper, including estimates of differential access to same-country reviewers and the back-of-the-envelope calculation. The recommendation has been converted to a binary variable ("Positivity") where 1=Accept or R/R and 0=Reject.
- Note: The top 3 submitting countries (USA, China, India) were reidentified for the country aggregate analysis. Since these are the most common countries in the dataset, this information should not be sufficient for identifying individual reviewers or submissions.

3. deid_replication_regs.R
This file replicates analyses from the paper estimating the effects of same-country reviewer status on agreeing to review and reviewing positively. The recommendation has been converted to a binary variable ("Positivity") where 1=Accept or R/R and 0=Reject.
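
The binary recoding described for both replication files can be sketched as follows. The raw recommendation labels are not given in this record, so the strings "Accept", "R/R", and "Reject" below are hypothetical stand-ins, as are the function names. The deposited CSV already contains the recoded Positivity column, so this is only an illustration of the rule.

```python
# Hypothetical raw labels (lowercased); the real journal system labels may differ.
POSITIVE_LABELS = {"accept", "r/r"}
NEGATIVE_LABELS = {"reject"}


def to_positivity(recommendation):
    """Return 1 for Accept or R/R, 0 for Reject, None for missing or unknown."""
    if recommendation is None:
        return None
    label = recommendation.strip().lower()
    if label in POSITIVE_LABELS:
        return 1
    if label in NEGATIVE_LABELS:
        return 0
    return None


def share_positive(recommendations):
    """Share of non-missing recommendations that are coded as positive."""
    coded = [to_positivity(r) for r in recommendations]
    coded = [c for c in coded if c is not None]
    return sum(coded) / len(coded) if coded else None
```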

General notes:
- The data in this collection are anonymized, whereas the paper used raw data. Consequently, running the same code on these anonymized data yields results that are qualitatively similar, but not *identical*, to those in the paper.

--------------------------------------------------------
Variable names glossary:
--------------------------------------------------------
*Identifiers/fixed effects variables*
journal_abbr_deid: Deidentified journal abbreviation
manuscript_id_original_deid: Deidentified manuscript ID
rev_full_name_deid: Deidentified reviewer full name
inv_year_month_deid: Deidentified year-month reviewer was invited
rev_year_month_deid: Deidentified year-month reviewer submitted review
auth_country_deid: Deidentified country of corresponding author
lead_country_deid: Deidentified country of lead author
rev_country_deid: Deidentified country of reviewer

*Numerical variables*
Positivity: Reviewer recommendation categories converted to a binary variable, 1=Accept or R/R and 0=Reject
submitted_review: 1=Invited reviewer submitted a review for that manuscript and 0=Invited reviewer did not submit a review for that manuscript
final_decision_binary: 1=Manuscript was ultimately accepted and 0=Ultimately rejected
SCR: (prior to shuffling) 1=Reviewer from same country as corresponding author
SCRlead: (prior to shuffling) 1=Reviewer from same country as lead author
anon_policy_available: instrumental variable, 1=Double-anonymization option was available in journal at time of submission and 0=Double-anonymization was not an option
anon_manu: 1=Author identity was hidden from reviewers and 0=Author identity was visible to reviewers
ln_team_size_bin: Number of authors on manuscript, logged, demeaned, and binned into quintiles
national: 1=Authors all from same country
regional: 1=Authors all from same region according to UN Geoscheme

*Categorical variables*
income_cat: Income category of corresponding author country based on 2023 World Bank Country Income and Lending Groups
income_cat2: Income category in which the top 3 submitting countries (USA, China, India) are split out as their own categories, separate from the High, Upper-middle, and Low and Lower-middle Income Groups, respectively, for 6 total categories
lead_income_cat: Income category of lead author country based on 2023 World Bank Country Income and Lending Groups
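
As a usage illustration for the glossary, the sketch below recomputes a same-country flag from auth_country_deid and rev_country_deid and compares it with the stored SCR column. Because SCR is described as defined "prior to shuffling", a small share of rows may disagree if the 1% noise step was applied to SCR. How missing values and numbers are encoded in the CSV is an assumption here (empty string, "NA", or "nan" for missing; "1" or "1.0" for numbers), and the function names and demo rows are made up.

```python
import csv

MISSING = {"", "NA", "nan"}  # assumed encodings of missing values


def recomputed_scr(row):
    """Return 1 if reviewer and corresponding-author country codes match, else 0."""
    auth = row["auth_country_deid"]
    rev = row["rev_country_deid"]
    if auth in MISSING or rev in MISSING:
        return None
    return 1 if auth == rev else 0


def scr_agreement(rows):
    """Share of comparable rows where stored SCR equals the recomputed flag."""
    compared = 0
    matched = 0
    for row in rows:
        expected = recomputed_scr(row)
        stored = row.get("SCR", "")
        if expected is None or stored in MISSING:
            continue
        compared += 1
        if int(float(stored)) == expected:
            matched += 1
    return matched / compared if compared else None


# Made-up rows for illustration; real codes are 10-character strings.
demo_rows = [
    {"auth_country_deid": "A", "rev_country_deid": "A", "SCR": "1"},
    {"auth_country_deid": "A", "rev_country_deid": "B", "SCR": "0"},
    {"auth_country_deid": "A", "rev_country_deid": "B", "SCR": "1.0"},
    {"auth_country_deid": "", "rev_country_deid": "B", "SCR": "1"},
]
demo_share = scr_agreement(demo_rows)

# With the deposited file in the working directory, one could run:
# with open("invited_revs_deid_df.csv", newline="") as f:
#     print(scr_agreement(csv.DictReader(f)))
```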
