The data sources and methods used to process the raw data are described in the paper forthcoming in Science and the associated Supplementary Information. 
A preprint for an earlier version of this paper is available here: https://osf.io/preprints/socarxiv/754e3

This collection consists of 1 anonymized dataset and 2 analysis replication files:

1. invited_revs_deid_df.csv
This file provides metadata on invited reviewers for submissions to 60 IOP Publishing journals from January 2018 to December 2022. Only the first round of review was analyzed; desk-rejected submissions and revisions were excluded. The following steps were taken to ensure anonymity:
- Deidentification: Journal, manuscript ID, reviewer name, year-month of invitation, and year-month of submitted review were each replaced with randomly generated 10-character strings. The set of unique countries was built from the union of corresponding authors', lead authors', and reviewers' countries. Each country was then replaced with a randomly generated 10-character string, so that codes are consistent across the three country variables (e.g., the USA has the same code in the author and reviewer country columns).
- Noise: For each numerical variable, a random 1% of values (missing values included) were shuffled.
- Quintiles: Team size, the only non-binary numerical variable, was logged and demeaned in the main paper analysis. This variable was also converted to quintiles for deidentification. 
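The three steps above can be illustrated with a minimal sketch. This is not the code used to build the dataset: the function names, the alphanumeric alphabet, the fixed seed, the quantile method, and the toy values are all assumptions made for illustration, and only the Python standard library is used.

```python
import bisect
import math
import random
import statistics
import string

rng = random.Random(0)  # fixed seed only so this sketch is reproducible

def random_code(length=10):
    """Return a random alphanumeric string (the alphabet is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))

def build_code_map(*columns):
    """Give every unique non-missing value across all columns one distinct code."""
    mapping, used = {}, set()
    for value in sorted({v for col in columns for v in col if v is not None}):
        code = random_code()
        while code in used:  # redraw on the (unlikely) collision
            code = random_code()
        used.add(code)
        mapping[value] = code
    return mapping

def shuffle_fraction(values, fraction=0.01):
    """Permute a random `fraction` of positions among themselves (None included)."""
    out = list(values)
    positions = rng.sample(range(len(out)), round(len(out) * fraction))
    picked = [out[i] for i in positions]
    rng.shuffle(picked)
    for i, v in zip(positions, picked):
        out[i] = v
    return out

def quintile_bins(values):
    """Bin values into quintiles 1..5 using four cut points; ties share a bin."""
    cuts = statistics.quantiles(values, n=5)
    return [1 + bisect.bisect_left(cuts, v) for v in values]

# Deidentification: one shared code map for all three country columns (toy data).
auth = ["USA", "China", "India", "USA"]
lead = ["USA", "China", "Brazil", None]
rev = ["China", "USA", "India", "USA"]
country_codes = build_code_map(auth, lead, rev)
auth_deid = [country_codes.get(c) for c in auth]
rev_deid = [country_codes.get(c) for c in rev]

# Noise: shuffle 1% of a toy binary column.
flags = [i % 2 for i in range(1000)]
noisy_flags = shuffle_fraction(flags)

# Quintiles: log, demean, then bin a toy team-size column.
team_size = [1, 2, 3, 4, 5, 6, 8, 10, 12, 20]
log_size = [math.log(n) for n in team_size]
mean_log = sum(log_size) / len(log_size)
ln_team_size_bin = quintile_bins([x - mean_log for x in log_size])
```

Building a single code map from the union of all three columns is what keeps a given country's code identical wherever it appears. Shuffling permutes existing values rather than drawing new ones, so each column's overall distribution is unchanged.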

2. deid_replication_main_figs.ipynb
This file replicates figures and analyses from the paper, including estimates of differential access to same-country reviewers and the back-of-the-envelope calculation. The recommendation has been converted to a binary variable ("Positivity") where 1=Accept or R/R and 0=Reject.
- Note: The top 3 submitting countries (USA, China, India) were reidentified for country aggregate analysis. Since these are the most common countries in the dataset, this information should not be sufficient for identifying individual reviewers or submissions.
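As a rough illustration of the "Positivity" recoding described above, the sketch below maps a recommendation label to 1 or 0. The raw category labels ("Accept", "R/R", "Reject") and the handling of missing recommendations are assumptions; the released dataset already contains the binary variable, so this step is not needed to run the replication files.

```python
# Hypothetical raw labels; the actual recommendation categories are not released.
POSITIVE = {"Accept", "R/R"}
NEGATIVE = {"Reject"}

def positivity(recommendation):
    """Return 1 for Accept or R/R, 0 for Reject, and None when missing."""
    if recommendation is None:
        return None  # assumption: no submitted review means no Positivity value
    if recommendation in POSITIVE:
        return 1
    if recommendation in NEGATIVE:
        return 0
    raise ValueError(f"Unexpected recommendation label: {recommendation!r}")

recommendations = ["Accept", "R/R", "Reject", None]
positivity_values = [positivity(r) for r in recommendations]
```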

3. deid_replication_regs.R
This file replicates analyses from the paper estimating the effects of same-country reviewer status on agreeing to review and reviewing positively. The recommendation has been converted to a binary variable ("Positivity") where 1=Accept or R/R and 0=Reject. 

General notes:
- The data in this collection are anonymized, whereas the paper used the raw data. Consequently, running the same code on the anonymized data does not yield results *identical* to those in the paper, but the results are qualitatively similar.

--------------------------------------------------------
Variable names glossary:
--------------------------------------------------------
*Identifiers/fixed effects variables*
journal_abbr_deid: Deidentified journal abbreviation
manuscript_id_original_deid: Deidentified manuscript ID
rev_full_name_deid: Deidentified reviewer full name
inv_year_month_deid: Deidentified year-month reviewer was invited
rev_year_month_deid: Deidentified year-month reviewer submitted review
auth_country_deid: Deidentified country of corresponding author
lead_country_deid: Deidentified country of lead author
rev_country_deid: Deidentified country of reviewer

*Numerical variables*
Positivity: Reviewer recommendation categories converted to a binary variable, 1=Accept or R/R and 0=Reject
submitted_review: 1=Invited reviewer submitted a review for that manuscript and 0=Invited reviewer did not submit a review for that manuscript
final_decision_binary: 1=Manuscript was ultimately accepted and 0=Ultimately rejected
SCR: (prior to shuffling) 1=Reviewer from same country as corresponding author
SCRlead: (prior to shuffling) 1=Reviewer from same country as lead author
anon_policy_available: Instrumental variable, 1=Double-anonymization option was available in journal at time of submission and 0=Double-anonymization was not an option
anon_manu: 1=Author identity was hidden from reviewers and 0=Author identity was visible to reviewers
ln_team_size_bin: Number of authors on manuscript, logged, demeaned, and binned into quintiles
national: 1=Authors all from same country
regional: 1=Authors all from same region according to UN Geoscheme

*Categorical variables*
income_cat: Income category of corresponding author country based on 2023 World Bank Country Income and Lending Groups
income_cat2: Same as income_cat, except that the top 3 submitting countries (USA, China, India) are split out as separate categories from their respective income groups (High, Upper-middle, and Low and Lower-middle), for 6 total categories
lead_income_cat: Income category of lead author country based on 2023 World Bank Country Income and Lending Groups
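
The "(prior to shuffling)" qualifier on SCR and SCRlead can be checked with a small sketch like the one below. It recomputes a same-country indicator from the deidentified country codes and compares it with the released SCR column. If, as the Noise step suggests, only numerical variables were shuffled, the two may disagree in a small share of rows. The rows, the codes, and the treatment of missing countries are hypothetical.

```python
def same_country(reviewer_code, author_code):
    """Return 1 if the codes match, 0 if they differ, None if either is missing."""
    if reviewer_code is None or author_code is None:
        return None  # assumption: missing countries are left undefined
    return 1 if reviewer_code == author_code else 0

# Toy rows with made-up 10-character codes; the third row mimics a shuffled SCR value.
rows = [
    {"rev_country_deid": "aB3dE5gH7j", "auth_country_deid": "aB3dE5gH7j", "SCR": 1},
    {"rev_country_deid": "aB3dE5gH7j", "auth_country_deid": "Zx9Yw8Vu7T", "SCR": 0},
    {"rev_country_deid": "Zx9Yw8Vu7T", "auth_country_deid": "Zx9Yw8Vu7T", "SCR": 0},
    {"rev_country_deid": None, "auth_country_deid": "Zx9Yw8Vu7T", "SCR": None},
]

recomputed = [same_country(r["rev_country_deid"], r["auth_country_deid"]) for r in rows]

# Share of rows where both values are present and agree.
comparable = [
    (new, row["SCR"])
    for new, row in zip(recomputed, rows)
    if new is not None and row["SCR"] is not None
]
agreement = sum(new == old for new, old in comparable) / len(comparable)
```

The same comparison applies to SCRlead by substituting lead_country_deid for auth_country_deid.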