Work Description
Title: Data associated with "Is Novel Research Worth Doing?" Open Access Deposited
(2022). Data associated with "Is Novel Research Worth Doing?" [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/ack7-as60
Files (Count: 3; Size: 5.16 MB)
Title | Original Upload | Last Modified | File Size | Access |
---|---|---|---|---|
README.txt | 2022-11-11 | 2022-11-17 | 3.22 KB | Open Access |
submissions_anonymized.csv | 2022-11-11 | 2022-11-11 | 4.06 MB | Open Access |
reviews_anonymized.csv | 2022-11-11 | 2022-11-11 | 1.09 MB | Open Access |
The data sources and methods used to process the raw data are described in the paper www.doi.org/10.1073/pnas.2118046119 and the associated Supplementary Information.
A preprint of an earlier version of this paper is available here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3920711
This collection consists of 2 anonymized datasets:
1. submissions_anonymized.csv
This file provides metadata on submissions to Cell, Cell Reports, and selected IOP Publishing journals. Only initial submissions are included; revisions are excluded from the analysis. The following steps were taken to ensure anonymity:
- Quintiles: Numerical variables have been converted to quintiles. Quintiles are calculated separately for Cell Press journals and IOP journals. Due to skew, not all variables have all 5 quintiles; see "Duplicates" in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html for details.
- Keyword variables were converted to a binary variable (see Methodology)
- Noise: For each variable, a random 0.5% of values has been changed to values randomly selected from the existing values of that variable (including null values).
- ID numbers have been converted to random strings
- Manuscript type for IOP journals may be too identifying and was excluded from dataset
- Submissions with missing novelty / conventionality values are excluded from dataset
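The quintile step above can be sketched with pandas.qcut. This is a minimal illustration, not the code used to build the dataset: the helper name `to_quintiles`, the toy data, and the column names `publisher` and `num_authors` are assumptions made for the example. It shows why a skewed variable can end up with fewer than 5 quintile labels when duplicate bin edges are dropped.

```python
import pandas as pd

def to_quintiles(values: pd.Series) -> pd.Series:
    """Bin a numeric series into up to 5 quantile bins, labeled from 1."""
    # duplicates="drop" merges bins whose edges coincide, so a heavily
    # skewed variable can yield fewer than 5 distinct labels.
    codes = pd.qcut(values, q=5, labels=False, duplicates="drop")
    return codes + 1

# Hypothetical toy data; these column names are illustrative only.
df = pd.DataFrame({
    "publisher": ["cell"] * 10 + ["iop"] * 10,
    "num_authors": list(range(1, 11)) + [1] * 7 + [2, 3, 50],
})

# Quintiles are computed separately within each publisher group.
df["num_authors_quintile"] = (
    df.groupby("publisher")["num_authors"].transform(to_quintiles)
)

# Number of distinct quintile labels per group.
labels_per_group = df.groupby("publisher")["num_authors_quintile"].nunique()
print(labels_per_group)
```

In this toy example the evenly spread group keeps all 5 bins, while the skewed group collapses to fewer because several quantile edges fall on the same value.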
2. reviews_anonymized.csv
This file provides peer review recommendations for papers sent out for review. The recommendation has been converted to a binary variable where 1=Accept or R/R and 0=Reject.
The following steps were taken to ensure anonymity:
- Noise: For 2% of values, the binary recommendation has been "flipped" to the opposite value.
- ID numbers have been converted to random strings
- Reviewer IDs excluded from dataset
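The two noise steps described for the submissions and reviews files might look roughly like the sketch below. The function names, the seed, and the toy series are assumptions for illustration; the exact procedure used to produce the deposited files is not given in this README.

```python
import numpy as np
import pandas as pd

def resample_noise(col: pd.Series, frac: float, rng: np.random.Generator) -> pd.Series:
    """Replace a random `frac` of entries with values drawn from the column itself."""
    out = col.copy()
    n_noisy = int(round(frac * len(col)))
    idx = rng.choice(len(col), size=n_noisy, replace=False)
    # Replacement values come from the existing values (nulls included),
    # so a "changed" entry may coincidentally keep its original value.
    out.iloc[idx] = rng.choice(col.to_numpy(), size=n_noisy, replace=True)
    return out

def flip_noise(col: pd.Series, frac: float, rng: np.random.Generator) -> pd.Series:
    """Flip a random `frac` of binary (0/1) entries to the opposite value."""
    out = col.copy()
    n_noisy = int(round(frac * len(col)))
    idx = rng.choice(len(col), size=n_noisy, replace=False)
    out.iloc[idx] = 1 - out.iloc[idx].to_numpy()
    return out

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

values = pd.Series([1, 2, 3, 4, 5] * 200)      # toy quintile-coded variable
noisy = resample_noise(values, 0.005, rng)     # 0.5% of values resampled

recs = pd.Series([0, 1] * 500)                 # toy binary recommendations
flipped = flip_noise(recs, 0.02, rng)          # 2% of values flipped
```

Because resampled values are drawn from the same column, the number of entries that visibly change can be smaller than the nominal 0.5%, whereas flipping always changes exactly the selected entries.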
General notes:
- The data in this collection are anonymized whereas the paper used raw data. Consequently, running the same code on these anonymized data vs. the data in the paper does not yield *identical* results but qualitatively similar ones.
- The 2 files in this collection do not contain an identical set of manuscript_id_anon values: some submissions were not peer reviewed (i.e. were desk rejected) and so do not appear in the Reviews file, and some reviewed submissions were missing a valid novelty value and were consequently dropped from the Submissions file.
- Novelty and conventionality were measured similarly to Uzzi et al. 2013 (https://www.science.org/doi/full/10.1126/science.1240474). See the Supplementary Information (at www.doi.org/10.1073/pnas.2118046119) for the differences.
- Variables related specifically to Cell or Cell Reports submissions are empty (null values) for IOP journal submissions.
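To see how the manuscript IDs in the two files overlap, an outer merge with an indicator column is one option. The sketch below uses small in-memory stand-ins rather than the real CSVs; the helper `compare_ids` and the `recommendation` column name are hypothetical, while `manuscript_id_anon` and `is_reviewed` come from the glossary.

```python
import pandas as pd

def compare_ids(submissions: pd.DataFrame, reviews: pd.DataFrame,
                key: str = "manuscript_id_anon") -> dict:
    """Count IDs found only in submissions, in both files, or only in reviews."""
    sub_ids = submissions[[key]].drop_duplicates()
    rev_ids = reviews[[key]].drop_duplicates()
    merged = sub_ids.merge(rev_ids, on=key, how="outer", indicator=True)
    counts = merged["_merge"].astype(str).value_counts()
    return {
        "submissions_only": int(counts.get("left_only", 0)),
        "both": int(counts.get("both", 0)),
        "reviews_only": int(counts.get("right_only", 0)),
    }

# Toy stand-ins; with the real data one would load the files instead, e.g.
#   submissions = pd.read_csv("submissions_anonymized.csv")
#   reviews = pd.read_csv("reviews_anonymized.csv")
submissions = pd.DataFrame({
    "manuscript_id_anon": ["a", "b", "c"],
    "is_reviewed": [1, 1, 0],          # "c" was desk rejected
})
reviews = pd.DataFrame({
    "manuscript_id_anon": ["a", "a", "b", "d"],  # "d" was dropped from submissions
    "recommendation": [1, 0, 1, 0],    # hypothetical column name
})

overlap = compare_ids(submissions, reviews)
print(overlap)
```

For analyses that need both reviews and submission metadata, an inner merge on manuscript_id_anon would keep only the IDs counted under "both".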
--------------------------------------------------------
Variable names glossary:
--------------------------------------------------------
journal_submitted_to_anon: Cell, Cell Reports, or a random string for each of the IOP journals used in the analytic sample
manuscript_id_anon:
submission_year
num_authors_quintile
num_refs_quintile: number of references in the published papers, quintiled
num_unique_journals_refs_std_quintile: number of unique journals referenced in the published paper, standardized, quintiled
tail_novelty_quintile:
median_conventionality_quintile
cites_1_quintile: citations accrued within 1 year after publication year, quintiled
cites_5_quintile: citations accrued within 5 years after publication year, quintiled
last_author_prior_pubs_quintile: last author number of publications, up to submission year, quintiled
last_author_prior_cell_pubs_quintile: last author number of Cell publications, up to submission year, quintiled
last_author_prior_cellrep_pubs_quintile: last author number of Cell Reports publications, up to submission year, quintiled
num_cell_cited_quintile: number of Cell papers referenced in published paper, quintiled
num_cellrep_cited_quintile: number of Cell Reports papers referenced in published paper, quintiled
recommendation_mean_quintile: the mean of reviewers' (binary) recommendations, quintiled
recommendation_std_quintile: the standard deviation of reviewers' (binary) recommendations, quintiled
is_accepted
is_reviewed: =0 if desk rejected, =1 if sent out for review
keyword_XYZ: topical keyword of paper, =1 if raw value is >0 and =0 otherwise
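
As a rough illustration of two of the derived variables in the glossary, the sketch below computes the per-manuscript mean and standard deviation of binary recommendations and binarizes a keyword score. The `recommendation` column name and the toy values are assumptions, and the use of the sample standard deviation (pandas default, ddof=1) is also an assumption; this README does not state which variant was used.

```python
import pandas as pd

# Toy reviews table; "recommendation" is a hypothetical column name
# (1 = Accept or R/R, 0 = Reject).
reviews = pd.DataFrame({
    "manuscript_id_anon": ["a", "a", "a", "b", "b"],
    "recommendation": [1, 1, 0, 0, 0],
})

# Per-manuscript mean and standard deviation of the binary recommendations.
# pandas uses the sample standard deviation (ddof=1) by default, which is
# undefined (NaN) for manuscripts with a single review.
per_manuscript = (
    reviews.groupby("manuscript_id_anon")["recommendation"]
    .agg(["mean", "std"])
)

# Keyword binarization: 1 if the raw value is > 0, else 0.
raw_keyword = pd.Series([0.0, 0.3, 2.0])  # toy raw keyword values
keyword_binary = (raw_keyword > 0).astype(int)

print(per_manuscript)
print(keyword_binary.tolist())
```

The resulting mean and standard deviation could then be quintiled in the same way as the other numerical variables.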