Work Description

Title: Data associated with "Is Novel Research Worth Doing?" Open Access Deposited

Attribute Value
Methodology
  • This collection provides the following 2 datasets:

    1. submissions_anonymized.csv
    This file provides metadata on submissions to Cell, Cell Reports, and selected IOP Publishing journals. The following steps were taken to ensure anonymity:
    - Quintiles: Numerical variables have been converted to quintiles. Quintiles are calculated separately for Cell Press journals and IOP journals. Due to skew, not all variables have all 5 quintiles; see "Duplicates" in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html for details.
    - Keyword variables were converted to a binary variable where 1 = value is >0 and 0 = otherwise.
    - Noise: For each variable, a random 0.5% of values has been changed to values randomly selected from the existing values of that variable.
    - ID numbers have been converted to random strings
    - Manuscript type for IOP journals may be too identifying and was not included
    - Submissions with missing novelty / conventionality values are not included

    2. reviews_anonymized.csv
    This file provides peer review recommendations for papers sent out for review. The recommendation has been converted to a binary variable where 1=Accept or R/R and 0=Reject. The following steps were taken to ensure anonymity:
    - Noise: For 2% of values, the binary recommendation has been "flipped" to the opposite value.
    - ID numbers have been converted to random strings
Description
  • The data sources and methods used to process the raw data are described in the paper  www.doi.org/10.1073/pnas.2118046119 and the associated Supplementary Information. These data are anonymized (see Methodology for details). Consequently, running the same code on these data vs. the data in the paper does not yield *identical* results but qualitatively similar ones.
Creator
Depositor
  • tepl@umich.edu
Contact information
Discipline
Citations to related material
Resource type
Curation notes
  • Nov. 17, 2022 - co-creators added to metadata.
Last modified
  • 11/18/2022
Published
  • 11/17/2022
DOI
  • https://doi.org/10.7302/ack7-as60
License
To Cite this Work:
Teplitskiy, M., Peng, H., Blasco, A., Lakhani, K. R. (2022). Data associated with "Is Novel Research Worth Doing?" [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/ack7-as60

Relationships

This work is not a member of any user collections.

Files (Count: 3; Size: 5.16 MB)

The data sources and methods used to process the raw data are described in the paper www.doi.org/10.1073/pnas.2118046119 and the associated Supplementary Information.
A preprint for an earlier version of this paper is available here: https://papers.ssrn.com/sol3/papers.cfm?abstractid=3920711

This collection consists of 2 anonymized datasets:

1. submissions_anonymized.csv
This file provides metadata on submissions to Cell, Cell Reports, and selected IOP Publishing journals. These are initial submissions; revisions are excluded from the analysis. The following steps were taken to ensure anonymity:
- Quintiles: Numerical variables have been converted to quintiles. Quintiles are calculated separately for Cell Press journals and IOP journals. Due to skew, not all variables have all 5 quintiles; see "Duplicates" in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html for details.
- Keyword variables were converted to a binary variable (see Methodology)
- Noise: For each variable, a random 0.5% of values has been changed to values randomly selected from the existing values of that variable (including null values).
- ID numbers have been converted to random strings
- Manuscript type for IOP journals may be too identifying and was excluded from the dataset
- Submissions with missing novelty / conventionality values are excluded from the dataset
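
The quintile, keyword, and noise steps above can be sketched in Python with pandas. This is only an illustration under stated assumptions, not the authors' actual code: the function names, the random seed, and the choice to draw replacement values with replacement are hypothetical. The only detail taken from the notes above is that pandas.qcut was used and that duplicate bin edges reduce the number of quintiles.

```python
import numpy as np
import pandas as pd


def to_quintiles(values: pd.Series) -> pd.Series:
    """Bin a numeric column into up to 5 quantile bins (integer codes)."""
    # duplicates="drop" merges repeated bin edges, which is why a skewed
    # variable can end up with fewer than 5 distinct quintile codes.
    return pd.qcut(values, q=5, labels=False, duplicates="drop")


def binarize_keyword(values: pd.Series) -> pd.Series:
    """1 if the raw value is > 0, otherwise 0."""
    return (values > 0).astype(int)


def add_noise(values: pd.Series, frac: float, rng: np.random.Generator) -> pd.Series:
    """Overwrite a random fraction of entries with values drawn from the same column."""
    out = values.copy()
    n_noisy = int(round(frac * len(out)))
    # Positions to overwrite (each chosen at most once).
    positions = rng.choice(len(out), size=n_noisy, replace=False)
    # Donor positions whose existing values are copied in (assumption: with replacement).
    donors = rng.choice(len(out), size=n_noisy, replace=True)
    out.iloc[positions] = values.iloc[donors].to_numpy()
    return out
```

To compute quintiles separately for two groups of journals, as described above, one option would be to apply to_quintiles within a groupby on a publisher indicator; that indicator is not part of the released file and would be an assumption.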

2. reviews_anonymized.csv
This file provides peer review recommendations for papers sent out for review. The recommendation has been converted to a binary variable where 1=Accept or R/R and 0=Reject.
The following steps were taken to ensure anonymity:
- Noise: For 2% of values, the binary recommendation has been "flipped" to the opposite value.
- ID numbers have been converted to random strings
- Reviewer IDs are excluded from the dataset
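
The "flipping" noise described above can be illustrated as follows. This is a minimal sketch, not the procedure actually used: the function name and the seed are hypothetical, and it assumes the 2% refers to a random subset of rows chosen without replacement.

```python
import numpy as np
import pandas as pd


def flip_binary(values: pd.Series, frac: float, rng: np.random.Generator) -> pd.Series:
    """Flip a random fraction of 0/1 entries to the opposite value."""
    out = values.copy()
    n_flip = int(round(frac * len(out)))
    # Choose distinct row positions so each selected value is flipped exactly once.
    positions = rng.choice(len(out), size=n_flip, replace=False)
    out.iloc[positions] = 1 - out.iloc[positions].to_numpy()
    return out
```

Because a small share of recommendations is flipped, statistics computed from this file are expected to differ slightly from those in the paper.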

General notes:
- The data in this collection are anonymized, whereas the paper used raw data. Consequently, running the same code on these anonymized data yields results that are qualitatively similar, but not *identical*, to those in the paper.
- The 2 files in this collection do not contain an identical set of manuscript_id_anon values, for two reasons: some submissions were not peer reviewed (i.e. were desk rejected) and so do not appear in the Reviews file, and some reviewed submissions were missing a valid novelty value and were consequently dropped from the Submissions file.
- Novelty and conventionality were measured similarly to Uzzi et al. 2013 (https://www.science.org/doi/full/10.1126/science.1240474). See the Supplementary Information (at www.doi.org/10.1073/pnas.2118046119) for the differences.
- Variables related specifically to Cell or Cell Reports submissions are empty (null values) for IOP journal submissions.
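
To see how the manuscript IDs in the two files overlap, a check along the following lines may help. It is a sketch under assumptions: it presumes both files carry a column named manuscript_id_anon (the Reviews file columns are not listed in the glossary below), and the toy frames stand in for the real CSV files.

```python
import pandas as pd


def id_overlap(submissions: pd.DataFrame, reviews: pd.DataFrame,
               key: str = "manuscript_id_anon") -> pd.Series:
    """Count IDs found in both files, only in submissions, or only in reviews."""
    sub_ids = submissions[[key]].drop_duplicates()
    rev_ids = reviews[[key]].drop_duplicates()
    # indicator=True adds a "_merge" column with values
    # "both", "left_only" (submissions only) and "right_only" (reviews only).
    merged = sub_ids.merge(rev_ids, on=key, how="outer", indicator=True)
    return merged["_merge"].value_counts()


# Toy stand-ins for the two files; the IDs here are made up.
submissions = pd.DataFrame({"manuscript_id_anon": ["a", "b", "c"],
                            "is_reviewed": [1, 0, 1]})
reviews = pd.DataFrame({"manuscript_id_anon": ["a", "a", "c", "d"]})
counts = id_overlap(submissions, reviews)
```

With the real data, the frames would instead come from pd.read_csv("submissions_anonymized.csv") and pd.read_csv("reviews_anonymized.csv"). IDs only in the Submissions file would correspond to desk rejections, and IDs only in the Reviews file to reviewed submissions lacking a valid novelty value.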

--------------------------------------------------------
Variable names glossary:
--------------------------------------------------------
journal_submitted_to_anon: Cell, Cell Reports, or a random string for each of the IOP journals used in the analytic sample
manuscript_id_anon:
submission_year
num_authors_quintile
num_refs_quintile: number of references in the published papers, quintiled
num_unique_journals_refs_std_quintile: number of unique journals referenced in the published paper, standardized, quintiled
tail_novelty_quintile:
median_conventionality_quintile
cites_1_quintile: citations accrued within 1 year after publication year, quintiled
cites_5_quintile: citations accrued within 5 years after publication year, quintiled
last_author_prior_pubs_quintile: last author number of publications, up to submission year, quintiled
last_author_prior_cell_pubs_quintile: last author number of Cell publications, up to submission year, quintiled
last_author_prior_cellrep_pubs_quintile: last author number of Cell Reports publications, up to submission year, quintiled
num_cell_cited_quintile: number of Cell papers referenced in published paper, quintiled
num_cellrep_cited_quintile: number of Cell Reports papers referenced in published paper, quintiled
recommendation_mean_quintile: the mean of reviewers' (binary) recommendations, quintiled
recommendation_std_quintile: the standard deviation of reviewers' (binary) recommendations, quintiled
is_accepted
is_reviewed: =0 if desk rejected, =1 if sent out for review
keyword_XYZ: topical keyword of paper, =1 if raw value is >0 and =0 otherwise
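
As an illustration of the two recommendation variables above, per-manuscript means and standard deviations of binary recommendations could be computed as below before quintiling. This is a hedged sketch: the column name "recommendation" is hypothetical, and the use of the sample standard deviation (pandas default, ddof=1) is an assumption, since the glossary does not say which estimator was used.

```python
import pandas as pd


def summarize_recommendations(reviews: pd.DataFrame,
                              key: str = "manuscript_id_anon",
                              col: str = "recommendation") -> pd.DataFrame:
    """Mean and standard deviation of binary recommendations per manuscript."""
    summary = reviews.groupby(key)[col].agg(["mean", "std"])
    summary = summary.rename(columns={"mean": "recommendation_mean",
                                      "std": "recommendation_std"})
    return summary.reset_index()


# Toy data; IDs and values are made up for illustration.
toy_reviews = pd.DataFrame({
    "manuscript_id_anon": ["a", "a", "c", "c", "c"],
    "recommendation": [1, 0, 1, 1, 1],
})
toy_summary = summarize_recommendations(toy_reviews)
```

Values computed this way from reviews_anonymized.csv will not reproduce the released quintiles exactly, because noise was added to both files.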

