Work Description
Title: Supplemental Data for PNAS Nexus paper titled "How Digital Paywalls Shape News Coverage"
(2024). Supplemental Data for PNAS Nexus paper titled "How Digital Paywalls Shape News Coverage" [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/k5rj-3n91
Files (Count: 12; Size: 9.37 GB)
Title | Original Upload | Last Modified | File Size | Access |
---|---|---|---|---|
Deep_Blue_Data_Readme.txt | 2024-10-21 | 2024-10-21 | 5.48 KB | Open Access |
Data_Processing_Pipeline.png | 2024-10-18 | 2024-10-18 | 295 KB | Open Access |
compressed.zip | 2024-10-21 | 2024-10-21 | 9.36 GB | Open Access |
labelled_select.zip | 2024-10-21 | 2024-10-21 | 250 KB | Open Access |
us_state_population_2010_2019.xlsx | 2024-10-17 | 2024-10-17 | 17.2 KB | Open Access |
census_acs_migration_flows_count....json | 2024-10-17 | 2024-10-17 | 1020 KB | Open Access |
topic_modelling.ipynb | 2024-10-18 | 2024-10-18 | 6.45 KB | Open Access |
prepare_data_for_statistical_mod...ipynb | 2024-10-18 | 2024-10-18 | 142 KB | Open Access |
merge_migration_data.ipynb | 2024-10-18 | 2024-10-18 | 147 KB | Open Access |
did-final-PNAS-Nexus.R | 2024-10-19 | 2024-10-19 | 2.84 KB | Open Access |
RnR_monthly_staggered_did_data_s...a.csv | 2024-10-19 | 2024-10-19 | 1.58 MB | Open Access |
county_population_est.xlsx | 2024-10-21 | 2024-10-21 | 293 KB | Open Access |
Date: 18 October, 2024
Dataset Title: Replication Data for PNAS Nexus paper titled 'How Digital Paywalls Shape News Coverage'
Dataset Creators: Anmol Panda, Paramveer Dhillon, Libby Hemphill
Dataset Contact: Anmol Panda anmolp@umich.edu
Key Points:
- We assess the causal impact of paywalls on digital news content
- We find that the numbers of both soft and local news articles declined after paywalls were introduced
- Our findings indicate that there is a non-trivial relationship between financial imperatives and editorial choices
Research Overview:
The internet has significantly transformed how news is produced, consumed, and distributed. As a result, the news industry has transitioned from ad-supported to subscription-based models regulated by digital paywalls. In light of this disruption, it’s crucial to investigate not only how news consumers adapt to this change but also how economic incentives shape content coverage. We analyzed the staggered adoption of digital paywalls by 17 regional U.S. newspapers over 17 years in a difference-in-difference framework to examine the impact of paywall adoption on topical news content coverage. Our results reveal a small but significant decrease in local and soft news coverage, with varying effects across different urban contexts. Specifically, local news coverage experienced a more substantial decline in smaller cities (population < 500,000) and regions experiencing an influx of younger residents (age < 40 years). Conversely, soft news coverage increased in areas with a younger demographic influx, indicating a strategic shift by newspapers to cater to digital-savvy audiences and adapt to changing consumption patterns. Our findings underscore the delicate balance between financial imperatives and editorial choices in the newspaper industry and highlight the need for ongoing research into the effects of digital monetization strategies on journalistic content creation, media plurality, and civic accountability.
Methodology:
We obtained article data from ProQuest for 17 newspapers and used KMeans clustering to identify topics in each paper. We then used Poisson regression to estimate the causal impact of paywall introduction on newspaper content.
Files contained here:
The zipped folders contain two types of files:
compressed.zip - Word Embeddings and KMeans clusters
Each file contains the embeddings of all articles of a newspaper, for a total of 17 newspapers.
To open the files, unzip the folder and use the following Python code:
import pandas as pd
df = pd.read_json(file_path, compression="bz2")
Each file is a data frame with the following columns:
article_id - unique ID for each article
date - date of publication of the news article
title text - text of news article
embedding - word embedding of news article's body constructed using sentence-transformer
paper - newspaper name
cluster - cluster number from KMeans clustering
week - week count of the article's publication date, relative to the earliest date in the dataset
title_len - Length of article's title text (number of tokens)
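The week column is described above only as a count relative to the earliest date in the dataset. The following is a minimal sketch of one way such a count could be derived, assuming zero-based weeks obtained by integer-dividing the day difference by 7. The function name week_index and the sample dates are hypothetical, and the deposited notebooks may use a different convention.

```python
from datetime import date


def week_index(pub_date: date, earliest: date) -> int:
    """Whole weeks elapsed between `earliest` and `pub_date` (0 for the first week).

    Assumption: zero-based weeks, floor division of the day difference by 7.
    """
    return (pub_date - earliest).days // 7


# Hypothetical publication dates, for illustration only.
dates = [date(2010, 1, 1), date(2010, 1, 7), date(2010, 1, 8), date(2010, 3, 1)]
earliest = min(dates)
weeks = [week_index(d, earliest) for d in dates]
print(weeks)
```

If the week values in the files start at 1 rather than 0, the same idea applies with an offset of one.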
labelled_select.zip - Excel files of topics labelled for each cluster
There is one file per newspaper (17 in total), and each file covers that newspaper's 100 clusters.
To open a file, use the following Python code:
import pandas as pd
df = pd.read_excel(file_path)
Each file is a data frame with the following columns:
cluster - cluster number from KMeans clustering
top_20 - The 20 most common tokens in the titles of the cluster's articles
local - Local articles (1 if it is local, 0 if not local)
hard - Hard articles (1 if it is a hard article, 0 if it is soft, 2 if neither)
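A column such as top_20 can be understood as a per-cluster token frequency summary. The sketch below shows one plausible way to produce it, assuming lower-cased, whitespace-split title tokens. The tokenization actually used in topic_modelling.ipynb is not documented here, and the function name top_tokens and the sample titles are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (cluster, title) pairs standing in for rows of one newspaper's file.
articles = [
    (0, "City council approves new budget"),
    (0, "Council delays budget vote"),
    (1, "Local team wins championship game"),
]


def top_tokens(rows, n=20):
    """Return {cluster: list of the n most frequent lower-cased title tokens}."""
    counts = defaultdict(Counter)
    for cluster, title in rows:
        # Assumption: simple whitespace tokenization, no stop-word removal.
        counts[cluster].update(title.lower().split())
    return {c: [tok for tok, _ in ctr.most_common(n)] for c, ctr in counts.items()}


top_20 = top_tokens(articles)
print(top_20[0])
```

Reading the most frequent tokens of a cluster is what allows a human annotator to assign the local and hard labels listed above.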
There are nine other files in the archive: four code files, four data files, and one image.
- Data Files
us_state_population_2010_2019.xlsx - US Census Population Data by state
county_population_est.xlsx - US Census Population Data by county
census_acs_migration_flows_county_by_age.json - Migration data by age groups for each county from the American Community Survey 2011-2015
RnR_monthly_staggered_did_data_single_file_w_migration_data.csv - Data file for statistical analysis
- Code Files
topic_modelling.ipynb - Code for KMeans clustering (n_clusters=100)
Input Data : Each file from the 'compressed' folder
Output files : Each file in the 'labelled_select' folder
prepare_data_for_statistical_modelling.ipynb - Code to aggregate data by month
Input Data : Files from the 'compressed' and 'labelled_select' folders
merge_migration_data.ipynb - Code to combine aggregated data with census data for migration and population
Input Data : Output from 'prepare_data_for_statistical_modelling.ipynb'
Output Data : RnR_monthly_staggered_did_data_single_file_w_migration_data.csv
did-final-PNAS-Nexus.R - Code for causal inference in R (referred to as poisson_reg.R in the project pipeline figure)
Input Data : RnR_monthly_staggered_did_data_single_file_w_migration_data.csv (change file path in the code file)
- Image File
Data_Processing_Pipeline.png - Data analysis pipeline for the project
Details the code files at every stage of the pipeline
Note: Please update all file paths for read and write operations so that the correct files are loaded and the scripts run correctly.
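The aggregation step in the pipeline (joining cluster labels to articles, then counting by month) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in prepare_data_for_statistical_modelling.ipynb: the data frames are hypothetical stand-ins for one newspaper's files, and the output column names (n_articles, n_local, n_soft, n_hard) are invented for illustration.

```python
import pandas as pd

# Hypothetical article rows using the documented columns of the embedding files.
articles = pd.DataFrame({
    "article_id": [1, 2, 3, 4],
    "date": pd.to_datetime(["2012-01-03", "2012-01-20", "2012-02-02", "2012-02-15"]),
    "paper": ["Paper A"] * 4,
    "cluster": [0, 1, 0, 2],
})

# Hypothetical cluster labels for the same newspaper (documented label columns).
labels = pd.DataFrame({
    "cluster": [0, 1, 2],
    "local": [1, 0, 0],
    "hard": [1, 0, 2],  # 1 = hard, 0 = soft, 2 = neither
})

# Attach each article's cluster-level labels; each cluster has exactly one label row.
labelled = articles.merge(labels, on="cluster", how="left", validate="many_to_one")
labelled["month"] = labelled["date"].dt.to_period("M")
labelled["is_soft"] = (labelled["hard"] == 0).astype(int)
labelled["is_hard"] = (labelled["hard"] == 1).astype(int)

# Count articles per newspaper and month, overall and by label.
monthly = (
    labelled.groupby(["paper", "month"])
    .agg(
        n_articles=("article_id", "count"),
        n_local=("local", "sum"),
        n_soft=("is_soft", "sum"),
        n_hard=("is_hard", "sum"),
    )
    .reset_index()
)
print(monthly)
```

Because the labels are stored in one file per newspaper, a join like this would be run separately for each newspaper before the monthly tables are combined.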
Use and Access:
This data set is made available under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
To Cite Data:
How Digital Paywalls Shape News Coverage, Dhillon, Panda, Hemphill, PNAS-Nexus (2024)