Supplemental Data for PNAS Nexus paper titled "How Digital Paywalls Shape News Coverage"

Dhillon, Paramveer; Panda, Anmol; Hemphill, Libby

Work Description

Title: Supplemental Data for PNAS Nexus paper titled "How Digital Paywalls Shape News Coverage"

Attribute	Value
Methodology	This data consists of two types of files -- code files and data files. The code files were written by the authors of the paper -- Anmol Panda and Paramveer Dhillon at the University of Michigan. The data files are word embeddings of news articles. These were created in ProQuest TDM Studio and can be shared publicly. The original news articles are owned by ProQuest, and therefore cannot be shared here.
Description	The internet has significantly transformed how news is produced, consumed, and distributed. As a result, the news industry has transitioned from ad-supported to subscription-based models regulated by digital paywalls. In light of this disruption, it’s crucial to investigate not only how news consumers adapt to this change but also how economic incentives shape content coverage. We analyzed the staggered adoption of digital paywalls by 17 regional U.S. newspapers over 17 years in a difference-in-difference framework to examine the impact of paywall adoption on topical news content coverage. Our results reveal a small but significant decrease in local and soft news coverage, with varying effects across different urban contexts. Specifically, local news coverage experienced a more substantial decline in smaller cities (population < 500,000) and regions experiencing an influx of younger residents (age < 40 years). Conversely, soft news coverage increased in areas with a younger demographic influx, indicating a strategic shift by newspapers to cater to digital-savvy audiences and adapt to changing consumption patterns. Our findings underscore the delicate balance between financial imperatives and editorial choices in the newspaper industry and highlight the need for ongoing research into the effects of digital monetization strategies on journalistic content creation, media plurality, and civic accountability.
Creator	Dhillon, Paramveer; Panda, Anmol; Hemphill, Libby
Depositor	anmolp@umich.edu
Contact information	anmolp@umich.edu
Discipline	Social Sciences
Keyword	news media; paywalls; causal inference
Resource type	Dataset
Last modified	10/21/2024
Published	10/21/2024
DOI	https://doi.org/10.7302/k5rj-3n91
License	http://creativecommons.org/licenses/by-nc/4.0/

To Cite this Work:
Dhillon, P., Panda, A., Hemphill, L. (2024). Supplemental Data for PNAS Nexus paper titled "How Digital Paywalls Shape News Coverage" [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/k5rj-3n91

Files (Count: 12; Size: 9.37 GB)

Title	Original Upload	Last Modified	File Size	Access
Deep_Blue_Data_Readme.txt	2024-10-21	2024-10-21	5.48 KB	Open Access
Data_Processing_Pipeline.png	2024-10-18	2024-10-18	295 KB	Open Access
compressed.zip	2024-10-21	2024-10-21	9.36 GB	Open Access
labelled_select.zip	2024-10-21	2024-10-21	250 KB	Open Access
us_state_population_2010_2019.xlsx	2024-10-17	2024-10-17	17.2 KB	Open Access
census_acs_migration_flows_count....json	2024-10-17	2024-10-17	1020 KB	Open Access
topic_modelling.ipynb	2024-10-18	2024-10-18	6.45 KB	Open Access
prepare_data_for_statistical_mod...ipynb	2024-10-18	2024-10-18	142 KB	Open Access
merge_migration_data.ipynb	2024-10-18	2024-10-18	147 KB	Open Access
did-final-PNAS-Nexus.R	2024-10-19	2024-10-19	2.84 KB	Open Access
RnR_monthly_staggered_did_data_s...a.csv	2024-10-19	2024-10-19	1.58 MB	Open Access
county_population_est.xlsx	2024-10-21	2024-10-21	293 KB	Open Access

Date: 18 October, 2024

Dataset Title: Replication Data for PNAS Nexus paper titled 'How Digital Paywalls Shape News Coverage'

Dataset Creators: Anmol Panda, Paramveer Dhillon, Libby Hemphill

Dataset Contact: Anmol Panda anmolp@umich.edu

Key Points:
- We assess the causal impact of paywalls on digital news content
- We find that the number of both soft and local news articles declined after paywalls were introduced
- Our findings indicate that there is a non-trivial relationship between financial imperatives and editorial choices

Research Overview:
The internet has significantly transformed how news is produced, consumed, and distributed. As a result, the news industry has transitioned from ad-supported to subscription-based models regulated by digital paywalls. In light of this disruption, it’s crucial to investigate not only how news consumers adapt to this change but also how economic incentives shape content coverage. We analyzed the staggered adoption of digital paywalls by 17 regional U.S. newspapers over 17 years in a difference-in-difference framework to examine the impact of paywall adoption on topical news content coverage. Our results reveal a small but significant decrease in local and soft news coverage, with varying effects across different urban contexts. Specifically, local news coverage experienced a more substantial decline in smaller cities (population < 500,000) and regions experiencing an influx of younger residents (age < 40 years). Conversely, soft news coverage increased in areas with a younger demographic influx, indicating a strategic shift by newspapers to cater to digital-savvy audiences and adapt to changing consumption patterns. Our findings underscore the delicate balance between financial imperatives and editorial choices in the newspaper industry and highlight the need for ongoing research into the effects of digital monetization strategies on journalistic content creation, media plurality, and civic accountability.

Methodology:
We used data from ProQuest for 17 newspapers and applied KMeans clustering to identify topics in each paper. We then used Poisson regression to estimate the causal impact of paywall introduction on newspaper content.
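
For orientation, a generic Poisson difference-in-differences specification with staggered adoption can be written as shown here. This is a textbook form under stated assumptions, not necessarily the exact model estimated in did-final-PNAS-Nexus.R, and every symbol is an illustrative assumption rather than notation taken from the paper.

```latex
\log \mathbb{E}\left[ Y_{it} \right] = \alpha_i + \gamma_t + \beta \, D_{it}
```

In this sketch, Y_{it} is the count of articles of a given type (for example local or soft) published by newspaper i in month t, \alpha_i is a newspaper fixed effect, \gamma_t is a month fixed effect, and D_{it} equals 1 once newspaper i has adopted a paywall and 0 before. Under this form, \exp(\beta) - 1 is the proportional change in the expected article count associated with paywall adoption.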

Files contained here:
The zipped folders contain two types of files:

compressed.zip - Word Embeddings and KMeans clusters
Each file contains the embeddings of all articles of a newspaper, for a total of 17 newspapers.
To open the files, uncompress the zipped folder and use the following Python code:

import pandas as pd
df = pd.read_json(file_path, compression="bz2")

Each file is a data frame with the following columns:
article_id - unique ID for each article
date - date of publication of the news article
title text - text of news article
embedding - word embedding of news article's body constructed using sentence-transformer
paper - newspaper name
cluster - cluster number from KMeans clustering
week - week count of the article's publication date, relative to the earliest date in the dataset
title_len - Length of article's title text (number of tokens)
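
As a minimal sketch of working with one of these files, the snippet below loads a file and stacks the embedding column into a NumPy matrix with one row per article. The helper name load_embeddings is hypothetical, and the code assumes every entry of the embedding column is a list of floats of equal length, which this README does not state explicitly.

```python
import numpy as np
import pandas as pd


def load_embeddings(file_path):
    """Load one newspaper file and return (data frame, embedding matrix).

    Hypothetical helper, not part of the deposited code.
    """
    # Same call as documented in this README: bz2-compressed JSON.
    df = pd.read_json(file_path, compression="bz2")
    # Assumption: each `embedding` entry is a list of floats of equal length.
    matrix = np.array(df["embedding"].tolist(), dtype=float)
    return df, matrix
```

The returned matrix has shape (number of articles, embedding dimension) and can be passed directly to a clustering routine.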

labelled_select.zip - Excel files of topics labelled for each cluster
There is one file per newspaper (17 files in total), and each file contains 100 clusters.
To open a file, use the following Python code:

import pandas as pd
df = pd.read_excel(file_path)

Each file is a data frame with the following columns:
cluster - cluster number from KMeans clustering
top_20 - The 20 most common tokens in the titles of the cluster's articles
local - Local articles (1 if it is local, 0 if not local)
hard - Hard articles (1 if it is a hard article, 0 if it is soft, 2 if neither)
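
The cluster-level labels can be attached to individual articles by joining on the cluster column. The following is a minimal sketch, assuming the article data frame and the label data frame come from the same newspaper. The function name attach_labels and the derived columns is_soft and is_hard are hypothetical and are not taken from the deposited notebooks.

```python
import pandas as pd


def attach_labels(articles, labels):
    """Attach cluster-level `local` and `hard` labels to each article.

    Hypothetical helper; both frames must belong to the same newspaper.
    """
    # Many articles map to one labelled cluster, so validate that shape.
    merged = articles.merge(
        labels[["cluster", "local", "hard"]],
        on="cluster",
        how="left",
        validate="many_to_one",
    )
    # Coding documented above: hard = 1 (hard), 0 (soft), 2 (neither).
    merged["is_soft"] = (merged["hard"] == 0).astype(int)
    merged["is_hard"] = (merged["hard"] == 1).astype(int)
    return merged
```

Splitting the three-valued hard column into two indicators keeps the "neither" category (value 2) from being counted as soft news by accident.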

There are nine other files in the archive: four data files, four code files, and one image

- Data Files
us_state_population_2010_2019.xlsx - US Census Population Data by state
county_population_est.xlsx - US Census Population Data by county
census_acs_migration_flows_county_by_age.json - Migration data by age groups for each county from the American Community Survey 2011-2015
RnR_monthly_staggered_did_data_single_file_w_migration_data.csv - Data file for statistical analysis

- Code Files
topic_modelling.ipynb - Code for KMeans Clustering (n_clusters=100)
Input Data : Each file from the 'compressed' folder
Output files : Each file in the 'labelled_select' folder
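
The clustering step can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the notebook's actual code: only n_clusters=100 comes from this README, while the function name cluster_embeddings, the n_init value, and the random seed are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_embeddings(matrix, n_clusters=100, seed=0):
    """Assign each article embedding (one row of `matrix`) to a KMeans cluster.

    Hypothetical helper; n_init and seed are illustrative assumptions.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    # fit_predict returns one integer cluster label per row.
    return model.fit_predict(np.asarray(matrix, dtype=float))
```

The returned labels correspond to the cluster column described above, although label numbering depends on the random seed and will not match the deposited files.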
prepare_data_for_statistical_modelling.ipynb - Code to aggregate data by month
Input Data : Files from the 'compressed' and 'labelled_select' folders
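
A monthly aggregation of this kind might look like the sketch below. It assumes each article row already carries the cluster-level local and hard labels. The function name monthly_counts and the output column names (month, n_articles, n_local, n_soft, n_hard) are hypothetical and may differ from those in the notebook and in the CSV file.

```python
import pandas as pd


def monthly_counts(articles):
    """Count local, soft, and hard articles per newspaper and calendar month.

    Hypothetical helper; output column names are illustrative assumptions.
    """
    df = articles.copy()
    # Collapse each publication date to its calendar month.
    df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")
    # Coding from this README: hard = 1 (hard), 0 (soft), 2 (neither).
    df["soft"] = (df["hard"] == 0).astype(int)
    df["hard_news"] = (df["hard"] == 1).astype(int)
    return (
        df.groupby(["paper", "month"])
        .agg(
            n_articles=("article_id", "count"),
            n_local=("local", "sum"),
            n_soft=("soft", "sum"),
            n_hard=("hard_news", "sum"),
        )
        .reset_index()
    )
```

Each output row is one newspaper-month, which is the unit of observation a monthly staggered difference-in-differences analysis would use.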
merge_migration_data.ipynb : Code to combine aggregated data with census data for migration and population
Input Data : Output from 'prepare_data_for_statistical_modelling.ipynb'
Output Data : RnR_monthly_staggered_did_data_single_file_w_migration_data.csv
did-final-PNAS-Nexus.R : Code for causal inference in R (referred to as poisson_reg.R in the project pipeline figure)
Input Data : RnR_monthly_staggered_did_data_single_file_w_migration_data.csv (change file path in the code file)

- Image File
Data_Processing_Pipeline.png : Data Analysis pipeline for the project
Details the code files at every stage of the pipeline
Please change file paths to ensure the correct files get loaded.

Note: Please change all paths for file read and write operations to ensure the scripts run correctly.

Use and Access:
This data set is made available under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

To Cite Data:
How Digital Paywalls Shape News Coverage, Dhillon, Panda, Hemphill, PNAS-Nexus (2024)
