Work Description

Title: EEG Datasets for Naturalistic Listening to "Alice in Wonderland" (Version 2) Open Access Deposited

Attribute Value
Methodology
  • The data comprise 49 human electroencephalography (EEG) datasets collected at the University of Michigan Computational Neurolinguistics Lab. The data were recorded with 61 active electrodes and a Brain Products actiCHamp amplifier at 500 Hz (0.1 to 200 Hz band). Participants listened passively to a 12.4-minute audiobook recording of the first chapter of Alice's Adventures in Wonderland (librivox.org, catalog date 2006-01-12), after which they completed a short 8-question comprehension questionnaire. The raw data are stored in the BrainVision Core Data Format (https://www.brainproducts.com/support-resources/brainvision-core-data-format-1-0/). Example analyses of these data are available at https://github.com/cnllab/alice-eeg-shared
Description
  • These files contain the raw data and processing parameters to go with the paper "Hierarchical structure guides rapid linguistic predictions during naturalistic listening" by Jonathan R. Brennan and John T. Hale. These files include the stimulus (wav files), raw data (BrainVision format), data processing parameters (matlab), and variables used to align the stimuli with the EEG data and for the statistical analyses reported in the paper (csv spreadsheet).

  • Updates in Version 2:
    - data in BrainVision format
    - added information about data analysis
    - corrected prePROCessing information for S02
Creator
Creator ORCID
Depositor
  • jobrenn@umich.edu
Contact information
Discipline
Funding agency
  • National Science Foundation (NSF)
ORSP grant number
  • AWD003231
Keyword
Citations to related material
  • Brennan, J. R., & Hale, J. T. (2019). Hierarchical structure guides rapid linguistic predictions during naturalistic listening. PLoS ONE, 14(1), e0207741.
Resource type
Last modified
  • 09/01/2023
Published
  • 09/01/2023
Language
DOI
  • https://doi.org/10.7302/746w-g237
License
To Cite this Work:
Brennan, J. R. (2023). EEG Datasets for Naturalistic Listening to "Alice in Wonderland" (Version 2) [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/746w-g237

Files (Count: 156; Size: 4.19 GB)

EEG Datasets for Naturalistic Listening to "Alice in Wonderland"
Version 2
Jonathan Brennan
ORCID: 0000-0002-3639-350X
https://sites.lsa.umich.edu/cnllab

Description:

These files contain the raw data and processing parameters to go with the paper "Hierarchical structure guides rapid linguistic predictions during naturalistic listening" by Jonathan R. Brennan and John T. Hale. These files include the stimulus (wav files), raw data (BrainVision format), data processing parameters (matlab), and variables used to align the stimuli with the EEG data and for the statistical analyses reported in the paper (csv spreadsheet).

Method:

The data comprise 49 human electroencephalography (EEG) datasets collected at the University of Michigan Computational Neurolinguistics Lab. The data were recorded with 61 active electrodes and a Brain Products actiCHamp amplifier at 500 Hz (0.1 to 200 Hz band). Participants listened passively to a 12.4-minute audiobook recording of the first chapter of Alice's Adventures in Wonderland (librivox.org, catalog date 2006-01-12), after which they completed a short 8-question comprehension questionnaire. The raw data are stored in the BrainVision Core Data Format (https://www.brainproducts.com/support-resources/brainvision-core-data-format-1-0/).
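
As a quick sanity check on the recording parameters above, the text header (.vhdr) of a BrainVision recording can be read with a few lines of standard-library Python. This is a minimal sketch, not the authors' pipeline: the header text below is synthetic and only mirrors the stated parameters (61 channels, 500 Hz), the helper names parse_vhdr and sampling_rate_hz are made up for illustration, and the real headers in this deposit may contain different or additional entries.

```python
def parse_vhdr(text):
    """Collect key=value entries of a BrainVision header as {section: {key: value}}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            if current == "Comment":
                break  # free text follows; it is not key=value data
            sections[current] = {}
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def sampling_rate_hz(sections):
    """SamplingInterval is stored in microseconds; convert it to Hz."""
    interval_us = float(sections["Common Infos"]["SamplingInterval"])
    return 1_000_000 / interval_us


# Synthetic header for illustration only; NOT copied from the deposit.
EXAMPLE_VHDR = """Brain Vision Data Exchange Header File Version 1.0
; illustrative header, not a real file from this dataset

[Common Infos]
DataFile=S01.eeg
MarkerFile=S01.vmrk
DataFormat=BINARY
DataOrientation=MULTIPLEXED
NumberOfChannels=61
SamplingInterval=2000

[Binary Infos]
BinaryFormat=IEEE_FLOAT_32
"""

header = parse_vhdr(EXAMPLE_VHDR)
fs = sampling_rate_hz(header)
n_channels = int(header["Common Infos"]["NumberOfChannels"])
# Approximate number of samples per channel for a 12.4-minute stimulus
n_samples = round(12.4 * 60 * fs)
print(fs, n_channels, n_samples)
```

For a real file, the same function could be applied to the contents of, e.g., S01.vhdr read as text. Dedicated readers such as MNE-Python's mne.io.read_raw_brainvision handle the binary data as well and are likely the better choice for actual analysis.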

Analysis:

Example analyses of these data are available at https://github.com/cnllab/alice-eeg-shared

Data Set Description:

"audio.zip" contains 12 stimulus files
- chapter one of "Alice's Adventures in Wonderland" from librivox.org
- divided into 12 .wav files

"S01.eeg", "S01.vhdr", "S01.vmrk": raw data files for S01 through S49
- each dataset is made of three files
- .eeg (raw data)
- .vhdr (meta-data)
- .vmrk (trigger information)
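
The trigger information in a .vmrk file is plain text, with one marker per line in the form Mk<n>=<Type>,<Description>,<Position>,<Size>,<Channel>. The sketch below parses such lines and converts positions to seconds at the 500 Hz sampling rate. It is illustrative only: the marker text is synthetic, the trigger codes "S  1" and "S  2" are hypothetical and may not match the codes used in this dataset, and the conversion assumes positions are 1-based sample indices.

```python
from typing import NamedTuple

FS = 500  # sampling rate in Hz, as stated in the Method section


class Marker(NamedTuple):
    kind: str          # e.g. "Stimulus" or "New Segment"
    description: str   # trigger label; may be empty
    position: int      # position in data points, as stored in the file
    size: int
    channel: int       # 0 means the marker applies to all channels


def parse_vmrk(text):
    """Return the markers listed in the [Marker Infos] section."""
    markers = []
    in_marker_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            in_marker_section = (line == "[Marker Infos]")
            continue
        if not in_marker_section or not line.startswith("Mk") or "=" not in line:
            continue  # skips comments, blanks, and other sections
        _, payload = line.split("=", 1)
        fields = payload.split(",")
        if len(fields) < 5:
            continue  # malformed entry
        position, size, channel = (int(f) for f in fields[2:5])
        markers.append(Marker(fields[0], fields[1], position, size, channel))
    return markers


def onset_seconds(marker, fs=FS):
    """Convert a marker position to seconds, assuming 1-based positions."""
    return (marker.position - 1) / fs


# Synthetic marker file for illustration only; NOT copied from the deposit.
EXAMPLE_VMRK = """Brain Vision Data Exchange Marker File, Version 1.0

[Common Infos]
Codepage=UTF-8
DataFile=S01.eeg

[Marker Infos]
; Each entry: Mk<n>=<Type>,<Description>,<Position>,<Size>,<Channel>
Mk1=New Segment,,1,1,0,20180101000000000000
Mk2=Stimulus,S  1,5001,1,0
Mk3=Stimulus,S  2,34501,1,0
"""

markers = parse_vmrk(EXAMPLE_VMRK)
stimulus_onsets = [onset_seconds(m) for m in markers if m.kind == "Stimulus"]
print(stimulus_onsets)
```

Which marker types and labels signal the start of each audio segment should be checked against the actual .vmrk files and the example analyses linked above.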

"proc.zip" contains pre-PROC-essing parameters for 42 datasets
- Matlab data file
- 7 datasets not represented as these were too noisy to pre-process
- includes channel rejections, epoch rejections, ICA unmixing matrix etc.

"datasets.mat"
- Matlab file with variables indicating which datasets were:
- USEd in the main analysis (N=33)
- excluded due to LOW PERFormance on the comprehension quiz (N=8)
- recorded from participants with HIGH NOISE (N=8)
- (note that 2 participants had both high noise and low performance)

"comprehension-questions.doc"
- Multiple choice comprehension questions

"comprehension-scores.txt"
- Score out of 8 on a post-experiment comprehension questionnaire
- Also includes dataset rejection comments (due to behavioral results and/or noise)
- errata:
- Three participants skipped 4 questions; they received scores out of 4
- S21's score could not be found when compiling this table
- S39 was included in the original analysis despite not meeting behavioral criteria. (Note that the paper includes an analysis that disregards behavioral criteria, showing that this does not impact the results.)

"easycapM10-acti61_elec.sfp"
- Electrode layout information in BESA surface point coordinate format

"AliceChapterOne-EEG.csv"
- csv file with word-by-word variables used for data analysis
- includes NGRAM, RNN, and CFG surprisal
- The spreadsheet has 16 columns and 2130 rows
- Each row is a word in the first chapter of Alice's Adventures in Wonderland
- Col 1 "Word" word token
- Col 2 "Segment" 1 through 12 indicating which of twelve audio segments this word appeared in
- Col 3 "onset" word's onset time in seconds relative to the beginning of the current segment
- Col 4 "offset" word's offset time in sec relative to the beginning of the current segment
- Col 5 "Order" Indicates word order (1, 2, 3... 2129) within full stimulus
- Col 6 "LogFreq" log-transformed word frequency from the English Lexicon Project HAL corpus
- Col 7, 8 "LogFreq_Prev" "LogFreq_Next" same measure for the previous and next word
- Col 9 "SndPower" Average power of auditory stimulus over 50ms after word onset
- Col 10 "Length" Word length in sec (offset - onset)
- Col 11 "Position" Indicates word position (1, 2, 3...) within each sentence
- Col 12 "Sentence" Indicates sentence position (1, 2, 3... 84) within full stimulus
- Col 13 "IsLexical" Logical indicating whether the word is a lexical/content (1) or function word (0)
- Col 14 "NGRAM" Surprisal values from the NGRAM language model
- Col 15 "RNN" Surprisal values from the RNN language model
- Col 16 "CFG" Surprisal values from the CFG language model
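
The column layout above can be read with Python's standard csv module. The sketch below loads a few of the documented columns, selects lexical words, and converts each word's segment-relative onset into an EEG sample index. It rests on assumptions: the rows shown are synthetic (not taken from "AliceChapterOne-EEG.csv"), the file is assumed to carry a header row with the column names listed above, and segment_start_samples is a hypothetical mapping from segment number to the sample at which that audio segment began, which would have to be derived from the trigger information.

```python
import csv
import io

FS = 500  # EEG sampling rate in Hz, as stated in the Method section


def load_words(fileobj):
    """Read selected word-level columns, converting them to Python types."""
    words = []
    for row in csv.DictReader(fileobj):
        words.append({
            "Word": row["Word"],
            "Segment": int(row["Segment"]),
            "onset": float(row["onset"]),    # seconds from segment start
            "offset": float(row["offset"]),  # seconds from segment start
            "IsLexical": float(row["IsLexical"]) == 1.0,
            "CFG": float(row["CFG"]),        # CFG surprisal
        })
    return words


def onset_sample(word, segment_start_samples, fs=FS):
    """Sample index of a word onset, given each segment's starting sample."""
    return segment_start_samples[word["Segment"]] + round(word["onset"] * fs)


# Synthetic rows for illustration only; values are invented.
EXAMPLE_CSV = """Word,Segment,onset,offset,IsLexical,CFG
Alice,1,0.046,0.458,1,2.5
was,1,0.458,0.630,0,1.0
rabbit,2,1.250,1.700,1,3.0
"""

# Hypothetical segment start positions (in samples).
segment_start_samples = {1: 5000, 2: 34500}

words = load_words(io.StringIO(EXAMPLE_CSV))
lexical = [w for w in words if w["IsLexical"]]
samples = [onset_sample(w, segment_start_samples) for w in lexical]
print([w["Word"] for w in lexical], samples)
```

With the real spreadsheet, load_words could be called on a file opened with open("AliceChapterOne-EEG.csv", newline=""). Missing or non-numeric entries, if any, would need extra handling.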

This dataset can be cited as:

Brennan, J. R. (2023). EEG Datasets for Naturalistic Listening to "Alice in Wonderland" (Version 2) [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/746w-g237

Or as part of a larger project:

Bhattasali, S., Brennan, J., Luh, W.-M., Franzluebbers, B., & Hale, J. (2020). The Alice datasets: FMRI & EEG observations of natural language comprehension. Proceedings of the 12th International Language Resources and Evaluation Conference (LREC 2020). https://www.aclweb.org/anthology/2020.lrec-1.15/

The data were originally posted alongside the following publication:

Brennan, J. R., & Hale, J. T. (2019). Hierarchical structure guides rapid linguistic predictions during naturalistic listening. PLoS ONE, 14(1), e0207741.

History:

- August 27 2023: Second release
- data in BrainVision format
- added information about data analysis
- corrected prePROCessing information for S02
- December 10 2018: Initial release
