Connecting Concepts in the Brain by Mapping Cortical Representations of Semantic Relations

Zhang, Yizhen

Work Description

Title: Connecting Concepts in the Brain by Mapping Cortical Representations of Semantic Relations Open Access Deposited

Attribute	Value
Methodology	Nineteen human subjects (11 females, 8 males, age 24.4 ± 4.8, all right-handed) participated in this study. All subjects provided informed written consent according to a research protocol approved by the Institutional Review Board at Purdue University. While being scanned for fMRI, each subject was listening to several audio stories collected from The Moth Radio Hour ( https://themoth.org/radio-hour) and presented through binaural MR-compatible headphones (Silent Scan Audio Systems, Avotec, Stuart, FL). A single story was presented in each fMRI session (6 m 48 s ± 1 min 58 s). For each story, two repeated sessions were performed for the same subject. T1 and T2-weighted MRI and fMRI data were acquired in a 3T MRI system (Siemens, Magnetom Prisma, Germany) with a 64-channel receive-only phased-array head/neck coil. The fMRI data were acquired with 2 mm isotropic spatial resolution and 0.72 s temporal resolution by using a gradient-recalled echo-planar imaging sequence (multiband = 8, 72 interleaved axial slices, TR = 720 ms, TE = 31 ms, flip angle = 52°, field of view = 21 × 21 cm^2). The MRI and fMRI data were preprocessed by using the minimal preprocessing pipeline established for the HCP (using software packages AFNI, FMRIB Software Library, and FreeSurfer pipeline). After preprocessing, the images from individual subjects were co-registered onto a common cortical surface template. Then the fMRI data were spatially smoothed by using a gaussian surface smoothing kernel with a 2 mm standard deviation. To represent words as vectors, we used a pre-trained word2vec model ( https://code.google.com/archive/p/word2vec/). The model was able to convert any English word to a vector embedded in a 300-dimensional semantic space. We mapped the semantic space, as modeled by word2vec, to the cortex through voxel-wise linear encoding models.
Description	We collected hours of functional magnetic resonance imaging data from human subjects listening to natural stories. We developed a predictive model of the voxel-wise response and further applied it to thousands of new words to understand how the brain stores and connects different concepts. This is a dataset for the paper: Zhang, Y., Han, K., Worth, R., & Liu, Z. (2020). Connecting concepts in the brain by mapping cortical representations of semantic relations. Nature communications, 11(1), 1-13. https://doi.org/10.1038/s41467-020-15804-w. This project is also documented at https://osf.io/eq2ba/.
Creator	Zhang, Yizhen
Depositor	[email protected]
Contact information	[email protected]
Discipline	Science
Funding agency	National Institutes of Health (NIH)
Keyword	fMRI natural story comprehension neural encoding semantic processing word relations naturalistic stimuli
Citations to related material	Zhang, Y., Han, K., Worth, R., & Liu, Z. (2020). Connecting concepts in the brain by mapping cortical representations of semantic relations. Nature communications, 11(1), 1-13. https://doi.org/10.1038/s41467-020-15804-w
Resource type	Dataset
Last modified	11/18/2022
Published	10/22/2020
Language	English Matlab Python
DOI	https://doi.org/10.7302/mcmd-v749
License	http://creativecommons.org/publicdomain/zero/1.0/

To Cite this Work:
Zhang, Y. (2020). Connecting Concepts in the Brain by Mapping Cortical Representations of Semantic Relations [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/mcmd-v749

Relationships


This work is not a member of any user collections.

Files (Count: 8; Size: 208 GB)

Title	Original Upload	Last Modified	File Size	Access	Actions
README.txt	2020-10-15	2020-10-15	5.17 KB	Embargo	View Details
Natural_story_task_log.csv	2020-10-15	2020-10-15	4.57 KB	Embargo	View Details
raw_MRI_dataset.zip	2020-10-13	2020-10-14	188 GB	Embargo
code.zip	2020-10-15	2020-10-15	10.8 KB	Embargo	View Details
preprocessed_word_features.zip	2020-10-15	2020-10-15	27.7 MB	Embargo	View Details
script.zip	2020-10-15	2020-10-15	3.48 KB	Embargo	View Details
preprocessed_time_series_dataset.zip	2020-10-15	2020-10-16	12.3 GB	Embargo
result.zip	2020-10-15	2020-10-16	6.97 GB	Embargo	View Details

Date: 12 Oct, 2020

Title of related publication: Connecting Concepts in the Brain by Mapping Cortical Representations of Semantic Relations

Authors: Yizhen Zhang Kuan Han Robert M. Worth Zhongming Liu

Contact: Yizhen Zhang [email protected], Zhongming Liu [email protected]

Funding: This work was supported by National Institute of Mental Health R01MH104402, Purdue University, and the University of Michigan.

If you used the data in publications, please cite the paper below.
Zhang, Y., Han, K., Worth, R., & Liu, Z. (2020). Connecting concepts in the brain by mapping cortical representations of semantic relations. Nature communications, 11(1), 1-13.

Research Overview:
We collected hours of functional magnetic resonance imaging data from human subjects listening to natural stories. We developed a predictive model of the voxel-wise response and further applied it to thousands of new words to understand how the brain stores and connects different concepts.

Methods:
Nineteen human subjects (11 females, 8 males, all right-handed) participated in this study. While being scanned for fMRI, each subject was listening to several audio stories collected from The Moth Radio Hour (https://themoth.org/radio-hour) and presented through binaural MR-compatible headphones (Silent Scan Audio Systems, Avotec, Stuart, FL). A single story was presented in each fMRI session (average 6m 48s). For each story, two repeated sessions were performed for the same subject. The data was collected from Feb 2018 to July 2018 at Purdue University.
T1 and T2-weighted MRI and fMRI data were acquired in a 3T MRI system (Siemens, Magnetom Prisma, Germany) with a 64-channel receive-only phased-array head/neck coil. The fMRI data were acquired with 2mm isotropic spatial resolution and 0.72s temporal resolution by using a gradient-recalled echo-planar imaging sequence (multiband=8, 72 interleaved axial slices, TR=720ms, TE=31ms, flip angle=52, field of view=21cm by 21cm).
The MRI and fMRI data were preprocessed by using the minimal preprocessing pipeline established for the HCP [1] (using software packages AFNI, FMRIB Software Library, and FreeSurfer pipeline). After preprocessing, the images from individual subjects were co-registered onto a common cortical surface template. Then the fMRI data were spatially smoothed by using a gaussian surface smoothing kernel with a 2mm standard deviation.
To represent words as vectors, we used a pretrained word2vec model[2]. The model was able to convert any English word to a vector embedded in a 300-dimensional semantic space. We mapped the semantic space, as modeled by word2vec, to the cortex through voxel-wise linear encoding models.

File Inventory:
- Data from subject * is organized in directory raw_MRI_dataset/sub-*/. The T1 and T2-weighted MRI images are stored as raw_MRI_dataset/sub-*/anat/sub_*_T1w.nii.gz and raw_MRI_dataset/sub-*/anat/sub_*_T2w.nii.gz (NIFTI format). The functional MRI images are stored in directory raw_MRI_dataset/sub-*/func/. Within this directory, the folder "raw" contains the fMRI data without preprocessing (NIFTI format). The folder "mni" contains the preprocessed fMRI data in MNI space. The folder "cifti" contains the fMRI data on the cortical surface template (CIFTI format). The file name "sub-01_story-01_rep-1" refers to the fMRI data collected when subject 1 was listening to story 1 for the first repeated session. The NIFTI files can be read by standard MRI softwares (e.g. FSL:https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL, or packages in Matlab/Python). The surface data can be read by the connectome workbench toolbox developed by Human Connectome Project (https://www.humanconnectome.org/software/connectome_workbench).

- The detailed information of audio stimuli could be found in 'Natural_story_task_log.csv'.
- The preprocessed fMRI time series for each story are stored in directory preprocessed_time_series_dataset/. The preprocessed word features for each story are stored in directory preprocessed_word_features/.

- In code/, we shared our codes for training, testing, and cross-validating the voxel-wise encoding model. Detailed information is summarized in codes/readme.txt.
- In script/, we shared two matlab scripts:
1. concatenate_encoding_dataset.m: concatenate training and testing data for the encoding model
2. training_encoding_model.m: training and testing encoding model

- Encoding results are shared in result/encoding_result/.
- Results for cortical representations of semantic categories and semantic relations are shared in result/word_cortical_mappings.

Informed consent:
All subjects provided informed written consent according to a research protocol approved by the Institutional Review Board at Purdue University.

Use and Access:
This data set is made available under a Creative Commons Public Domain license (CC0 1.0).

Referece:
[1] Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage (2013).
[2] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionally. Advances in neural information processing systems. (2013).

Note:
This project is also documented at https://osf.io/eq2ba/.

Update Provenance Log Entries

Download All Files (To download individual files, select them in the “Files” panel above)

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to contact us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.