Date: 12 Oct, 2020

Title of related publication: Connecting Concepts in the Brain by Mapping Cortical Representations of Semantic Relations

Authors: Yizhen Zhang, Kuan Han, Robert M. Worth, Zhongming Liu

Contact: Yizhen Zhang zhyz@umich.edu, Zhongming Liu zmliu@umich.edu

Funding: This work was supported by National Institute of Mental Health R01MH104402, Purdue University, and the University of Michigan.

If you use these data in a publication, please cite the paper below.
Zhang, Y., Han, K., Worth, R., & Liu, Z. (2020). Connecting concepts in the brain by mapping cortical representations of semantic relations. Nature Communications, 11(1), 1-13.

Research Overview:
We collected hours of functional magnetic resonance imaging data from human subjects listening to natural stories. We developed a predictive model of the voxel-wise response and further applied it to thousands of new words to understand how the brain stores and connects different concepts.

Methods:
Nineteen human subjects (11 females, 8 males, all right-handed) participated in this study. During fMRI scanning, each subject listened to several audio stories from The Moth Radio Hour (https://themoth.org/radio-hour), presented through binaural MR-compatible headphones (Silent Scan Audio Systems, Avotec, Stuart, FL). A single story was presented in each fMRI session (average duration 6 min 48 s). For each story, each subject completed two repeated sessions. The data were collected from February 2018 to July 2018 at Purdue University.
T1- and T2-weighted MRI and fMRI data were acquired on a 3T MRI system (Siemens, Magnetom Prisma, Germany) with a 64-channel receive-only phased-array head/neck coil. The fMRI data were acquired with 2 mm isotropic spatial resolution and 0.72 s temporal resolution by using a gradient-recalled echo-planar imaging sequence (multiband=8, 72 interleaved axial slices, TR=720 ms, TE=31 ms, flip angle=52 degrees, field of view=21 cm by 21 cm).
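The acquisition parameters above imply a few derived quantities that are handy when sanity-checking a downloaded run. The sketch below is only arithmetic on the values stated in this README; the variable names are illustrative, and the volume count is an estimate from the average session duration, so individual runs will differ and the NIFTI headers remain the authoritative source.

```python
# Values copied from the Methods text above.
TR_S = 0.72                      # repetition time in seconds
MEAN_SESSION_S = 6 * 60 + 48     # average session duration: 6 min 48 s
N_SLICES = 72                    # axial slices per volume
SLICE_THICKNESS_MM = 2.0         # 2 mm isotropic voxels
MULTIBAND = 8                    # multiband acceleration factor

# Approximate number of fMRI volumes in an average-length session.
approx_volumes = round(MEAN_SESSION_S / TR_S)

# Nominal coverage along the slice direction.
slab_mm = N_SLICES * SLICE_THICKNESS_MM

# Number of multiband excitations needed to cover all slices once.
slices_per_band = N_SLICES // MULTIBAND

print(approx_volumes, slab_mm, slices_per_band)
```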
The MRI and fMRI data were preprocessed by using the minimal preprocessing pipeline established for the Human Connectome Project (HCP) [1], which relies on the software packages AFNI, FMRIB Software Library (FSL), and FreeSurfer. After preprocessing, the images from individual subjects were co-registered onto a common cortical surface template. The fMRI data were then spatially smoothed by using a Gaussian surface smoothing kernel with a 2 mm standard deviation.
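Some tools specify smoothing kernels by full width at half maximum (FWHM) rather than by standard deviation. The short sketch below converts the 2 mm standard deviation stated above using the standard Gaussian relation FWHM = 2 * sqrt(2 * ln 2) * sigma; the function name is illustrative and not part of the shared code.

```python
import math

def sigma_to_fwhm(sigma_mm):
    """Convert a Gaussian standard deviation to its full width at half maximum."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_mm

fwhm_mm = sigma_to_fwhm(2.0)  # kernel used for the surface smoothing above
print(f"{fwhm_mm:.2f} mm")
```

A 2 mm standard deviation corresponds to a FWHM of roughly 4.7 mm.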
To represent words as vectors, we used a pretrained word2vec model [2]. The model converts an English word to a vector embedded in a 300-dimensional semantic space. We mapped the semantic space, as modeled by word2vec, to the cortex through voxel-wise linear encoding models.
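As a rough illustration of what a voxel-wise linear encoding model looks like, the sketch below fits one linear map from 300-dimensional word features to the responses of many voxels and scores it by per-voxel correlation on held-out data. It is a minimal sketch under stated assumptions, not the authors' implementation: it uses closed-form ridge regression with an arbitrary regularization strength, random synthetic arrays instead of the shared data, and ignores hemodynamic delays. All function and variable names are hypothetical; see the shared MATLAB code for the actual procedure.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y.

    X: (time points, features), Y: (time points, voxels).
    Returns W with shape (features, voxels), one weight column per voxel.
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-in data (NOT the shared dataset); sizes are arbitrary
# except for the 300 feature dimensions, which match word2vec.
rng = np.random.default_rng(0)
n_train, n_test, n_dims, n_voxels = 600, 100, 300, 50
W_true = rng.standard_normal((n_dims, n_voxels))
X_train = rng.standard_normal((n_train, n_dims))
X_test = rng.standard_normal((n_test, n_dims))
Y_train = X_train @ W_true + rng.standard_normal((n_train, n_voxels))
Y_test = X_test @ W_true + rng.standard_normal((n_test, n_voxels))

# Fit on the training split, evaluate on the held-out split.
W_hat = fit_ridge(X_train, Y_train, alpha=10.0)
r = voxelwise_correlation(Y_test, X_test @ W_hat)
print(W_hat.shape, r.shape)
```

Once such a model is fitted, a new word's predicted cortical pattern is its 300-dimensional vector multiplied by the weight matrix.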


File Inventory:
- Data from subject * are organized in directory raw_MRI_dataset/sub-*/. The T1 and T2-weighted MRI images are stored as raw_MRI_dataset/sub-*/anat/sub_*_T1w.nii.gz and raw_MRI_dataset/sub-*/anat/sub_*_T2w.nii.gz (NIFTI format). The functional MRI images are stored in directory raw_MRI_dataset/sub-*/func/. Within this directory, the folder "raw" contains the fMRI data without preprocessing (NIFTI format), the folder "mni" contains the preprocessed fMRI data in MNI space, and the folder "cifti" contains the fMRI data on the cortical surface template (CIFTI format). The file name "sub-01_story-01_rep-1" refers to the fMRI data collected while subject 1 listened to story 1 in the first repeated session. The NIFTI files can be read with standard MRI software (e.g. FSL: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL, or packages in Matlab/Python). The surface data can be read with the Connectome Workbench toolbox developed by the Human Connectome Project (https://www.humanconnectome.org/software/connectome_workbench).
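The naming convention above can be handled programmatically. The sketch below parses a run name such as "sub-01_story-01_rep-1" and builds the functional directory path for a given processing stage. The helper names are hypothetical, and the two-digit zero padding of the subject number is an assumption inferred from the single example above; check it against the actual directory listing.

```python
import re
from pathlib import PurePosixPath

# Pattern for run names of the form sub-01_story-01_rep-1.
RUN_PATTERN = re.compile(r"sub-(\d+)_story-(\d+)_rep-(\d+)")

def parse_run_name(name):
    """Extract subject, story, and repeat numbers from a run file name."""
    match = RUN_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    subject, story, repeat = (int(g) for g in match.groups())
    return {"subject": subject, "story": story, "repeat": repeat}

def func_dir(subject, stage):
    """Directory holding one subject's fMRI data for a stage: raw, mni, or cifti."""
    if stage not in ("raw", "mni", "cifti"):
        raise ValueError(f"unknown stage: {stage!r}")
    # Assumption: subject folders use two-digit zero padding, as in "sub-01".
    return PurePosixPath("raw_MRI_dataset") / f"sub-{subject:02d}" / "func" / stage

print(parse_run_name("sub-01_story-01_rep-1"))
print(func_dir(1, "cifti"))
```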

- Detailed information about the audio stimuli can be found in 'Natural_story_task_log.csv'.
- The preprocessed fMRI time series for each story are stored in directory preprocessed_time_series_dataset/. The preprocessed word features for each story are stored in directory preprocessed_word_features/.

- In code/, we share our code for training, testing, and cross-validating the voxel-wise encoding model. Detailed information is summarized in codes/readme.txt.
- In script/, we share two MATLAB scripts:
	1. concatenate_encoding_dataset.m: concatenates the training and testing data for the encoding model
	2. training_encoding_model.m: trains and tests the encoding model

- Encoding results are shared in result/encoding_result/.
- Results for cortical representations of semantic categories and semantic relations are shared in result/word_cortical_mappings.

Informed consent:
All subjects provided informed written consent according to a research protocol approved by the Institutional Review Board at Purdue University.

Use and Access: 
This data set is made available under a Creative Commons Public Domain license (CC0 1.0).

References:
[1] Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage (2013).
[2] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (2013).

Note:
This project is also documented at https://osf.io/eq2ba/.