Work Description

Title: Human challenge study dataset 2015

Methodology
  • The data in this repository were collected over an 8-day period in September 2015 from 18 human volunteers enrolled in a human rhinovirus (HRV) challenge study at the University of Virginia. Inoculation with live virus (RV39) was administered on the 4th day (Day 0), and a diverse set of biological, physiological, and cognitive variables was collected over the full duration of the study.
Description
  • The deposited data are as follows. The clinical shedding/symptom, RNAseq, steroid, and wearable E4 data were partially presented in publications [1]-[3]. The cognitive Lumos and VAFS data are presented in paper [4], which is under review and embargoed. The data files are subject.json, sample.json, and genematrix_TPM.csv. A copy of the blank consent form used to enroll volunteers in the study is also included (17964_Adult Consent_2015Mar17-Mod 1_clean.pdf).

  • Clinical symptom and viral shedding data (subject.json): reports each subject's accumulated and maximum self-reported symptom scores (modified Jackson score) and the viral shedding titers measured in nasopharyngeal washes after inoculation.

  • RNAseq data (genematrix_TPM.csv): Whole blood was collected in PAXgene™ Blood RNA tubes (PreAnalytiX). Total RNA was extracted with the PAXgene™ Blood miRNA Kit (QIAGEN), following the manufacturer’s recommended protocol. RNA quantity and quality were assessed with a Nanodrop 2000 spectrophotometer (Thermo-Fisher) and a Bioanalyzer 2100 with RNA 6000 Nano Chips (Agilent). RNA sequencing libraries were prepared with the Illumina TruSeq mRNA Library Kit with RiboZero Globin depletion. They were sequenced on an Illumina NextSeq sequencer with 50 bp paired-end reads (target 40M reads per sample). The 396 samples correspond to the 18 subjects at each of 22 time points. One sample was of insufficient quality to be mapped to read counts. For the remaining 395 samples, the demultiplexed paired-end FASTQ reads were aligned to the reference genome Homo_sapiens.GRCh38.84 using HISAT2, and the resulting read counts were TPM normalized. This repository contains the TPM-normalized RNAseq data. The raw FASTQ data for the 395 samples are deposited in the GEO repository (https://www.ncbi.nlm.nih.gov/geo), Accession # GSE215087.

  • Cognitive data (sample.json): Outcomes from a NeuroCognitive Performance Test (NCPT) that all volunteers took approximately 3 times daily. The NCPT is a repeatable, web-based, computerized cognitive assessment platform designed to measure subtle changes in performance across multiple cognitive domains. Scores on 18 cognitive variables were collected for each subject at approximately 22 time points during the challenge study. The file sample.json contains the raw cognitive data and the 18 extracted cognitive scores over time for each subject.

  • Visual Analog Fatigue Scale data (sample.json): the VAFS, a measure of cognitive fatigue, was administered approximately 3 times per day, at the same times as the NCPT and the blood draws.

  • Wearable device data (sample.json): participants wore an Empatica E4 device for the duration of the challenge study. Summarized features are provided for each subject that include sleep duration (mean and std), sleep offset (mean and std), and temperature (mean and std).

  • Steroid data (sample.json): cortisol, melatonin, and DHEAS were assayed from the whole blood samples.

  • See README.txt for more specific details on the data structures contained in the sample.json, subject.json, and genematrix_TPM.csv files.
Creator
Depositor
  • hero@umich.edu
Contact information
Discipline
Funding agency
  • Department of Defense (DOD)
ORSP grant number
  • 15-PAF03042, F040803
Keyword
Citations to related material
  • X. She, Y. Zhai, R. Henao, C. W. Woods, C. Chiu, G. S. Ginsburg, P. X. K. Song, and A. O. Hero, “Adaptive multi-channel event segmentation and feature extraction for monitoring health outcomes,” IEEE Transactions on Biomedical Engineering, vol. 68, no. 8, pp. 2377-2388, Aug. 2021, doi: 10.1109/TBME.2020.3038652. Available: arXiv:2008.09215
  • E. Grzesiak, B. Bent, M. T. McClain, C. W. Woods, E. L. Tsalik, B. P. Nicholson, T. Veldman, T. W. Burke, Z. Gardener, E. Bergstrom, R. B. Turner, C. Chiu, P. M. Doraiswamy, A. Hero, R. Henao, G. S. Ginsburg, and J. Dunn, “Assessment of the Feasibility of Using Noninvasive Wearable Biometric Monitoring Sensors to Detect Influenza and the Common Cold Before Symptom Onset,” JAMA Network Open, vol. 4, no. 9, e2128534, 2021, doi: 10.1001/jamanetworkopen.2021.28534
  • E. Sabeti, S. Oh, P. X. Song, and A. Hero, “A Pattern Dictionary Method for Anomaly Detection,” Entropy, vol. 24, art. 1095, Aug. 2022, doi: 10.3390/e24081095
  • Y. Zhai, P. M. Doraiswamy, C. W. Woods, R. B. Turner, T. W. Burke, G. S. Ginsburg, and A. O. Hero, “Pre-exposure cognitive performance variability is associated with severity of respiratory infection,” manuscript under review.
Resource type
Last modified
  • 12/29/2022
Published
  • 10/12/2022
Language
DOI
  • https://doi.org/10.7302/90mc-9h22
License
To Cite this Work:
Hero, A. O., Zhai, Y., Burke, T., Doraiswamy, M., Ginsburg, G. S., Henao, R., Turner, R. B., Woods, C. W. (2022). Human challenge study dataset 2015 [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/90mc-9h22


Files (Count: 5; Size: 87.3 MB)

Readme file to accompany the Deep Blue Data deposited dataset "Human challenge study dataset 2015." The clinical shedding/symptom, RNAseq, and wearable E4 data were partially presented in publications [1]-[3]. The cognitive Lumos and VAFS data are presented in paper [4], which is under review and embargoed. The data files are subject.json, sample.json, and genematrix_TPM.csv.

---------------subject.json and sample.json----------------

The JSON files were created by applying MATLAB's jsonencode() to MATLAB data structures. Reading each file back into MATLAB and applying jsondecode() to its contents yields two data structures, described below.
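
The same files can also be read outside MATLAB with Python's standard json module. The sketch below makes two assumptions this README does not state: the top level of each file is the encoded struct itself (a JSON object keyed by the field names listed below), and jsonencode() wrote each 2-D matrix as an array of rows. The helper names are illustrative, not part of the dataset.

```python
import json

def decode_struct(text):
    """Parse the text of a JSON file written by MATLAB's jsonencode() into a dict."""
    return json.loads(text)  # MATLAB NaN values arrive as None (JSON null)

def load_struct(path):
    """Read subject.json or sample.json from disk (the path is up to the user)."""
    with open(path, encoding="utf-8") as f:
        return decode_struct(f.read())

def matrix_shape(m):
    """Return (rows, cols) for a matrix decoded as a list of equal-length rows."""
    if not m or not all(isinstance(row, list) for row in m):
        raise ValueError("expected a non-empty list of rows")
    ncols = len(m[0])
    if any(len(row) != ncols for row in m):
        raise ValueError("ragged matrix: rows differ in length")
    return len(m), ncols

# Usage, assuming the files sit in the working directory:
# subject = load_struct("subject.json")
# matrix_shape(subject["visit_time"])  # the field listing gives visit_time as 18x21
```

Checking shapes against the field listings is a quick way to confirm the row-major assumption before indexing.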

1. A subject data structure for the 18 subjects. It contains demographic data, the 21 visit times for biological and cognitive sample collection, shedding and symptom status, daily shedding data over the 8 days of the study, and the binary accumulated shedding label (low or high).

>> subject =
struct with fields:

age: [18×1 double] - age of each subject
gender: {18×1 cell} - gender of each subject
visit_time: [18×21 double] - time that subject visited the clinic for blood draws, cognitive tests, and E4 data downloads (excludes time at 480 hours)
shedding: [18×8 double] - viral shedding was measured over the 4 days after inoculation (matrix padded with zeros for pre-inoculation days)
shedding_time: [8×1 double] - the times at which shedding was measured
ShedLabel: {18×1 cell} - the label (subject ID, gender, infection status), where infection status is low (0) or high (1) according to whether the subject's accumulated shedding is below or above the median
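
The infection status in ShedLabel is a median split on accumulated shedding. The sketch below recomputes such a split from the shedding matrix under two assumptions that this README does not state:
- "accumulated shedding" is the row sum of shedding over the 8 days;
- a subject exactly at the median is labeled low (0).

The function name is hypothetical.

```python
from statistics import median

def shedding_labels(shedding):
    """Label each subject 0 (low) or 1 (high) by accumulated shedding vs. the median.

    shedding: subjects x days matrix (list of rows), as decoded from subject.json.
    Assumption: accumulated shedding is the row sum; ties at the median count as low.
    """
    totals = [sum(row) for row in shedding]
    m = median(totals)
    return [1 if t > m else 0 for t in totals]
```

Comparing the output with the labels stored in ShedLabel is a simple check of whether these assumptions match the authors' definition.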

2. A sample data structure containing the collected steroid data, Empatica E4 data, interpolated titer data (at 21 time points, excluding the follow-up time at 480 hours after the study ended), full symptom data, Lumos data over 21 time points, and VAFS data over 21 time points.

>> sample =

struct with fields:

steroid: [1×1 struct] - the raw steroid data
E4: [1×1 struct] - the Empatica E4 wearable extracted features
titer: [1×1 struct] - the shedding data
symptom: [1×1 struct] - the raw symptom data
lumos: [1×1 struct] - the cognitive lumos testing data
vafs: [1×1 struct] - the cognitive VAFS data

>> sample.steroid =

struct with fields:

raw_data: [3×396 double] - the 3 steroids assayed in each of the 18×22 subject/time-point samples (includes the follow-up time at 480 hours post-inoculation)
subject: [396×1 double] - subject label for each of the 396 samples
time: [396×1 double] - sampling time label for each of the 396 samples
imtrx: [18×22 double] - a matrix giving the sample number for each subject (row) and time point (column)
allsubj: [18×1 double] - the subject IDs
alltime: [22×1 double] - the time points (0 denotes the inoculation time)
steroid_names: {3×1 cell} - names of each of the 3 assayed steroids
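
raw_data stores one column per sample, so imtrx is the bridge from a (subject, time point) pair to a column. The minimal lookup sketch below rests on two assumptions that should be checked against the file:
- imtrx holds MATLAB-style 1-based sample numbers, so 1 is subtracted before indexing a Python list;
- a missing sample, if any, would appear as 0 or null.

The function name is hypothetical.

```python
def steroid_values(steroid, subj_row, time_col):
    """Return {steroid name: value} for one subject (imtrx row) and time point (imtrx column).

    steroid: decoded sample.steroid dict with raw_data (3 x 396, list of rows),
    imtrx (18 x 22) and steroid_names. subj_row and time_col are 0-based.
    Assumption: imtrx entries are 1-based MATLAB sample numbers.
    """
    sample_no = steroid["imtrx"][subj_row][time_col]
    if not sample_no:  # assumed encoding of a missing sample (0 or null)
        return None
    col = int(sample_no) - 1  # 1-based MATLAB index -> 0-based Python index
    return {name: row[col] for name, row in zip(steroid["steroid_names"], steroid["raw_data"])}
```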

>> sample.E4

ans =

struct with fields:

allsubj: [16×1 double] - only 16 of the 18 subjects in the challenge study had viable E4 data (see [1])
inoculation_time: [16×1 double] - the time of inoculation for each subject
E4_names: {4×1 cell} - the names of the 4 signals measured from the E4
E4_variable_names: {15×1 cell} - the feature variables extracted from the E4 (Using the protocol of [1])
E4_data: {16×1 cell} - for each subject in allsubj (indexed by row), the 15 feature values extracted over 10-minute time segments
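
Per-subject summaries such as the means and standard deviations mentioned in the Description could be computed from E4_data along the lines sketched below. Three assumptions apply:
- each E4_data cell decodes to a matrix with one row per 10-minute segment and one column per entry of E4_variable_names (transpose first if the file stores features as rows);
- null entries are skipped;
- "std" means the sample standard deviation.

The function name is hypothetical.

```python
from statistics import mean, stdev

def e4_feature_summary(e4, feature):
    """Mean and sample std of one E4 feature for each subject in e4["allsubj"].

    Assumes e4["E4_data"][k] is a segments x features matrix (list of rows)
    whose columns follow e4["E4_variable_names"].
    """
    j = e4["E4_variable_names"].index(feature)
    out = {}
    for subj, mat in zip(e4["allsubj"], e4["E4_data"]):
        vals = [row[j] for row in mat if row[j] is not None]  # jsonencode writes NaN as null
        out[subj] = (mean(vals), stdev(vals)) if len(vals) > 1 else (None, None)
    return out
```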

>> sample.titer

ans =

struct with fields:

allsubj: [18×1 double] - the indices of the 18 subjects
alltime: [21×1 double] - the sampling times (excluding 480 hours)
titer_score: [18×21 double] - the shedding titers interpolated (nearest neighbor) to the 21 sample times
subj_order: [18×1 double] - the ordering of subjects in decreasing order of accumulated shedding over the entire study
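
titer_score is described as a nearest-neighbor interpolation of the shedding measurements onto the 21 sample times. The sketch below shows that operation for one subject. It assumes the measurement times (shedding_time in subject.json) and the sample times (alltime) use the same clock, in hours. When two measurements are equally near, this sketch picks the earlier one; the README does not document a tie rule.

```python
def nearest_neighbor_interp(meas_times, meas_values, query_times):
    """Give each query time the value measured at the nearest measurement time.

    Ties go to the earlier measurement (an arbitrary choice in this sketch).
    """
    if len(meas_times) != len(meas_values) or not meas_times:
        raise ValueError("need equal-length, non-empty measurement lists")
    # Sorting by time makes min() return the earlier measurement on ties.
    pairs = sorted(zip(meas_times, meas_values))
    return [min(pairs, key=lambda p: abs(p[0] - t))[1] for t in query_times]
```

For one subject, the inputs would be shedding_time, that subject's row of shedding, and alltime.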

>> sample.symptom

ans =

struct with fields:

allsubj: [18×1 double] - the indices of the 18 subjects
alltime: [14×1 double] - the times at which the self-reported symptom diaries were filled out
symptom_score: [18×14×8 double] - the raw self-reported ratings in 8 symptom categories, scored 0-5 (0 = no symptom)
score_sum: [18×14 double] - the modified Jackson score over the 8 symptoms
symptom_names: {8×1 cell} - the names of each of the symptoms subjects were asked to score
subj_order: [18×1 double] - the ordering of subjects in decreasing order of accumulated modified Jackson score over the entire study
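
The field names suggest that score_sum is symptom_score summed over its third (symptom) dimension, and that subj_order ranks subjects by that sum accumulated over all 14 diary times. The sketch below reproduces both under those assumptions. It also assumes jsonencode nested the 18×14×8 array as [subject][time][symptom]. The returned ranking uses 0-based positions, whereas subj_order presumably holds 1-based MATLAB indices. Function names are hypothetical.

```python
def jackson_scores(symptom_score):
    """Sum the 8 symptom ratings at each (subject, time); returns a subjects x times matrix.

    Null ratings (NaN in MATLAB) are skipped here, unlike MATLAB's sum(), where NaN propagates.
    """
    return [[sum(r for r in ratings if r is not None) for ratings in subj] for subj in symptom_score]

def order_by_total(score_sum):
    """0-based subject positions in decreasing order of the score accumulated over all times."""
    totals = [sum(row) for row in score_sum]
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
```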

>> sample.lumos

ans =

struct with fields:

innoculation_time: [18×1 double] - the inoculation times for each subject
allsubj: [18×1 double] - the subject indices
lumos_names: {4×1 cell} - the names of the 4 principal categories of 18 NCPT scores extracted from the Lumos cognitive test (using protocol in [2])
data: {18×1 cell} - the NCPT data for the 18 subjects measured as 18 NCPT variables over a number of testing sessions (ranging from 18 to 24)
rownames: {19×1 cell} - the names of the lumos variables (first row is time and 18 remaining rows are NCPT variable names)
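
Each data cell pairs a time row with 18 NCPT rows, as laid out in rownames. Pre-inoculation statistics for one NCPT variable can therefore be extracted as sketched below. The sketch assumes each cell decodes to a 19 × sessions matrix (a list of rows) and that its time row uses the same scale as innoculation_time (note the field's spelling in the file). It is an illustration only, not the analysis used in [4]. The function name is hypothetical.

```python
from statistics import mean, stdev

def pre_inoculation_stats(lumos, subj_pos, variable):
    """Mean and sample std of one NCPT variable over sessions before inoculation.

    lumos: decoded sample.lumos dict; subj_pos: 0-based position in lumos["allsubj"].
    Assumes row 0 of each data matrix holds session times comparable to innoculation_time.
    """
    mat = lumos["data"][subj_pos]
    row = lumos["rownames"].index(variable)  # rownames[0] is the time row
    t0 = lumos["innoculation_time"][subj_pos]  # field name spelled as in the file
    vals = [v for t, v in zip(mat[0], mat[row]) if t < t0 and v is not None]
    if len(vals) < 2:
        return None
    return mean(vals), stdev(vals)
```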

>> sample.vafs

ans =

struct with fields:

vafs_raw: [462×4 table] - Raw Visual Analog Fatigue Scale (VAFS) scores
vafs_scores_pre: [18×2 double] - VAFS averaged over the pre-inoculation period; the first column is the subject index and the second column is the average VAFS

>> sample.vafs.vafs_raw

ans =

462×1 struct array with fields:

Subject - the subject index
Day - the day (Day 0 is the inoculation day, i.e., the 4th day of the study)
Time - the time of day (AM, PM1, PM2)
VAFS - the VAFS score for the given subject, day, and time
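
vafs_scores_pre can in principle be recomputed from vafs_raw. The sketch assumes the decoded struct array is a list of dicts with keys Subject, Day, Time, and VAFS. By default it treats "pre-inoculation" as Day < 0. The README does not say whether Day 0 sessions before inoculation count toward the published averages, so the cutoff is a parameter, and results may differ slightly. The function name is hypothetical.

```python
from collections import defaultdict

def vafs_pre_average(vafs_raw, last_pre_day=-1):
    """Average VAFS per subject over records with Day <= last_pre_day.

    Returns [subject, mean VAFS] pairs sorted by subject, mirroring the two
    columns of vafs_scores_pre. Records with a null VAFS are skipped.
    """
    acc = defaultdict(list)
    for rec in vafs_raw:
        if rec["Day"] <= last_pre_day and rec["VAFS"] is not None:
            acc[rec["Subject"]].append(rec["VAFS"])
    return [[s, sum(v) / len(v)] for s, v in sorted(acc.items())]
```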


----------genematrix_TPM.csv------------------------------
This comma-separated matrix contains TPM-normalized RNAseq read counts. The counts were extracted from 790 paired-end FASTQ files corresponding to 395 blood samples drawn from 18 subjects over 8 days, approximately 3 times per day. The columns of the matrix are indexed by subject ID, day (DayMinus3 to Day 4), and time (AM, PM1, PM2) of blood sample collection in the challenge study experiment (described in [2]-[4]). The reads in each FASTQ file were aligned and mapped to the reference genome Homo_sapiens.GRCh38.84 using HISAT2 2.0.4.
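
For reference, TPM (transcripts per million) divides each gene's read count by its effective length, then rescales so that each sample's values sum to one million. In symbols, TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. The sketch below implements that standard formula. It assumes per-gene counts and effective lengths are already available, which the HISAT2 alignment step alone does not provide. It is not necessarily the exact pipeline used to produce genematrix_TPM.csv.

```python
def tpm(counts, lengths):
    """Transcripts per million for one sample.

    counts: reads assigned to each gene; lengths: effective gene lengths (e.g. in bp).
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must align")
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per unit of gene length
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return [r / total * 1e6 for r in rates]  # each sample sums to 1e6
```

Because every column of a TPM matrix sums to about one million, a column-sum check on genematrix_TPM.csv is a cheap sanity test after loading.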

-------------------------------------------------------------

References

[1] X. She, Y. Zhai, R. Henao, C. W. Woods, C. Chiu, G. S. Ginsburg, P. X. K. Song, and A. O. Hero, “Adaptive multi-channel event segmentation and feature extraction for monitoring health outcomes,” IEEE Transactions on Biomedical Engineering, vol. 68, no. 8, pp. 2377-2388, Aug. 2021, doi: 10.1109/TBME.2020.3038652. Available: arXiv:2008.09215

[2] E. Grzesiak, B. Bent, M. T. McClain, C. W. Woods, E. L. Tsalik, B. P. Nicholson, T. Veldman, T. W. Burke, Z. Gardener, E. Bergstrom, R. B. Turner, C. Chiu, P. M. Doraiswamy, A. Hero, R. Henao, G. S. Ginsburg, and J. Dunn, “Assessment of the Feasibility of Using Noninvasive Wearable Biometric Monitoring Sensors to Detect Influenza and the Common Cold Before Symptom Onset,” JAMA Network Open, vol. 4, no. 9, e2128534, 2021, doi: 10.1001/jamanetworkopen.2021.28534

[3] E. Sabeti, S. Oh, P. X. Song, and A. Hero, “A Pattern Dictionary Method for Anomaly Detection,” Entropy, vol. 24, art. 1095, Aug. 2022, doi: 10.3390/e24081095

[4] Y. Zhai, P. M. Doraiswamy, C. W. Woods, R. B. Turner, T. W. Burke, G. S. Ginsburg, and A. Hero, “Pre-exposure cognitive performance variability is associated with severity of respiratory infection,” submitted and in second-round review, 2022.
