These data and scripts are intended to show that seizure onset dynamics and evoked responses change over the progression of epileptogenesis as defined in this intrahippocampal tetanus toxin rat model. All tests explored in this study can be repeated with the data and scripts included in this repository. Dataset citation: Crisp, D.N., Cheung, W., Gliske, S.V., Lai, A., Freestone, D.R., Grayden, D.B., Cook, M.J., Stacey, W.C. (2019). Epileptogenesis modulates spontaneous and responsive brain state dynamics [Data set]. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/r6vg-9658
The relationship between words in a sentence often tells us more about the underlying semantic content of a document than the individual words themselves. Recent publications in the natural language processing arena, more specifically those using word embeddings, try to incorporate semantic aspects into their word vector representation by considering the context of words and how they are distributed in a document collection. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II, that combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings into a single decoupled system. In short, our approach has three main contributions: (i) unsupervised techniques that fully integrate word embeddings and lexical chains; (ii) a more solid semantic representation that considers the latent relation between words in a document; and (iii) lightweight word embedding models that can be extended to any natural language task. Knowledge-based systems that use natural language text can benefit from our approach to mitigate ambiguous semantic representations provided by traditional statistical approaches. The proposed techniques are tested against seven word embedding algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show that the integration of lexical chains and word embedding representations sustains state-of-the-art results, even against more complex systems.
This data set is a collection of word similarity benchmarks (RG65, MEN3K, Wordsim 353, simlex999, SCWS, yp130, simverb3500) in their original format and converted into a cosine similarity scale.
In addition, we provide two Wikipedia dumps from 2010 (April) and 2018 (January) in their original format (raw words) and converted using the techniques described in the paper (MSSA, MSSA-D and MSSA-NR) (title in this repository), together with the word embedding models for 300d and 1000d trained using a word2vec implementation. A readme.txt is provided with more details for each file.
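The benchmark description above mentions converting the original human ratings into a cosine similarity scale. The exact conversion is documented in the readme and the paper, not here, so the following is only a minimal sketch under the assumption that a linear rescaling is used. The function name `rescale`, the source ranges, and the default target interval of [-1, 1] are all hypothetical choices made for illustration.

```python
def rescale(score, src_min, src_max, dst_min=-1.0, dst_max=1.0):
    """Linearly map a rating from [src_min, src_max] to [dst_min, dst_max].

    Assumption: the benchmark conversion is a plain linear rescaling.
    The default target interval is the full cosine range; the dataset
    may use a different interval such as [0, 1].
    """
    if src_max == src_min:
        raise ValueError("source range must have non-zero width")
    fraction = (score - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# Hypothetical ratings on a 0-10 scale, mapped onto [-1, 1].
ratings = [0.0, 5.0, 10.0]
converted = [rescale(r, 0.0, 10.0) for r in ratings]
```

A rating at the midpoint of its source scale lands at the midpoint of the target interval, so rank order between word pairs is preserved and rank-based evaluation is unaffected by the choice of interval.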
These files contain the raw data and processing parameters that accompany the paper "Hierarchical structure guides rapid linguistic predictions during naturalistic listening" by Jonathan R. Brennan and John T. Hale. They include the stimulus (wav files), raw data (matlab format for the Fieldtrip toolbox), data processing parameters (matlab), and variables used to align the stimuli with the EEG data and for the statistical analyses reported in the paper.
Videos made in the course of linguistic fieldwork. Includes blacksmithing, hide tanning, weaving, cotton spinning, reed flute making, pottery making, and construction in Dogon villages, and exotic traditional hair styling in Hombori (Songhay). Some of the videos are "compilations" of many short clips; others are in standard documentary form.
These data and scripts are meant to test and show seizure differentiation based on bifurcation theory. A zip file is included which contains real and simulated seizure waveforms, Matlab scripts, and metadata. The Matlab scripts allow for visual review validation and objective feature analysis. The file “README.txt” provides more detail about each individual file within the zip file. Data citation: Crisp, D.N., Saggio, M.L., Scott, J., Stacey, W.C., Nakatani, M., Gliske, S.F., Lin, J. (2019). Epidynamics: Navigating the map of seizure dynamics - Code & Data [Data set]. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/ejhy-5h41
This collection represents various raw data and analysis of cores extracted during the November 2008 mission of R/V Melville in the Santa Barbara Basin. The core included is the jumbo piston core MV0811-14JC. Core photos, physical properties and magnetic susceptibility from the multisensor track (MST), and the scanning X-ray fluorescence (XRF) data are included in the collection. Cruise DOI: 10.7284/903459
The research is funded by NSF OCE-1304327.
This collection represents various raw data and analysis of cores extracted during the January 2009 mission of the research vessel Sproul in the Santa Barbara Basin. Cores included: box core SPR0901-04BC, box core SPR0901-unnamed, and Kasten core SPR0901-03KC. Core photos, physical properties and magnetic susceptibility from the multisensor track (MST), and the scanning X-ray fluorescence (XRF) data are included in the collection. Cruise DOI: 10.7284/901089
This research is funded by NSF-OCE 0752093.
Estimated phylogenetic relationships based on more than 18,000 loci in 93 individuals (full data) or 21 individuals (subset data) representing 19 described species and two putative undescribed species. Nine files are part of this dataset, including all input files to infer the phylogenetic reconstructions and the outputs obtained, in addition to a pruned tree used to infer the ancestral state reconstructions.
This work contains the experimental data and associated analysis that are described in the research publication entitled "Ultra-specific and Amplification-free Quantification of Mutant DNA by Single-molecule Kinetic Fingerprinting". This work contains multiple zip files, each of which represents one of the principal experiment groups presented in the publication. Each experiment group contains movie and analysis files corresponding to various experimental conditions related to that experiment group.
The search data supports a literature review project on lifestyle therapies for the management of atrial fibrillation. The data included in the dataset are the reproducible search strategies (in docx) and the exported results of all citations from all databases (txt and ris files). These searches and exported result files contain all citations originating from the database searches that were considered for inclusion.
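For readers who want to inspect the exported ris files programmatically, the following is a minimal sketch of reading RIS-style records and dropping duplicate titles across databases. It assumes the common RIS layout of a two-character tag, two spaces, a hyphen, and a value, with each record opened by `TY` and closed by `ER`. The function names and the sample records are hypothetical and are not taken from the dataset.

```python
import re

# Tag, two spaces, hyphen, optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Return a list of records; each maps a tag to a list of values."""
    records = []
    current = None
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or non-conforming lines
        tag, value = match.groups()
        if tag == "TY":
            current = {"TY": [value]}
        elif tag == "ER":
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)
    return records


def dedupe_by_title(records):
    """Keep the first record for each case- and space-normalized title."""
    seen = set()
    unique = []
    for record in records:
        # Some exports use T1 for the primary title where others use TI.
        titles = record.get("TI") or record.get("T1") or [""]
        key = " ".join(titles[0].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample: the same citation exported by two databases.
SAMPLE = "\n".join([
    "TY  - JOUR",
    "TI  - Placeholder title one",
    "AU  - Doe, J.",
    "ER  - ",
    "TY  - JOUR",
    "T1  - Placeholder  Title One",
    "ER  - ",
])

records = parse_ris(SAMPLE)
unique_records = dedupe_by_title(records)
```

Matching on normalized titles is a deliberately simple choice; a real deduplication pass for a review would usually also compare DOIs, authors, and years.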
Jalkunan is an endangered language of the Mande family, spoken in the village cluster of Blédougou in southwestern Burkina Faso. The lexical work complements a published grammar with texts. See the readme for further information.
Our project, mainly on Dogon languages of Mali, has branched out to Burkina Faso with emphasis on documentation of the most endangered languages. Tiefo-N was studied on an emergency basis since it was down to two aging competent speakers. For additional comments and links to a reference grammar, see the readme file.
The work on the Bangime language, spoken by the Bangande people, was carried out as part of a larger linguistic fieldwork project focused on Dogon languages. Bangime is confirmed as a language isolate with no demonstrable linguistic relatives—possibly the only such isolate in West Africa.
Bankan Tey is a Dogon language spoken in the village complex Walo (also spelled Oualo) near Douentza in central Mali. It is closely related to Ben Tey within Dogon. As of May 2018, Bankan Tey remains on my “to do” list in terms of grammatical description and texts. These recordings were made in Walo in 2011 and have not yet been transcribed, although there is a fair chance I will be able to work on them in the next few years. If nothing materializes before 2022, I authorize other linguists to transcribe, translate, and/or analyse the texts.
2011 side A
2011 side B
Dogul Dom is a Dogon language spoken over a broad area on the Dogon (Bandiagara) plateau, mainly north(-west) of Bandiagara. A grammar was published electronically at Language Description Heritage Library in 2016: http://ldh.clld.org/2016/07/01/escidoc2326691-3 It is backed up at Deep Blue documents: http://hdl.handle.net/2027.42/123061 Dogul Dom texts were recorded digitally at Nantanga village in 2015. Portions were transcribed and presented in the grammar as Text T01 and Text T02:
Dogul Dom Nantanga 2015-01 (about 9:30 minutes), Text T01 in grammar;
Dogul Dom Nantanga 2015-02 (about 4:30 minutes), Text T02 in grammar.
Donno So is a Dogon language spoken over a wide area on the Dogon (Bandiagara) plateau, mainly between Bandiagara and the eastern edge of the plateau. It is also called Kamma So. A grammar was published electronically at Language Description Heritage Library in 2016: http://ldh.clld.org/2016/07/01/escidoc2491630-3 This is backed up at Deep Blue documents: http://hdl.handle.net/2027.42/123062 Thirteen texts were recorded digitally in Wendekele village south of Bandiagara in approximately 2015. Because of equipment problems, the texts are rather faint and difficult to transcribe. Five texts were transcribed and translated, and presented at the end of the grammar volume. The correspondences are these:
Published volume: text 1, Recording: DS 02, title: hare and other animals (tale);
text 2, DS 09, report on trip to Burkina;
text 3, DS 10, blacksmith;
text 4, DS 03, squirrel and hare (tale);
text 5, DS 11, Fulbe herders.
Recordings DS 01 (tale of stepmother), 04 (farming), 05 (construction), 06 (animals), 07 (hunting), 08 (herding), 12 (marriage), and 13 (korobasinging) are not transcribed as of May 2018. I grant permission to other linguists to transcribe, translate, and/or analyse them.