The relationship between words in a sentence often tells us more about the underlying semantic content of a document than the individual words themselves. Recent publications in natural language processing, specifically those using word embeddings, try to incorporate semantic aspects into their word vector representations by considering the context of words and how they are distributed in a document collection. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II, that combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings into a single decoupled system. In short, our approach has three main contributions: (i) unsupervised techniques that fully integrate word embeddings and lexical chains; (ii) a more solid semantic representation that considers the latent relations between words in a document; and (iii) lightweight word embedding models that can be extended to any natural language task. Knowledge-based systems that use natural language text can benefit from our approach to mitigate ambiguous semantic representations provided by traditional statistical approaches. The proposed techniques are tested against seven word embedding algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show that the integration of lexical chains and word embedding representations sustains state-of-the-art results, even against more complex systems.
This data set is a collection of word similarity benchmarks (RG65, MEN3K, Wordsim 353, simlex999, SCWS, yp130, simverb3500) in their original format and converted into a cosine similarity scale.
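As a rough illustration of what converting a benchmark score "into a cosine similarity scale" could look like, the sketch below linearly maps a human rating onto the interval [-1, 1]. This is an assumption made for illustration only: the exact conversion used for these files is not described here, and the function name `rescale`, its parameters, and the 0 to 10 rating range in the usage lines are all hypothetical.

```python
def rescale(score, src_min, src_max, dst_min=-1.0, dst_max=1.0):
    """Linearly map `score` from [src_min, src_max] to [dst_min, dst_max].

    Assumption: a plain min-max mapping. The conversion actually applied
    to the benchmark files may differ.
    """
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    # Position of the score inside the source range, between 0 and 1.
    fraction = (score - src_min) / (src_max - src_min)
    # Stretch that position onto the destination range.
    return dst_min + fraction * (dst_max - dst_min)


# Hypothetical usage with a 0-10 rating scale (not taken from the data set).
lowest = rescale(0.0, 0.0, 10.0)
middle = rescale(5.0, 0.0, 10.0)
highest = rescale(10.0, 0.0, 10.0)
```

If the target interval were [0, 1] instead, only the `dst_min` and `dst_max` arguments would need to change.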
In addition, we provide two Wikipedia dumps, from April 2010 and January 2018, each in its original format (raw words) and converted using the techniques described in the paper (MSSA, MSSA-D, and MSSA-NR; title in this repository), along with the 300d and 1000d word embedding models trained using a word2vec implementation. A readme.txt is provided with more details for each file.
The outputs include the steady state solutions for all Galileo flybys, the particle information for plotting the distribution functions near the reconnection site, the particle and field data for mapping the energetic flux densities, and 3D files for visualizing the whole simulation domain. More details can be found in Readme.txt.
This dataset includes core physical properties (e.g., bulk density, porosity, P-wave velocity) and magnetic susceptibility data for SPR0901-04BC (34.2816°N, 120.0415°W, 588 m water depth) measured on the multisensor track (MST). SPR0901-04BC was collected by the R/V Sproul off Southern California in January 2009. The study is supported by NSF OCE-0752093.
These data and scripts are meant to test and demonstrate seizure differentiation based on bifurcation theory. A zip file is included which contains real and simulated seizure waveforms, MATLAB scripts, and metadata. The MATLAB scripts allow for visual review validation and objective feature analysis. The file “README.txt” provides more detail about each individual file within the zip file. Data citation: Crisp, D.N., Saggio, M.L., Scott, J., Stacey, W.C., Nakatani, M., Gliske, S.F., Lin, J. (2019). Epidynamics: Navigating the map of seizure dynamics - Code & Data [Data set]. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/ejhy-5h41
This dataset includes scanning X-ray fluorescence (XRF) data for the core MV0811-14JC (34.2818°N, 120.0360°W, 582 m water depth), which was collected by the R/V Melville off Southern California in November 2008. The research is funded by NSF OCE-1304327.
This dataset includes scanning X-ray fluorescence (XRF) data for the core SPR0901-03KC (34.2832°N, 120.0401°W, 586 m water depth). SPR0901-03KC was collected by the R/V Sproul off Southern California in January 2009. The research is funded by NSF OCE-0752093.
This dataset includes scanning X-ray fluorescence (XRF) data for the core SPR0901-04BC (34.2816°N, 120.0415°W, 588 m water depth). SPR0901-04BC was collected by the R/V Sproul off Southern California in January 2009. This research is funded by NSF OCE-0752093.