Index Catalog // Deep Blue Data

Multi-Sense embeddings through a word sense disambiguation process

Creator:

Ruas, Terry, Grosky, William, and Aizawa, Akiko

Description:

This data set is a collection of word similarity benchmarks (RG65, MEN3K, WordSim353, SimLex999, SCWS, YP130, SimVerb3500), each provided in its original format and converted to a cosine similarity scale. In addition, it includes two Wikipedia dumps, from April 2010 and January 2018. Each dump is provided in its original format (raw words) and as converted using the techniques described in the related paper (MSSA, MSSA-D, and MSSA-NR), which has the same title as this repository entry. Word embedding models in 300 and 1000 dimensions, produced with a word2vec implementation, are also included. A readme.txt is provided with more details for each file.
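As a rough illustration of what putting benchmark ratings and model scores on a common scale can look like, the sketch below linearly rescales a human rating to [0, 1] and computes the cosine similarity of two vectors. The record does not state the exact conversion used for the files in this data set, so the linear mapping, the 0 to 10 rating range, and the function names `rescale_to_unit` and `cosine_similarity` are assumptions made for illustration only; the readme.txt remains the authority on the actual format.

```python
import math


def rescale_to_unit(score, low, high):
    """Linearly map a rating in [low, high] onto [0, 1].

    Assumption: a plain min-max mapping; the data set's own
    conversion may differ.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return (score - low) / (high - low)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical example: a human rating of 7.5 on an assumed 0-10 scale,
# compared with the cosine similarity of two toy 3-d vectors.
human = rescale_to_unit(7.5, 0.0, 10.0)
model = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
```

With both values on a comparable scale, a benchmark is typically scored by correlating the rescaled human ratings with the model's cosine similarities over all word pairs.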

Keyword:

multi-sense embeddings, MSSA, word2vec, wikipedia dump, synset, and natural language processing

Citation to related publication:

https://doi.org/10.1016/j.eswa.2019.06.026

Discipline:

Other