Work Description

Title: Multi-Sense embeddings through a word sense disambiguation process

License: Creative Commons Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/
Description
  • This data set is a collection of word similarity benchmarks (RG65, MEN3K, Wordsim 353, simlex999, SCWS, yp130, simverb3500), provided both in their original format and converted to a cosine similarity scale. It also includes two Wikipedia dumps, from April 2010 and January 2018. For each dump we provide the original text (raw words), the text converted with the techniques described in the paper of the same title as this repository (MSSA, MSSA-D and MSSA-NR), and word embedding models with 300 and 1000 dimensions trained with a word2vec implementation. A readme.txt file gives more details on each file.
Depositor
  • truas@umich.edu
Date coverage
  • 2010-04-08
Last modified
  • 07/16/2019
DOI
  • doi:10.7302/96kr-q988
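The description mentions benchmarks "converted to a cosine similarity scale". The exact conversion is documented in readme.txt, not on this page. The snippet below is a minimal sketch that assumes a linear (min-max) rescaling of each benchmark's human ratings onto the interval [-1, 1], the range of cosine similarity. The function names `rescale` and `cosine_similarity`, and the default target interval, are illustrative assumptions, not the authors' code.

```python
"""Hypothetical sketch: put human similarity ratings and embedding
similarities on the same scale.

Assumption (not taken from the data set's readme): ratings are mapped
linearly from the benchmark's original range onto [-1, 1].
"""
import math


def rescale(score, old_min, old_max, new_min=-1.0, new_max=1.0):
    """Linearly map a rating from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("the original scale must have a non-zero width")
    fraction = (score - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length word vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # A rating of 7.5 on a 0-10 benchmark scale lands at 0.5 on [-1, 1].
    print(rescale(7.5, 0, 10))
    # Parallel vectors have cosine similarity 1 (up to rounding).
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

If the converted files turn out to use [0, 1] instead, passing `new_min=0.0, new_max=1.0` gives that variant without changing the rest of the sketch.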


Files (Count: 4; Size: 61.3 GB)


The total file size of this work (61.3 GB) is too large to download directly. Consider using Globus (see below).

The files are ready in the Globus download directory.
Globus is intended for transferring large data sets.