EVALution 1.0: an evolving semantic dataset for training and evaluation of distributional semantic models

Published in Workshop on Linked Data in Linguistics @ ACL, 2015

Enrico Santus, Frances Yung, Alessandro Lenci, Chu-Ren Huang

In this paper, we introduce EVALution 1.0, a dataset designed for the training and evaluation of Distributional Semantic Models (DSMs). This version consists of almost 7.5K tuples, instantiating several semantic relations between word pairs (including hypernymy, synonymy, antonymy, and meronymy). The dataset is enriched with a large amount of additional information (e.g. relation domain, word frequency, word POS, word semantic field) that can be used either for filtering the pairs or for performing an in-depth analysis of the results. The tuples were extracted from a combination of ConceptNet 5.0 and WordNet 4.0, and subsequently filtered through automatic methods and crowdsourcing in order to ensure their quality. The dataset is freely downloadable. An extension in RDF format, also including scripts for data processing, is under development.