LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning

A Bennett; T Baldwin; J Lau; D McCarthy; F Bond

Conference Proceedings

The Association for Computational Linguistics | Published: 2016

DOI: 10.18653/v1/p16-1143

Abstract

There has recently been a lot of interest in unsupervised methods for learning sense distributions, particularly in applications where sense distinctions are needed. This paper analyses a state-of-the-art method for sense distribution learning, and optimises it for application to the entire vocabulary of a given language. The optimised method is then used to produce LexSemTm: a sense frequency and semantic dataset of unprecedented size, spanning approximately 88% of polysemous, English simplex lemmas, which is released as a public resource to the community. Finally, the quality of this data is investigated, and the LexSemTm sense distributions are shown to be superior to those based on the W..