Conference Proceedings

How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions

Navnita Nandakumar, Timothy Baldwin, Bahar Salehi

Association for Computational Linguistics | Published: 2019


In this paper, we apply various embedding methods to multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embeddings suggest that word2vec performs best, followed by fastText and InferSent. Moreover, we find that recently proposed contextualised embedding models such as BERT and ELMo are not adept at handling non-compositionality in multiword expressions.
