Conference Proceedings

Unimelb: Topic modelling-based word sense induction for web snippet clustering

JH Lau, P Cook, T Baldwin

*SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics | Published: 2013

Abstract

© 2013 Association for Computational Linguistics. This paper describes our system for Task 11 of SemEval-2013. In the task, participants are provided with a set of ambiguous search queries and the snippets returned by a search engine, and are asked to associate senses with the snippets. The snippets are then clustered using the sense assignments, and systems are evaluated based on the quality of the snippet clusters. Our system adopts a preexisting Word Sense Induction (WSI) methodology based on the Hierarchical Dirichlet Process (HDP), a non-parametric topic model. Our system is trained over extracts from the full text of English Wikipedia, and is shown to perform well in the shared task.
