Conference Proceedings

The company they keep: Extracting Japanese neologisms using language patterns

J Breen, T Baldwin, F Bond

GWC 2018 - 9th Global WordNet Conference | Association for Computational Linguistics | Published: 2018

Abstract

© 2018 Global WordNet Association. All rights reserved. We describe an investigation into identifying and extracting unrecorded potential lexical items from Japanese text by detecting passages that contain selected language patterns typically associated with such items. We identified a set of suitable patterns, then tested them on two large collections of text, one drawn from the Web and one from Twitter. Samples of the extracted items were evaluated, and the results indicate that the approach has considerable potential for identifying terms for later lexicographic analysis.
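The pattern-based approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue phrases (とは, という, って何), the candidate definition (runs of two or more katakana or kanji characters), and the names `CUES`, `CANDIDATE`, and `extract_candidates` are all assumptions chosen for illustration, since the abstract does not list the patterns actually used.

```python
import re

# Hypothetical definitional cues that often follow a term being introduced
# or asked about; the paper's actual pattern set is not given in the abstract.
CUES = ["とは", "という", "って何"]

# Assumed candidate shape: a run of 2+ katakana or a run of 2+ kanji.
CANDIDATE = r"([\u30A0-\u30FF]{2,}|[\u4E00-\u9FFF]{2,})"

PATTERN = re.compile(CANDIDATE + "(?:" + "|".join(map(re.escape, CUES)) + ")")


def extract_candidates(text, known_words):
    """Return candidate terms that precede a cue and are not in known_words."""
    return [
        match.group(1)
        for match in PATTERN.finditer(text)
        if match.group(1) not in known_words
    ]


if __name__ == "__main__":
    lexicon = {"辞書"}  # stand-in for an existing dictionary
    sample = "タピオカとは何か。辞書という本。"
    print(extract_candidates(sample, lexicon))
```

In this sketch the lexicon lookup stands in for the "unrecorded" check: a term already present in the dictionary is discarded, and the rest would be passed on for manual lexicographic review.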