Conference Proceedings

Extracting the Unextractable: A Case Study on Verb-particles

T Baldwin, A Villavicencio

Proceedings of the Annual Meeting of the Association for Computational Linguistics | Association for Computational Linguistics | Published: 2002

Abstract

This paper proposes a series of techniques for extracting English verb-particle constructions from raw text corpora. We initially propose three basic methods, based on tagger output, chunker output and a chunk grammar, respectively. The chunk grammar method can optionally be combined with an attachment resolution module that determines the syntactic structure of verb-preposition pairs in ambiguous constructs. We then combine the three methods into a single classifier and add a number of extra lexical and frequentistic features, producing a final F-score of 0.865 over the WSJ.
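
To make the first of the three basic methods more concrete, the following is a minimal sketch of what tagger-based extraction could look like. It is not the authors' implementation. It assumes the input is already tagged with Penn Treebank tags, where RP marks a particle and tags beginning with VB mark verbs. The function name extract_vpc_candidates and the window size MAX_GAP are hypothetical choices made for illustration, and no lemmatization is performed.

```python
# Hypothetical window: maximum number of tokens allowed between the verb
# and the particle. This value is an assumption, not taken from the paper.
MAX_GAP = 5


def extract_vpc_candidates(tagged_sentence, max_gap=MAX_GAP):
    """Return (verb, particle) pairs found in one tagged sentence.

    tagged_sentence is a list of (word, tag) tuples using Penn Treebank
    tags. A candidate is an RP-tagged token paired with the nearest
    preceding VB*-tagged token, with at most max_gap tokens in between.
    """
    candidates = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag != "RP":
            continue
        # Scan leftwards from the particle for the nearest verb in the window.
        lowest = max(i - 1 - max_gap, 0)
        for j in range(i - 1, lowest - 1, -1):
            prev_word, prev_tag = tagged_sentence[j]
            if prev_tag.startswith("VB"):
                candidates.append((prev_word.lower(), word.lower()))
                break
    return candidates


sentence = [
    ("She", "PRP"), ("handed", "VBD"), ("the", "DT"),
    ("paper", "NN"), ("in", "RP"), (".", "."),
]
print(extract_vpc_candidates(sentence))  # [('handed', 'in')]
```

A sketch like this depends entirely on the tagger separating particles (RP) from prepositions (IN), which is one plausible reason for combining it with chunk-based evidence and further features, as the abstract describes.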
