Conference Proceedings
Extracting the Unextractable: A Case Study on Verb-particles
T Baldwin, A Villavicencio
Proceedings of the Annual Meeting of the Association for Computational Linguistics | Association for Computational Linguistics | Published: 2002
Abstract
This paper proposes a series of techniques for extracting English verb-particle constructions from raw text corpora. We initially propose three basic methods, based on tagger output, chunker output and a chunk grammar, respectively; the chunk grammar method can optionally be combined with an attachment resolution module to determine the syntactic structure of verb-preposition pairs in ambiguous constructs. We then combine the three methods into a single classifier and add a number of further lexical and frequency-based features, producing a final F-score of 0.865 over the WSJ.
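
To give a rough sense of what extraction from tagger output can look like, the following is a minimal sketch and not the authors' implementation. It assumes the input is already tagged with Penn Treebank tags, that particles carry the tag `RP`, and that a particle may be separated from its verb by a few tokens (as in "handed the paper in"). The function name `extract_vpc_candidates` and the `max_gap` parameter are hypothetical, chosen only for illustration; no lemmatisation or filtering is attempted.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag)


def extract_vpc_candidates(
    sentence: List[Tagged], max_gap: int = 4
) -> List[Tuple[str, str]]:
    """Return (verb, particle) pairs from one tagged sentence.

    Assumption (not from the paper): a candidate is any RP-tagged token
    whose nearest preceding verb (tag starting with "VB") lies at most
    `max_gap` intervening tokens to its left.
    """
    candidates = []
    for i, (token, tag) in enumerate(sentence):
        if tag != "RP":
            continue
        # Walk left from the particle to the nearest verb inside the window.
        start = max(0, i - 1 - max_gap)
        for j in range(i - 1, start - 1, -1):
            prev_token, prev_tag = sentence[j]
            if prev_tag.startswith("VB"):
                candidates.append((prev_token.lower(), token.lower()))
                break
    return candidates


example = [
    ("She", "PRP"), ("handed", "VBD"), ("the", "DT"),
    ("paper", "NN"), ("in", "RP"), (".", "."),
]
print(extract_vpc_candidates(example))
```

A sketch like this depends entirely on the tagger labelling the particle as `RP` rather than as a preposition (`IN`), which is one reason the abstract goes on to combine several methods and additional features in a single classifier.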