Conference Proceedings
On Identifying Phrases Using Collection Statistics
S Gog, A Moffat, M Petri, A Hanbury (ed.), G Kazai (ed.), A Rauber (ed.), N Fuhr (ed.)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Springer International Publishing | Published : 2015
Abstract
The use of phrases as part of similarity computations can enhance search effectiveness. But the gain comes at a cost, either in terms of index size, if all word-tuples are treated as queryable objects; or in terms of processing time, if postings lists for phrases are constructed at query time. There is also a lack of clarity as to which phrases are “interesting”, in the sense of capturing useful information. Here we explore several techniques for recognizing phrases using statistics of large-scale collections, and evaluate their quality.