On the Cost of Extracting Proximity Features for Term-Dependency Models

X Lu; A Moffat; JS Culpepper

Conference Proceedings

X Lu, A Moffat, JS Culpepper, J Bailey (ed.), A Moffat (ed.), CC Aggarwal (ed.), MD Rijke (ed.), R Kumar (ed.), V Murdock (ed.), T Sellis (ed.), JX Yu (ed.)

Proc. 24th ACM CIKM Int. Conf. on Information and Knowledge Management | ACM | Published: 2015

DOI: 10.1145/2806416.2806467

Abstract

Sophisticated ranking mechanisms make use of term-dependency features in order to compute similarity scores for documents. These features often include exact phrase occurrences and term proximity estimates. Both build on the intuition that if multiple query terms appear near each other, the document is more likely to be relevant to the query. In this paper we examine the processes used to compute these statistics. Two distinct input structures can be used: inverted files and direct files. Inverted files must store the position offsets of the terms, while "direct" files represent each document as a sequence of preprocessed term identifiers. Based on these two input modalities, a numbe..
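
To make the two input structures concrete, the following is a minimal sketch and not the paper's method. It counts how often two query terms fall within a fixed window of each other in one document, once from sorted position offsets (as a positional inverted file would supply them) and once by scanning a direct-file sequence of term identifiers. The function names, the integer term identifiers, and the "absolute offset difference less than width" definition of proximity are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def positions_from_direct(direct_doc):
    """Derive per-term sorted offsets from a direct-file document.

    Stands in for the positional postings an inverted file would store
    (hypothetical helper, for illustration only).
    """
    positions = defaultdict(list)
    for offset, term_id in enumerate(direct_doc):
        positions[term_id].append(offset)
    return positions


def pairs_within_window_inverted(pos_a, pos_b, width):
    """Count pairs (i, j), i in pos_a and j in pos_b, with |i - j| < width.

    Both lists must be sorted; binary search finds the matching range.
    """
    count = 0
    for i in pos_a:
        lo = bisect_left(pos_b, i - width + 1)
        hi = bisect_right(pos_b, i + width - 1)
        count += hi - lo
    return count


def pairs_within_window_direct(direct_doc, term_a, term_b, width):
    """Same count, computed by scanning the term-identifier sequence."""
    count = 0
    for i, term_id in enumerate(direct_doc):
        if term_id != term_a:
            continue
        start = max(0, i - width + 1)
        end = min(len(direct_doc), i + width)
        count += sum(1 for j in range(start, end) if direct_doc[j] == term_b)
    return count


# Toy document: a sequence of made-up term identifiers.
doc = [3, 7, 1, 3, 9, 7, 7, 2]
postings = positions_from_direct(doc)
from_inverted = pairs_within_window_inverted(postings[3], postings[7], 3)
from_direct = pairs_within_window_direct(doc, 3, 7, 3)
```

Both routes return the same count for the toy document. They differ in what must be read: the inverted route touches only the offsets of the two query terms, while the direct route scans the document's full term sequence.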
