Conference Proceedings

Compact set representation for information retrieval

J Shane Culpepper, Alistair Moffat, N Ziviani (ed.), R BaezaYates (ed.)

STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS | SPRINGER-VERLAG BERLIN | Published : 2007

Abstract

Conjunctive Boolean queries are a fundamental operation in web search engines. These queries can be reduced to the problem of intersecting ordered sets of integers, where each set represents the documents containing one of the query terms. But there is tension between the desire to store the lists effectively, in a compressed form, and the desire to carry out intersection operations efficiently, using non-sequential processing modes. In this paper we evaluate intersection algorithms on compressed sets, comparing them to the best non-sequential array-based intersection algorithms. By adding a simple, low-cost, auxiliary index, we show that compressed storage need not hinder efficient and high..

View full abstract