Conference Proceedings

CSA : Fast pattern search for large alphabets

S Gog, A Moffat, M Petri

2017 Proceedings of the Ninteenth Workshop on Algorithm Engineering and Experiments (ALENEX) | Society for Industrial and Applied Mathematics | Published : 2017


Indexed pattern search in text has been studied for many decades. For small alphabets, the FM-Index provides unmatched performance for Count operations, in terms of both space required and search speed. For large alphabets - for example, when the tokens are words - the situation is more complex, and FM-Index representations are compact, but potentially slow. In this paper we apply recent innovations from the field of inverted indexing and document retrieval to compressed pattern search, including for alphabets into the millions. Commencing with the practical compressed suffix array structure developed by Sadakane, we show that the Elias-Fano code-based approach to document indexing can be ad..

View full abstract