Conference Proceedings

Phrase-based pattern matching in compressed text

J Shane Culpepper, Alistair Moffat, F Crestani (ed.), P Ferragina (ed.), M Sanderson (ed.)

STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS | SPRINGER-VERLAG BERLIN | Published : 2006

Abstract

Byte codes are a practical alternative to the traditional bit-oriented compression approaches when large alphabets are being used, and trade away a small amount of compression effectiveness for a relatively large gain in decoding efficiency. Byte codes also have the advantage of being searchable using standard string matching techniques. Here we describe methods for searching in byte-coded compressed text and investigate the impact of large alphabets on traditional string matching techniques. We also describe techniques for phrase-based searching in a restricted type of byte code, and present experimental results that compare our adapted methods with previous approaches. © Springer-Verlag Be..
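The claim that byte codes can be searched with standard string matching techniques can be illustrated with a minimal sketch. It assumes a simple tagged (stopper/continuer) byte code in which the top bit marks the last byte of each codeword; this is an illustrative choice and not necessarily the restricted byte code studied in the paper. The names `encode_number`, `encode_sequence`, and `find_phrase` are hypothetical, and symbols are represented directly as integer identifiers rather than words.

```python
from typing import List


def encode_number(n: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, most significant first.

    Assumed layout: the final byte of each codeword has its top bit set
    (a "stopper"); all earlier bytes have it clear ("continuers").
    """
    out = [0x80 | (n & 0x7F)]  # stopper byte carries the low 7 bits
    n >>= 7
    while n > 0:
        out.append(n & 0x7F)  # continuer bytes carry the higher bits
        n >>= 7
    return bytes(reversed(out))


def encode_sequence(symbols: List[int]) -> bytes:
    """Concatenate the codewords of a sequence of symbol identifiers."""
    return b"".join(encode_number(s) for s in symbols)


def find_phrase(compressed: bytes, pattern: bytes) -> List[int]:
    """Return byte offsets at which the encoded phrase occurs.

    An ordinary byte-level search (bytes.find) locates candidates. A
    candidate is kept only if it starts on a codeword boundary, that is,
    at offset 0 or directly after a stopper byte. Otherwise the match
    begins inside a longer codeword and is a false positive.
    """
    hits = []
    pos = compressed.find(pattern)
    while pos != -1:
        if pos == 0 or compressed[pos - 1] & 0x80:
            hits.append(pos)
        pos = compressed.find(pattern, pos + 1)
    return hits


# Symbol 300 needs two bytes, and its final byte equals the codeword for 44.
compressed = encode_sequence([300, 7, 44, 7])
pattern = encode_sequence([44, 7])
matches = find_phrase(compressed, pattern)
```

In this example the raw byte search finds the pattern twice, but the first candidate starts on the last byte of the two-byte codeword for 300, so the boundary check discards it and only the genuine occurrence remains. The search itself never decodes the text, which is the property the abstract refers to.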
