Conference Proceedings

Blended Dictionaries for Reduced-Memory Lempel-Ziv Corpus Compression

J Tong, AI WIRTH, J Zobel, JS Culpepper (ed.), L Park (ed.), G Zuccon (ed.)

ACM Press | Published : 2014

Abstract

Relative Lempel-Ziv (RLZ) compression has been shown to be effective for compression of large text repositories. It provides high compression ratios with extremely fast atomic decompression of individual documents. However, it depends on a large in-memory dictionary, which is implemented as a contiguous string that must be accessed randomly during the decompression process. In this paper we explore how compressed suffix arrays might reduce the size of the dictionary. These suffix arrays drastically increase the cost of accessing individual characters, however, so we propose splitting of the dictionary: An uncompressed structure for frequently accessed dictionary elements, with compression fo..

View full abstract