Access Time Tradeoffs in Archive Compression

M Petri; A Moffat; PC Nagesh; A Wirth

Conference Proceedings

Editors: G Zuccon, S Geva, H Joho, F Scholer, A Sun, P Zhang

Proc. 11th Asian Information Retrieval Societies Conf. | Springer International Publishing | Published: 2015

DOI: 10.1007/978-3-319-28940-3_2

Abstract

Web archives, query and proxy logs, and similar collections can be very large and highly repetitive, and are accessed only sporadically and partially rather than continually and holistically. Such data is ideal for compression-based archiving, provided that random access to small fragments of the original data can be achieved without decompressing everything. The recent RLZ (relative Lempel-Ziv) compression approach uses a semi-static model extracted from the text to be compressed, together with a greedy factorization of the whole text encoded using static integer codes. Here we demonstrate more precisely than before the scenarios in which RLZ excels. We contrast RLZ with alternative..
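
The RLZ idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the naive substring search, the convention that a length of 0 marks a literal byte, and the use of a variable-byte code as the "static integer code" are all assumptions made for illustration. The abstract does not say how the dictionary is built or which codes are used.

```python
def longest_match(dictionary: bytes, text: bytes, start: int) -> tuple[int, int]:
    """Return (position, length) of the longest prefix of text[start:]
    that occurs in the dictionary. Naive search, for illustration only."""
    best_pos, best_len = 0, 0
    pos = 0
    while start + best_len < len(text):
        # Any occurrence of the longer prefix is also an occurrence of the
        # shorter one, so the search can resume from the last position found.
        pos = dictionary.find(text[start:start + best_len + 1], pos)
        if pos == -1:
            break
        best_pos, best_len = pos, best_len + 1
    return best_pos, best_len


def factorize(dictionary: bytes, text: bytes) -> list[tuple[int, int]]:
    """Greedy factorization of text relative to the dictionary.
    Assumed convention: (byte_value, 0) encodes a literal byte."""
    factors = []
    i = 0
    while i < len(text):
        pos, length = longest_match(dictionary, text, i)
        if length == 0:
            factors.append((text[i], 0))
            i += 1
        else:
            factors.append((pos, length))
            i += length
    return factors


def decode(dictionary: bytes, factors: list[tuple[int, int]]) -> bytes:
    """Rebuild text from factors; only the dictionary is needed, which is
    what makes decoding a small fragment independent of the rest."""
    out = bytearray()
    for pos, length in factors:
        if length == 0:
            out.append(pos)
        else:
            out += dictionary[pos:pos + length]
    return bytes(out)


def vbyte(n: int) -> bytes:
    """Variable-byte code (an assumed stand-in for a static integer code):
    7 payload bits per byte, high bit set on the final byte."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def encode_factors(factors: list[tuple[int, int]]) -> bytes:
    """Serialize each factor as two variable-byte integers."""
    return b"".join(vbyte(p) + vbyte(l) for p, l in factors)
```

For example, with the dictionary `b"abracadabra "`, the text `b"abracadabra abracadabra abracadabra"` factorizes into three (position, length) pairs, and `decode` recovers the text from those pairs and the dictionary alone.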