Conference Proceedings

Space-efficient construction of optimal prefix codes

A Moffat, A Turpin, J Katajainen

Data Compression Conference Proceedings | Published : 1995

Abstract

Large static text databases can be effectively compressed by coupling a semi-static word-based model with canonical Huffman coding. The word-based model yields relatively good compression, and the use of canonical Huffman coding on long tokens results in fast decoding rates. However, the combination results in a large set of symbols and the consequent risk of codeword overflow. Methods are presented for the calculation of an optimal length-limited code for the TREC word distribution in under 8 Mb of memory, as well as the calculation of an unrestricted Huffman code in under 1 Mb of memory.

University of Melbourne Researchers