This is classic implementation of LZW algorithm based on Mark Nelson's book on data compression from 1995. It uses variable-width dictionary codes up to 16-bit. Original development was done in 1996-97. It has been modified to be a single executable performing both compression and decompression, taking maximum bit length as argument for Compress2 function. The program has been tested on Ubuntu 18.04 (with gcc and clang) and on Windows 10 (with Visual Studio 2019).
Running make produces (1) executable `lzw16`; (2) static library `liblzw16.a` with three exported functions called Compress, Compress2 and Decompress (see export.h) and (3) test program `lzw_test` statically linked with the above library. Type `./lzw16` to see all command line options. Examples: `./lzw16 -p sample.txt sample.lzw` (pack sample.txt) `./lzw16 -u sample.lzw sample_copy.txt` (unpack sample.lzw) `./lzw16 -tv sample.txt` (test compression/decompression, verbose mode) `./lzw16 -large` (test synthetic data) `./lzw16 -b14 -large 10` (test synthetic data size 10 x 256 Kb, use max 14-bit) './lzw16 -pv -b12 sample.txt sample.lzw` (pack sample.txt using codes up to 12-bit width, verbose)
1. It supports input files up to 2GB in size. 2. It is currently supported on little-endian machines only.
1. Add big endian support. 2. Input file size limit can be fixed by replacing 4-byte file size header in compressed file with 48 or 64-bit values. 3. BUFFLEN define (see common.h) which is shared between Compress and Decompress calls can be included in compressed output file header and then set dynamically by Decompress. 4. Add functions taking binary buffers as opposed to filenames. //////////////////////////////////////////////////////////////////////////////// The code can be relatively easily converted to support 17-bit and even larger maximum width bit encoding. For this, some variables in Compress must be made 32-bit, and Decompress' NOT_CODE value must be more than 16-bits, along with stack/suffix/prefix arrays. At the same time, as maximum code width grows, we hit the "law of diminishing returns" on data compression while also increasing memory requirements, so for most practical purposes it's not really worth doing. Original versions written in 1996 had maximum code width of 12-13 bits also in part due to memory constraints of DOS and early Windows; 15-bit width maximum on modern computers gives more than adequate compression for LZW while taking advantage of much larger available memory.