I wanted to share my implementation of a simple entropy coder, and to do this, I have created a simple lossy wavelet based data compressor. There are 3 basic steps in the compression algorithm:
- Wavelet transform the data
The wavelet transform code was taken from Parashar Krishnamachari on flipcode (http://www.flipcode.com/archives/1-D_Generalized_Wavelet_Class.shtml). This post is more about the entropy coder than wavelets, so I don’t go into detail here.
The basic idea is you perform a wavelet decomposition on your data samples to obtain your wavelet coefficients. The number of coefficients is the same as the number of samples.
The purpose of this is to reduce the correlation of the samples in your dataset, which gives you an exponential probability distribution around 0. This is a nice distribution to feed into an entropy coder.
- Quantize the wavelet coefficients
This is the lossy step in the compression algorithm – you have to scale your coefficients to fit into a given range (which alters your compression ratio), and convert them to integers.
The entropy coder expects positive integers only, so at the same time as quantization, I convert all of the samples to positive numbers with the formula:
q = c > 0 ? 2 * c : -2 * c + 1
- Entropy code the quantized coefficients
This is the part of the algorithm that actually does the compressing, and is a completely lossless process. The entropy coder I have implemented is a Run-length Pod Bitplane Encoder.
Bitplane encoders encode the dataset one “bitplane” at a time, starting at the most significant bit of each sample, and proceeding to the next significant bit once the whole bitplane has been encoded.
I chose to encode the run-length of 0 bits in the bitplanes using Pod numbers, as this is a pretty simple method and has given me good results compared to other methods I have tried.
Pod numbers (http://www.firstpr.com.au/audiocomp/lossless/) are similar to Elias Gamma codes or Rice/Golomb codes, which are a type of Arithmetic coding scheme that assigns less bits in the output stream to samples of small value, and more bits to samples of large value. This works perfectly with the exponential distribution around 0 that we have for the wavelet coefficients, as many of them will be close to or exactly 0.
The encoder could be more complex for a higher compression ratio, but this would cause its run-time performance to suffer.
The code has implementations for compressing 1d and 2d data sets. Performing a 2d wavelet transform can be done by a convolution of horizontal and vertical 1d transforms. https://www.dropbox.com/s/e92p4xg9kfqzo3h/WaveletCompression.zip?dl=1