shcodec is order-0 32-bit canonical static huffman codec. It encodes alphabet of 256 symbols with minimum-redundancy or length-restricted codes(basic method: Alistair Moffat and Jyrki Katajainen, modified by Artur A. Pessoa). shcodec uses efficient method for tree packing: on text files packed tree size is approx 68 bytes, on binary files this value is about 132 bytes. Memory requirements are very small: 1280 bytes for encoding and only 574 bytes for decoding! shcodec uses extremely fast and simple SHIFT-OR method for encoding, and CANONICAL-DECODE with a cache for small codewords for decoding.
shcodec package contain all routines, needed to implement fast and efficient static huffman codec. All necessary preprocessing steps are described in the test program. In order to keep high operating speed, encode/decode routines are implemented as macro insead of functions: they encode/decode one symbol at time to/from the specified buffer The way how encode/decode loops must be organized is also shown in the test program.
Next table presents experimental results for shcodec version 1.0.1 and range coder version 1.3 on the calgary corpus set. In this table we use the following notation:
file |
file to encode |
file size |
original file size in bytes |
tree |
packed tree size in bytes (tree is included into the shc size) |
shc/rng size |
encoded file size in bytes (with shcodec/range coder) |
shc/rng bpb |
bpb value (for shcodec/range coder) |
ent bpb |
order-0 entropy bpb value |
With (#) sign we denote files where range coder is more efficient than shcodec(only 2 files from 18).
file |
file size |
tree |
shc size |
rng size |
shc bpb |
rng bpb |
ent bpb |
bib |
111,261 |
68 |
72,837 |
73,421 |
5,237 |
5,279 |
5,201 |
book1 |
768,771 |
68 |
438,449 |
441,851 |
4,563 |
4,598 |
4,527 |
book2 |
610,856 |
68 |
368,373 |
370,656 |
4,824 |
4,854 |
4,793 |
geo |
102,400 |
132 |
72,693 |
73,078 |
5,679 |
5,709 |
5,646 |
news |
377,109 |
56 |
246,457 |
248,417 |
5,228 |
5,270 |
5,190 |
obj1 |
21,504 |
132 |
16,189 |
16,493 |
6,023 |
6,136 |
5,948 |
obj2 |
246,814 |
132 |
194,233 |
189,329 |
6,296 |
6,137# |
6,260 |
paper1 |
53,161 |
68 |
33,413 |
33,419 |
5,028 |
5,029 |
4,983 |
paper2 |
82,199 |
68 |
47,689 |
48,022 |
4,641 |
4,674 |
4,601 |
paper3 |
46,526 |
56 |
27,337 |
27,645 |
4,701 |
4,753 |
4,665 |
paper4 |
13,286 |
52 |
7,917 |
8,047 |
4,767 |
4,845 |
4,700 |
paper5 |
11,954 |
56 |
7,493 |
7,698 |
5,015 |
5,152 |
4,936 |
paper6 |
38,105 |
68 |
24,097 |
24,225 |
5,059 |
5,086 |
5,010 |
pic |
513,216 |
140 |
106,697 |
79,391 |
1,663 |
1,238# |
1,210 |
progc |
39,611 |
56 |
25,977 |
26,200 |
5,246 |
5,291 |
5,199 |
progl |
71,646 |
56 |
43,045 |
43,215 |
4,806 |
4,825 |
4,770 |
progp |
49,379 |
68 |
30,289 |
30,677 |
4,907 |
4,970 |
4,869 |
trans |
93,695 |
76 |
65,301 |
65,586 |
5,576 |
5,600 |
5,533 |
average |
|
79 |
|
|
4,959 |
4,969 |
4,891 |
As you can see, on most files(16 from 18), shcodec outperformes range coder a little(on 0,201% in average). On very redundant files, such as pic, and inhomogenious ones, adaptive entropy coders are usually better.
So, shcodec does`t suits for high order models and not so efficient, when symbol distribution is very skewed(e.g. a lot of zeros in pic), but in practice, under order-0 model, it shows compression performance comparable with best known entropy coders and runs very fast.
This software may be used freely for any purpose. However, when distributed, the original source must be clearly stated, and, when the source code is distributed, the copyright notice must be retained and any alterations in the code must be clearly marked. No warranty is given regarding the quality of this software.
shc101.zip(~45 Kbytes) - shcodec source code, win32 executable and documentation.
shclib-1.0.1.tar.gz (~5 Kbytes) - shcodec library.
shcdll.zip (~47 Kbytes) - windows dll version of the shcodec library.
sh-sfx.zip(~28 Kbytes) - utility for creating SFX archives from shcodec files under win32.
HuffmanCode.zip(~29 Kbytes) - My article about Huffman Code (in Russian). Also in html.
If you have a question about using shcodec, you can reach me via email: xander at entropyware dot info
See also: EPSILON - powerful Open Source wavelet image compressor
|