Welcome to shcodec home page

shcodec
shcodec is order-0 32-bit canonical static huffman codec. It encodes alphabet of 256 symbols with minimum-redundancy or length-restricted codes(basic method: Alistair Moffat and Jyrki Katajainen, modified by Artur A. Pessoa). shcodec uses efficient method for tree packing: on text files packed tree size is approx 68 bytes, on binary files this value is about 132 bytes. Memory requirements are very small: 1280 bytes for encoding and only 574 bytes for decoding! shcodec uses extremely fast and simple SHIFT-OR method for encoding, and CANONICAL-DECODE with a cache for small codewords for decoding.

Developer
shcodec package contain all routines, needed to implement fast and efficient static huffman codec. All necessary preprocessing steps are described in the test program. In order to keep high operating speed, encode/decode routines are implemented as macro insead of functions: they encode/decode one symbol at time to/from the specified buffer The way how encode/decode loops must be organized is also shown in the test program.

Experimental results
Next table presents experimental results for shcodec version 1.0.1 and range coder version 1.3 on the calgary corpus set. In this table we use the following notation:

filefile to encode
file sizeoriginal file size in bytes
treepacked tree size in bytes (tree is included into the shc size)
shc/rng sizeencoded file size in bytes (with shcodec/range coder)
shc/rng bpbbpb value (for shcodec/range coder)
ent bpb order-0 entropy bpb value

With (#) sign we denote files where range coder is more efficient than shcodec(only 2 files from 18).

filefile sizetreeshc sizerng sizeshc bpbrng bpbent bpb
bib111,2616872,83773,4215,2375,2795,201
book1768,77168438,449441,8514,5634,5984,527
book2610,85668368,373370,6564,8244,8544,793
geo102,40013272,69373,0785,6795,7095,646
news377,10956246,457248,4175,2285,2705,190
obj121,50413216,18916,4936,0236,1365,948
obj2246,814132194,233189,3296,2966,137#6,260
paper153,1616833,41333,4195,0285,0294,983
paper282,1996847,68948,0224,6414,6744,601
paper346,5265627,33727,6454,7014,7534,665
paper413,286527,9178,0474,7674,8454,700
paper511,954567,4937,6985,0155,1524,936
paper638,1056824,09724,2255,0595,0865,010
pic513,216140106,69779,3911,6631,238#1,210
progc39,6115625,97726,2005,2465,2915,199
progl71,6465643,04543,2154,8064,8254,770
progp49,3796830,28930,6774,9074,9704,869
trans93,6957665,30165,5865,5765,6005,533
average794,9594,9694,891

As you can see, on most files(16 from 18), shcodec outperformes range coder a little(on 0,201% in average). On very redundant files, such as pic, and inhomogenious ones, adaptive entropy coders are usually better.

So, shcodec does`t suits for high order models and not so efficient, when symbol distribution is very skewed(e.g. a lot of zeros in pic), but in practice, under order-0 model, it shows compression performance comparable with best known entropy coders and runs very fast.

License
This software may be used freely for any purpose. However, when distributed, the original source must be clearly stated, and, when the source code is distributed, the copyright notice must be retained and any alterations in the code must be clearly marked. No warranty is given regarding the quality of this software.

Download
shc101.zip(~45 Kbytes) - shcodec source code, win32 executable and documentation.
shclib-1.0.1.tar.gz (~5 Kbytes) - shcodec library.
shcdll.zip (~47 Kbytes) - windows dll version of the shcodec library.
sh-sfx.zip(~28 Kbytes) - utility for creating SFX archives from shcodec files under win32.
HuffmanCode.zip(~29 Kbytes) - My article about Huffman Code (in Russian). Also in html.

Support
If you have a question about using shcodec, you can reach me via email: xander at entropyware dot info

See also: EPSILON - powerful Open Source wavelet image compressor


Copyright (C) 1998-2009 Simakov Alexander
Valid HTML 4.01!