Huffman coding is used to compress the literals section. Zstandard's Huffman implementation supports up to 255 symbols (byte values) and prefix codes of up to 11 bits.
4.2.1. Huffman Tree Description
The Huffman tree is described by the set of weights (code lengths) for each symbol. Only symbols with nonzero weight are included. Symbols with higher frequencies receive shorter codes.
4.2.1.1. Huffman Tree Header
The header byte determines how the tree weights are encoded:
- If the header byte value is
>= 128: the weights for all symbols follow in direct 4-bit-per-symbol encoding. The number of symbols is header_byte - 127.
- If the header byte value is
< 128: the header byte encodes the compressed size of an FSE-encoded weight table. The weights are encoded using FSE compression.
4.2.1.2. FSE Compression of Huffman Weights
When weights are FSE-compressed, they use a specialized FSE table with Accuracy_Log in the range [5, 6]. The FSE table for weights is itself encoded using an interleaved representation at the beginning of the compressed weights data.
4.2.1.3. Conversion from Weights to Huffman Prefix Codes
Given the weight array Weight[symbol] for each symbol, the Huffman prefix code for each symbol is computed using the standard algorithm: assign shorter codes to symbols with higher weights (lower code lengths). Symbols with Weight = 0 are not used and have no code.
4.2.2. Huffman-Coded Streams
Literals are encoded using the Huffman tree, with the bitstream written in little-endian bit order. The literals section may contain 1 or 4 Huffman-coded streams. When 4 streams are used, they are decoded in parallel (4 independent streams with pre-computed offsets), allowing SIMD acceleration on modern CPUs — this is a key source of Zstandard's high decompression throughput.