Zstandard (zstd)

Fast Real-Time Compression Algorithm — Reference Report
Reference | Release: v1.5.7 (Feb 19, 2025) | License: Dual BSD / GPLv2 | Collection: Code Condensation Whitepaper

Overview

Zstandard (zstd) is a fast lossless compression algorithm targeting real-time compression scenarios, with compression ratios at zlib level and better. It is backed by a very fast entropy stage provided by the Huff0 and FSE (Finite State Entropy) libraries.

Zstandard's format is stable and documented in RFC 8878. The reference implementation is an open-source C library with an accompanying CLI. It is deployed across Meta and many other large cloud infrastructures, and is continuously fuzzed by Google's oss-fuzz.
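
For orientation, the basic CLI round-trip is one flag in each direction. A minimal sketch; the file names are illustrative:

zstd example.txt                          # compress -> example.txt.zst (input file is kept)
zstd -d example.txt.zst -o restored.txt   # decompress to an explicit output name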

GitHub: 26.7k stars | 2.4k forks | 380 contributors | 76% written in C

Benchmarks

Tested on Core i7-9700K @ 4.9GHz, Ubuntu 24.04, gcc 14.2.0, Silesia compression corpus:

Compressor            Ratio   Compression Speed   Decompression Speed
zstd 1.5.7 -1         2.896   510 MB/s            1550 MB/s
brotli 1.1.0 -1       2.883   290 MB/s            425 MB/s
zlib 1.3.1 -1         2.743   105 MB/s            390 MB/s
zstd 1.5.7 --fast=1   2.439   545 MB/s            1850 MB/s
quicklz 1.5.0 -1      2.238   520 MB/s            750 MB/s
zstd 1.5.7 --fast=4   2.146   665 MB/s            2050 MB/s
lzo1x 2.10 -1         2.106   650 MB/s            780 MB/s
lz4 1.10.0            2.101   675 MB/s            3850 MB/s
snappy 1.2.1          2.089   520 MB/s            1500 MB/s
lzf 3.6 -1            2.077   410 MB/s            820 MB/s

Key insight: negative compression levels (--fast=#) trade ratio for speed, while higher positive levels trade compression speed for stronger ratios. Decompression speed remains roughly constant across all settings, which is critical for the use case of reloading condensed repos into AI tools.
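
Both level families are exposed directly on the command line, and the CLI's built-in benchmark mode can reproduce a table like the one above. A sketch, assuming an illustrative input file:

zstd -19 -o strong.zst input.bin    # high positive level: stronger ratio, slower compression
zstd --fast=4 -o quick.zst input.bin   # negative level: lower ratio, faster compression
zstd -b1 -e19 input.bin             # benchmark levels 1..19, reporting ratio and both speeds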

Small Data Compression (Dictionary Training)

Compression algorithms learn from past data to compress future data. At the beginning of a new data set, there is no "past" to build upon — making small data inherently harder to compress.

Zstd solves this with training mode: provide sample files to generate a "dictionary" that is loaded before compression/decompression.

Dictionary How-To

  1. Create dictionary: zstd --train FullPathToTrainingSet/* -o dictionaryName
  2. Compress: zstd -D dictionaryName FILE
  3. Decompress: zstd -D dictionaryName --decompress FILE.zst

Using the github-users sample set (~10K records, ~1KB each), dictionary-based compression achieves dramatically better ratios at faster speeds. For structured key-value data with repetitive key patterns, a trained dictionary can yield a 3–4x ratio improvement over non-dictionary zstd on small blocks.
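
Putting the three steps together on a directory of small, similarly structured files (a sketch; paths and file names are illustrative):

zstd --train samples/* -o records.dict                        # 1. train on representative samples
zstd -D records.dict records/user-001.json                    # 2. compress a small record with the dictionary
zstd -D records.dict --decompress records/user-001.json.zst   # 3. decompress with the same dictionary

Note that the same dictionary must be present on both sides: each frame records a dictionary ID, and decompression with a mismatched dictionary fails rather than producing corrupt output.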

Build Systems

System            Command
make (reference)  make in root directory
cmake             cmake -S build/cmake -B build-cmake && cmake --build build-cmake
meson             see build/meson
vcpkg             vcpkg install zstd
conan             conan install --requires="zstd/[*]" --build=missing
buck              buck build programs:zstd
bazel             via the Bazel Central Registry
Visual Studio     projects in build/ dir, or generate via cmake

macOS Universal2 (Fat) Build

cmake -S build/cmake -B build-cmake-debug -G Ninja -DCMAKE_OSX_ARCHITECTURES="x86_64;x86_64h;arm64"
cd build-cmake-debug
ninja
sudo ninja install
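
To confirm the installed binary really contains every requested slice, macOS's lipo tool can list the architectures (the install path may differ on your system):

lipo -archs /usr/local/bin/zstd    # expect: x86_64 x86_64h arm64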

Testing

  • Quick smoke test: make check
  • Script-based: playTest.sh from src/tests (set $ZSTD_BIN and $DATAGEN_BIN; see the sketch after this list)
  • CI details: see TESTING.md
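
A sketch of a script-based run, assuming the zstd and datagen binaries were built in-tree (the relative paths below are illustrative and depend on your build layout):

export ZSTD_BIN=../programs/zstd   # zstd binary under test
export DATAGEN_BIN=./datagen       # test-data generator built alongside the test suite
./playTest.sh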