SQL & KV Data Compression

A Practitioner's Guide — 2025–2026 Era
Research Paper | Scope: PostgreSQL, MySQL, ClickHouse, SQLite, RocksDB, Redis, Cloudflare KV, DynamoDB, etcd | Collection: Code Condensation Whitepaper

Research Questions

How much do modern SQL databases compress data internally? What additional benefit comes from column-level codec selection? What about KV stores — how do RocksDB, LMDB, and Cloudflare KV handle compression, and which algorithms work best for different access patterns?

1. SQL Database Compression

1.1 PostgreSQL: TOAST Compression (pglz vs LZ4)

PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to compress, and store out of line, values exceeding roughly 2 KB. Since PostgreSQL 14, two algorithms are available:

Metric pglz (default) LZ4
Compression ratio ~2.23x ~2.07x
Compression speed Baseline 5x faster (~20% of pglz time)
Compression speed under parallel load Baseline ~60–70% faster
Query time impact Baseline 20% faster queries
Full benchmark speedup 37.32% faster end-to-end
CPU overhead 2x that of LZ4 Baseline
Critical finding under parallel load: pglz performance degrades below that of uncompressed data as parallel queries approach the available CPU core count. LZ4 stays at or above the uncompressed baseline even at saturation. For OLTP workloads, LZ4 is the clear default choice.

Configuration: ALTER TABLE t ALTER COLUMN c SET COMPRESSION lz4; or globally via default_toast_compression = lz4

Columnar Extensions: Citus and Hydra

  • Citus Columnar: 3–10x compression ratios; benchmark data shows 5.4x for a typical mixed-type table vs heap storage.
  • Hydra Columnar: Open-source; in ClickBench comparisons, Hydra outperformed Citus, TimescaleDB, Aurora, and standard PostgreSQL on the 42-query benchmark.
  • TimescaleDB Hypercore: Achieves up to 95% compression on time-series data using dictionary compression for high-repetition JSONB columns.

1.2 MySQL / InnoDB: Page Compression and Table Compression

Method Algorithm Compression Ratio Performance Overhead
Barracuda Table Compression (ROW_FORMAT=COMPRESSED) zlib 50–75% size reduction ~60% for write-heavy workloads
Transparent Page Compression (InnoDB page) zlib or LZ4 Variable 10–17% (dramatically lower)

Guidance: For new deployments, transparent page compression with LZ4 provides the best latency-to-savings tradeoff. Reserve Barracuda table compression for cold/archival tables where write overhead is acceptable.

1.3 ClickHouse: Per-Column Codec Selection

ClickHouse provides the most granular compression control of any production database. Each column can be assigned a pipeline of pre-compression codecs followed by a general-purpose algorithm.

Codec Best For Mechanism
Delta Monotonic sequences, timestamps Stores difference between consecutive values
DoubleDelta Periodic timestamps, increasing counters Delta-of-deltas; exceptional for constant-interval series
Gorilla Slowly changing floats, gauge metrics XOR between consecutive values
T64 Sparse integers, values small relative to type range 64x64 bit matrix transpose + bit truncation
FPC Float64 columns Two-prediction finite context method
LZ4 General purpose, speed-critical Default; fast compression and decompression
ZSTD(level) General purpose, ratio-critical 30% better ratio than LZ4; higher CPU
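
The Delta and DoubleDelta transforms are easy to sketch in plain Python (a toy illustration, not ClickHouse's actual implementation): constant-interval timestamps become a run of zeros after delta-of-deltas, which any general-purpose codec then collapses.

```python
import struct
import zlib

def delta(values):
    # Keep the first value; store each subsequent value as a difference.
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def pack(values):
    # Serialize as little-endian int64, one value per 8 bytes.
    return struct.pack(f"<{len(values)}q", *values)

# 10,000 timestamps at a constant 1-second interval.
timestamps = list(range(1_700_000_000, 1_700_010_000))

raw = zlib.compress(pack(timestamps), 6)
d1  = zlib.compress(pack(delta(timestamps)), 6)         # Delta
d2  = zlib.compress(pack(delta(delta(timestamps))), 6)  # DoubleDelta

# The transformed streams compress far smaller than the raw encoding.
print(len(raw), len(d1), len(d2))
```

zlib stands in for the LZ4/ZSTD stage here; the point is that the pre-compression transform, not the general-purpose codec, does most of the work.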

Real-World Benchmarks (NOAA Weather Dataset)

Schema Stage Compressed Size Uncompressed Size Ratio
Naive (all String) 4.07 GiB 131.58 GiB 32.36:1
Type-optimized 3.83 GiB 71.38 GiB 18.63:1
String/type refined 3.81 GiB 35.34 GiB 9.28:1
Optimal codecs 1.42 GiB 36.05 GiB 25.36:1

Moving from naive types to optimal codecs reduced compressed storage by 65% (4.07 GiB to 1.42 GiB). Standout: Date column with DoubleDelta reduced 2.24 GiB to 24.28 MiB (~99% reduction).

1.4 SQLite: Page-Level Compression Options

SQLite has no built-in compression. External strategies fill the gap:

Solution Mechanism Size Reduction Query Impact
ZIPVFS (official, commercial) Per-page compression Variable Low overhead
sqlite-zstd (Rust extension) Row-level zstd with dictionary ~85% (442 MB to 67 MB typical) Sequential reads ~22% slower; random reads ~34% faster
sqlite_zstd_vfs Page-level VFS streaming 45–65% -25% query throughput
ZFS/Btrfs filesystem Transparent block-level LZ4: ~67%, zstd: up to 90% Minimal to none
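
The row-level approach can be approximated with the stdlib alone, storing compressed blobs and decompressing on read (zlib stands in for zstd here; real sqlite-zstd also trains a shared dictionary and hides this behind transparent row functions):

```python
import json
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

# Repetitive JSON, typical of rows that compress well.
doc = json.dumps([{"event": "page_view", "path": "/home"}] * 200)

# Compress on write...
blob = zlib.compress(doc.encode(), 6)
conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (1, blob))

# ...decompress on read.
row = conn.execute("SELECT body FROM docs WHERE id = 1").fetchone()
assert zlib.decompress(row[0]).decode() == doc
```

The tradeoff mirrors the table above: extra CPU per row in exchange for a much smaller database file.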

1.5 Cross-Engine Compression Ratio Summary

Engine Strategy Typical Ratio Notes
PostgreSQL (heap) TOAST LZ4 2–2.2x Row-level only
PostgreSQL (columnar) Citus/Hydra 3–10x Analytical tables
MySQL InnoDB Transparent page (LZ4) 2–4x 10–17% overhead
ClickHouse LZ4 default 5–15x Column-oriented
ClickHouse Optimal codecs + ZSTD 15–50x Per-column tuning
SQLite sqlite-zstd + dictionary 4–6x Extension required
TimescaleDB Hypercore columnar 10–20x Time-series optimized

2. KV Store Compression

2.1 RocksDB: Per-Level Compression Strategy

RocksDB (LSM-tree) provides the most sophisticated KV compression architecture, with per-level algorithm selection.

Facebook/Meta Production Configuration

compression_per_level = kNoCompression      # L0
                       kNoCompression      # L1
                       kLZ4Compression     # L2
                       kLZ4Compression     # L3
                       kLZ4Compression     # L4
                       kZSTD               # L5
                       kZSTD               # L6
bottommost_compression = kZSTD

Rationale: L0 and L1 are frequently compacted; compression adds CPU load with minimal space benefit. The bottommost level holds the vast majority of data — ZSTD's higher ratio pays off there.

Algorithm Compression Speed Decompression Speed Ratio Recommendation
None N/A N/A 1.0x L0–L1 (hot levels)
LZ4 ~750 MB/s ~4 GB/s ~1.12x Middle levels
LZ4HC Slower than LZ4 Same as LZ4 ~1.3x Read-heavy middle levels
ZSTD-1 ~1 GB/s ~1 GB/s ~1.5–2x Good speed/ratio balance
ZSTD-3 (default) ~800 MB/s ~1 GB/s ~2–3x Bottommost level default
ZSTD-19 ~50 MB/s ~1 GB/s ~3–4x Cold/archival only
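
The speed/ratio ladder in this table can be reproduced in miniature with zlib levels standing in for the LZ4-to-ZSTD spectrum (a rough stdlib analogue; absolute figures will not match real LZ4 or ZSTD):

```python
import time
import zlib

# A compressible sample: repeated structured log lines.
sample = b'{"level":"info","msg":"compaction done","files":42}\n' * 20_000

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(sample, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: ratio {len(sample) / len(out):5.1f}x, {elapsed * 1000:.1f} ms")
```

Running this shows the same shape as the RocksDB table: each step up in level costs disproportionately more CPU for a diminishing ratio gain.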

2.2 LMDB: No Built-In Compression

LMDB deliberately excludes compression from its B+tree engine. External compression strategies:

Strategy Implementation Performance
Application-level LZ4 Compress values before mdb_put() Best read speed
Application-level Snappy Compress values before mdb_put() Best write speed, best space savings
Filesystem-level (ZFS/Btrfs) Transparent to LMDB Zero application changes

When LMDB is appropriate: Single-writer/many-reader workloads where zero-copy reads matter more than storage density. If you need built-in compression, use RocksDB instead.
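
A minimal sketch of the application-level pattern, with zlib in place of LZ4/Snappy and a plain dict standing in for an open LMDB environment (mdb_put/mdb_get would receive the same opaque bytes):

```python
import zlib

store = {}  # stand-in for an LMDB environment; values are opaque bytes

def put(key: bytes, value: bytes) -> None:
    # Compress before the store ever sees the value.
    store[key] = zlib.compress(value, 1)  # level 1 favors write speed

def get(key: bytes) -> bytes:
    return zlib.decompress(store[key])

put(b"user:42", b'{"name": "Ada", "plan": "pro"}' * 50)
assert get(b"user:42").startswith(b'{"name": "Ada"')
assert len(store[b"user:42"]) < 50 * 30  # stored form is smaller than the raw value
```

Note that this sacrifices LMDB's zero-copy reads: every get now allocates and decompresses, which is exactly the tradeoff the paragraph above describes.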

2.3 Redis: Memory-Level Compression Encodings

Redis compresses at the data-structure level rather than the page or block level:

Encoding Data Structure Memory Savings Threshold
ziplist (< Redis 7) Hashes, sorted sets, lists 5–10x vs pointer-based hash-max-ziplist-entries: 512
listpack (Redis 7+) Replacement for ziplist Similar to ziplist, safer hash-max-listpack-entries: 128
quicklist Lists Ziplist nodes in doubly-linked list list-max-listpack-size: -2 (8 KB)
intset Sets of integers Compact integer array set-max-intset-entries: 512

Application-level compression benchmarks (LogicMonitor study):

  • LZ4 on values: 60% memory savings (39,739 bytes to 16,208 bytes)
  • Float16 + Blosc + ZLib + Base64 pipeline: 85% savings (80,000 bytes to 11,383 bytes)
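
The second pipeline can be approximated with the stdlib: pack gauges as 16-bit floats (struct's 'e' format), deflate, then Base64 for a string-safe Redis value (Blosc is omitted, so the savings will differ from the study's figures):

```python
import base64
import struct
import zlib

# 10,000 gauge readings, slowly drifting: a typical metrics payload.
readings = [20.0 + 0.01 * (i % 100) for i in range(10_000)]

raw = struct.pack(f"<{len(readings)}d", *readings)   # 8 bytes per value
half = struct.pack(f"<{len(readings)}e", *readings)  # 2 bytes per value, lossy
packed = base64.b64encode(zlib.compress(half, 6))    # string-safe for Redis

print(len(raw), len(packed))
```

The float16 step is lossy (about 3 decimal digits of precision), which is usually acceptable for gauge metrics but not for financial values.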

2.4 Cloudflare Workers KV: Edge Compression

Cloudflare Workers KV is an eventually-consistent global KV store optimized for read-heavy, write-infrequent workloads.

  • 2025 redesign: Hybrid storage — distributed database for small values, R2 object storage for large values
  • p99 read latency improved from 200 ms to under 5 ms after the 2025 rearchitecture
  • Internal compression is opaque; Cloudflare handles storage optimization at the infrastructure level

Best practice: Pre-compress with CompressionStream (Web Streams API) in the Worker itself, using gzip (broadest decompression support) or raw deflate.

2.5 DynamoDB: Client-Side Attribute Compression

DynamoDB provides no server-side compression. All compression is application-layer.

Algorithm Best For WCU/RCU Savings
GZIP Text-heavy items Up to 70% cost savings
LZO Faster compression, lower ratio 40–50%
Snappy Fast, moderate ratio 30–45%
Important constraint: Compressed attributes cannot be meaningfully used in filter expressions, key conditions, or projection expressions. Design partition/sort keys and GSI attributes to remain uncompressed.
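
Client-side, the pattern is to gzip the large attribute and keep key attributes untouched; a sketch (the dict mirrors the shape passed to a put_item call, which is not made here, so no AWS SDK is needed):

```python
import gzip
import json

def compress_attr(value: str) -> bytes:
    return gzip.compress(value.encode(), compresslevel=6)

def decompress_attr(blob: bytes) -> str:
    return gzip.decompress(blob).decode()

payload = json.dumps({"history": [{"ts": i, "status": "ok"} for i in range(500)]})

item = {
    "pk": "user#42",  # keys stay uncompressed: they must remain queryable
    "sk": "profile",
    "payload": compress_attr(payload),  # large, non-filterable attribute
}

assert decompress_attr(item["payload"]) == payload
assert len(item["payload"]) < len(payload)  # fewer bytes means fewer WCUs/RCUs
```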

2.6 etcd / bbolt: Compaction, Not Compression

etcd uses bbolt (B+tree store with 4 KB memory-mapped pages). bbolt has no compression capability.

What etcd calls "compaction" is revision pruning: dropping superseded MVCC revisions to reclaim logical space. bbolt does not release freed pages back to the OS — only etcd defrag physically reclaims disk space.

Practical guidance:

  • Run etcd compaction regularly to prune old revisions
  • Follow with etcd defrag to reclaim disk space
  • For storage-constrained environments, reduce --snapshot-count and enable auto-compaction

3. Column-Oriented vs Row-Oriented Compression Ratios

The fundamental difference: columns contain homogeneous data types, enabling specialized codecs that exploit type-specific patterns. Rows mix types, limiting compression to general-purpose algorithms.

ClickBench Comparison (100M rows)

System Type Storage Size Relative
ClickHouse Columnar 9.26 GiB 1x
TimescaleDB (compressed) Hybrid columnar ~20 GiB ~2x
PostgreSQL (heap) Row-oriented ~100 GiB ~10x

Why Columnar Compresses Better

  1. Run-Length Encoding (RLE): Sorted columns with repeated values compress to (value, count) pairs.
  2. Dictionary Encoding: Low-cardinality string columns replace strings with integer indices.
  3. Delta Encoding: Monotonic sequences store only differences. If timestamps increment by exactly 1 second, the delta-of-deltas are all zero and cost roughly 1 bit per value.
  4. Bit-Packing / Frame-of-Reference: Integer values clustered in a narrow range are stored with minimal bits per value.
  5. Type-Specific Codecs: Gorilla for floats, DoubleDelta for timestamps — impossible in row stores that see each row as an opaque tuple.
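
The first two primitives are easy to show concretely; a toy run-length and dictionary encoder over a sorted, low-cardinality column (illustrative only, not any engine's on-disk format):

```python
from itertools import groupby

column = ["DE"] * 4000 + ["FR"] * 3000 + ["US"] * 3000  # sorted, low-cardinality

# Run-length encoding: the column collapses to (value, count) pairs.
rle = [(value, len(list(group))) for value, group in groupby(column)]

# Dictionary encoding: strings become small integer indices.
dictionary = {value: index for index, value in enumerate(sorted(set(column)))}
encoded = [dictionary[value] for value in column]

print(rle)  # [('DE', 4000), ('FR', 3000), ('US', 3000)]
print(len(dictionary), "distinct values for", len(encoded), "rows")
```

Ten thousand strings reduce to three RLE pairs, and a row store can never apply this because the country codes are interleaved with every other column in each tuple.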

Quantified Advantage

  • Row-oriented databases (PostgreSQL, MySQL): 2–4x compression on typical OLTP data
  • Columnar databases (ClickHouse, DuckDB, Redshift): 10–50x compression on analytical data
  • Hybrid approaches (TimescaleDB Hypercore, Citus Columnar): 5–20x, bridging the gap

The 10x storage difference between ClickHouse and PostgreSQL on the same 100M-row dataset represents real production cost: at S3 pricing ($0.023/GB/month), that is the difference between $0.21/month and $2.30/month per dataset — scaling to thousands of dollars at enterprise data volumes.

4. Compression Codec Selection by Data Type

Data Type Primary Codec Secondary Codec Expected Ratio Notes
Timestamps (periodic) DoubleDelta +LZ4 64:1 to 800:1 Constant intervals compress to ~1 bit/value
Timestamps (irregular) Delta +ZSTD 4–10x Variable deltas need general-purpose follow-up
Auto-increment integers Delta +LZ4 100–800:1 Monotonic sequences are ideal for delta
Gauge integers (metrics) T64 +ZSTD 3–8x T64 truncates unused high bits
Float64 (slowly changing) Gorilla +LZ4 2–8x XOR encoding exploits bit-level similarity
Float64 (volatile) None ZSTD 1.5–3x Gorilla provides no benefit on random floats
Low-cardinality strings Dictionary encoding +LZ4 5–20x Replace strings with integer indices
JSON / JSONB Dictionary (if repetitive) ZSTD 3–10x Key names repeat; dictionary training excels
UUIDs (binary 16-byte) None None 1x (already compact) Store as UUID/BINARY(16), not TEXT
Booleans Bit-packing / RLE +LZ4 8–64x 1 bit vs 1 byte per value

Key Principles

  1. Smallest type first: Use Int16 instead of Int64 when range permits.
  2. Pre-compression codecs before general-purpose: Delta, DoubleDelta, Gorilla, and T64 transform data into more compressible representations before LZ4/ZSTD processes it.
  3. Measure per column: Compression effectiveness varies dramatically across columns in the same table. Always benchmark with real data.
  4. UUIDs are compression-hostile: Their uniform distribution defeats all algorithms. Store as binary (16 bytes) rather than text (36 bytes) for a guaranteed 2.25x savings without compression.
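
Both UUID claims are checkable in a few lines: the binary form is exactly 2.25x smaller than text, and general-purpose compression gains essentially nothing on the random bytes (the measured ratio will hover near 1.0, varying slightly per run):

```python
import uuid
import zlib

ids = [uuid.uuid4() for _ in range(1000)]

as_text = "".join(str(u) for u in ids).encode()  # 36 bytes each
as_binary = b"".join(u.bytes for u in ids)       # 16 bytes each

assert len(as_text) == 36_000 and len(as_binary) == 16_000  # 2.25x, no codec needed

# General-purpose compression barely dents the random binary form.
ratio = len(as_binary) / len(zlib.compress(as_binary, 9))
print(f"zlib ratio on binary UUIDs: {ratio:.2f}x")
```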

5. Dictionary Training on Schema-Specific Data

How ZSTD Dictionary Training Works

ZSTD dictionaries capture recurring byte patterns from a training corpus. During compression, each input block is encoded using references to dictionary fragments rather than raw literals. The dictionary is typically small (32–100 KiB) and is shared across all compressions of similar data.

Data Category Without Dictionary With Dictionary Improvement
Small JSON objects (<1 KB) 1.5–2x 5–8x 3–4x better
Database pages (4–16 KB) 2–3x 4–6x ~2x better
SSTable blocks (Cassandra) 2–3x 6–10x 3–4x better
Log lines (structured) 2–4x 6–12x 3x better
Large blobs (>100 KB) 3–5x 3.5–5.5x Marginal
Key insight: Dictionary gains are inversely proportional to input size. For data under 1 KB, dictionaries are transformational. For data over 100 KB, the input itself provides enough context that a dictionary adds little.

Dictionary Training Guidelines

  • Corpus size: 100x the target dictionary size minimum (e.g., 10 MB of samples for a 100 KB dictionary)
  • Dictionary size: 32–100 KiB (larger dictionaries offer diminishing returns)
  • Retraining frequency: When data distribution shifts significantly (schema changes, new value patterns)
  • Storage overhead: Dictionary must be available at decompression time; embed it alongside compressed data or store in metadata
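
ZSTD's dictionary API requires a third-party package in Python, but the stdlib's zlib supports the same idea through its zdict parameter, which is enough to demonstrate the small-payload effect (a stdlib stand-in, not real ZSTD training; the "dictionary" here is just repeated representative samples rather than a trained artifact):

```python
import zlib

# Representative samples play the role of a trained dictionary.
sample = b'{"event": "page_view", "user_id": 1234, "path": "/home", "referrer": "https://example.com"}'
zdict = sample * 10

msg = b'{"event": "page_view", "user_id": 8841, "path": "/pricing", "referrer": "https://news.ycombinator.com"}'

def compress(data: bytes, dictionary: bytes = b"") -> bytes:
    if dictionary:
        comp = zlib.compressobj(9, zlib.DEFLATED, 15, 9,
                                zlib.Z_DEFAULT_STRATEGY, dictionary)
    else:
        comp = zlib.compressobj(9)
    return comp.compress(data) + comp.flush()

plain = compress(msg)
with_dict = compress(msg, zdict)

# Decompression needs the same dictionary available.
assert zlib.decompressobj(zdict=zdict).decompress(with_dict) == msg
print(len(msg), len(plain), len(with_dict))  # dictionary wins on sub-1 KB payloads
```

The mechanism matches the description above: shared byte patterns are encoded as references into the dictionary instead of literals, which is why the gain is largest when the payload itself is too small to supply that context.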

Production Implementations

  • RocksDB: Supports ZSTD dictionary training at the bottommost level during compaction. Enable with CompressionOptions::zstd_max_train_bytes and CompressionOptions::max_dict_bytes.
  • Cassandra (CEP-54): Proposed per-table dictionary support for SSTable compression. Expected 3–4x improvement over non-dictionary ZSTD.
  • sqlite-zstd: Trains dictionaries per-column at the SQLite table level. Benchmark: a 2.0 GB database compressed to 528 MB (a 74% reduction) with a dictionary vs 1.63 GB (an 18% reduction) without, roughly a 3x improvement from dictionary training alone.

6. Write Amplification Tradeoffs

In LSM-tree databases (RocksDB, LevelDB, Cassandra, CockroachDB), every logical write is physically written multiple times as data moves through compaction levels. Compression adds CPU cost to each of these writes.

Compression Level vs Write Throughput

Compression Compaction CPU Cost Bytes Written Net Write Throughput Best For
None Lowest Highest Highest ingest rate Burst ingestion, L0–L1
LZ4 Very low (+5–10%) Moderate Near-maximum Hot levels, mixed workloads
ZSTD-1 Low (+15–20%) Low Good balance Middle levels
ZSTD-3 Moderate (+30–40%) Low Good for read-heavy Bottommost level default
ZSTD-9 High (+100–200%) Very low Write-limited Archival, cold tiers
ZSTD-19 Very high (+500%+) Lowest Severely limited Offline compaction only

Decision Framework

If write_throughput_critical AND ssd_storage:
    L0-L1: None, L2-L4: LZ4, Lmax: ZSTD-1
If storage_cost_critical AND write_moderate:
    L0: None, L1-L4: LZ4, Lmax: ZSTD-3
If archival AND write_infrequent:
    All levels: ZSTD-9 or ZSTD-19
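
The framework reads naturally as a lookup function; a sketch (the parameter names such as write_critical are this sketch's own, not RocksDB options):

```python
def per_level_compression(levels: int, *, write_critical: bool,
                          storage_critical: bool, archival: bool) -> list[str]:
    """Map the decision framework above onto a per-level algorithm list."""
    if archival:
        return ["zstd-9"] * levels
    if write_critical:
        # L0-L1 uncompressed, LZ4 in the middle, ZSTD-1 at the bottom.
        return ["none", "none"] + ["lz4"] * (levels - 3) + ["zstd-1"]
    if storage_critical:
        # Only L0 uncompressed; ZSTD-3 at the bottom for ratio.
        return ["none"] + ["lz4"] * (levels - 2) + ["zstd-3"]
    return ["none", "none"] + ["lz4"] * (levels - 3) + ["zstd-3"]

print(per_level_compression(7, write_critical=True,
                            storage_critical=False, archival=False))
# ['none', 'none', 'lz4', 'lz4', 'lz4', 'lz4', 'zstd-1']
```

The resulting list maps one-to-one onto a compression_per_level configuration like the Facebook/Meta example in section 2.1.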

7. Algorithm Showdown: ZSTD vs LZ4 vs Snappy vs PPMd

Metric LZ4 Snappy ZSTD-1 ZSTD-3 ZSTD-9 PPMd
Compression speed (per core) ~750 MB/s ~550 MB/s ~530 MB/s ~350 MB/s ~115 MB/s ~20 MB/s
Decompression speed (per core) ~4.5 GB/s ~1.8 GB/s ~1.7 GB/s ~1.5 GB/s ~1.2 GB/s ~20 MB/s
Ratio (structured data) 1.5–2.5x 1.4–2.2x 2–3x 2.5–4x 3–5x 4–8x
Ratio (text data) 2–3x 1.8–2.5x 3–5x 4–6x 5–8x 6–12x
Dictionary support No No Yes Yes Yes Yes

Cost-at-Scale Comparison (500 TB/month, S3 eu-central-1 at $0.0235/GB)

Algorithm Effective Size Monthly Cost Savings vs None
None 500 TB $11,750
LZ4 ~333 TB (1.5x) $7,833 $3,917/mo
ZSTD-3 ~167 TB (3x) $3,917 $7,833/mo
ZSTD-19 ~125 TB (4x) $2,938 $8,812/mo

ZSTD-3 delivers the sweet spot: 2x the savings of LZ4 with practical CPU overhead.
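
The table's arithmetic generalizes to a one-line helper (the S3 price is the one quoted above; the raw size and ratios are whatever you measure for your own data):

```python
def monthly_cost(raw_tb: float, ratio: float, usd_per_gb: float = 0.0235) -> float:
    """Storage cost after compression, in USD per month."""
    return raw_tb * 1000 / ratio * usd_per_gb

for name, ratio in [("none", 1.0), ("lz4", 1.5), ("zstd-3", 3.0), ("zstd-19", 4.0)]:
    print(f"{name:8s} ${monthly_cost(500, ratio):,.0f}/mo")
```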

8. Practical Pipeline Recommendations

Use Case Database Compression Strategy Expected Ratio Priority
OLTP web app PostgreSQL TOAST LZ4 2x Latency
Time-series metrics ClickHouse DoubleDelta+LZ4 (time), Gorilla+LZ4 (values) 15–50x Storage + query speed
Log analytics ClickHouse Delta+ZSTD (timestamps), ZSTD (text) 10–30x Storage
Embedded mobile DB SQLite sqlite-zstd with dictionary 4–6x Storage + random read
KV cache (hot) Redis Listpack encoding + app-level LZ4 2–5x Memory cost
KV persistent store RocksDB ZSTD with dictionary (bottom), LZ4 (mid) 3–6x Storage
Global edge KV Cloudflare KV Client-side gzip/ZSTD pre-compression 2–4x Egress cost
Serverless KV DynamoDB Client-side GZIP on large attributes 2–4x WCU/RCU cost
Analytical warehouse ClickHouse/DuckDB Full codec optimization per column 15–50x Query + storage

Decision Flowchart

START
  |
  v
Is latency the primary concern?
  |-- YES --> Use LZ4 everywhere. Measure. Stop.
  |-- NO
  v
Is this columnar / analytical data?
  |-- YES --> Use type-specific codecs (Delta, DoubleDelta, Gorilla, T64) + ZSTD
  |-- NO
  v
Is this an LSM-tree KV store?
  |-- YES --> No compression L0-L1, LZ4 L2-L4, ZSTD-3 Lmax
  |-- NO
  v
Is this a row-oriented RDBMS?
  |-- YES --> PostgreSQL: TOAST LZ4 | MySQL: Transparent page compression LZ4
  |-- NO
  v
Is this a managed KV service (DynamoDB, Workers KV)?
  |-- YES --> Client-side compression (GZIP for broadest compat, ZSTD for best ratio)
  |-- NO
  v
Default: ZSTD-3 at the application layer. Always measure.

9. Sources and Citations

Appendix: Practical Database Compression Choices (2025–2026 Era)

What Changed Recently

  1. ZSTD is now the de facto standard. In 2020, Snappy and zlib dominated. By 2026, ZSTD has replaced both in most production systems.
  2. Dictionary compression matured. RocksDB's dictionary support is production-ready. sqlite-zstd demonstrates 3x improvement from dictionary training alone.
  3. LZ4 won the speed tier. Snappy has no remaining advantage. PostgreSQL 14+ made LZ4 TOAST a first-class option.
  4. Columnar compression in row databases. TimescaleDB Hypercore, Citus Columnar, and Hydra bring 10–20x compression to PostgreSQL without leaving the ecosystem.
  5. Edge KV stores are opaque. Cloudflare Workers KV and similar edge stores handle compression internally. The optimization lever for users is client-side pre-compression and efficient serialization.

Quick-Reference Decision Table

Question Answer Action
PostgreSQL and need compression? Use LZ4 TOAST SET default_toast_compression = lz4;
PostgreSQL and need 10x+ compression? Add columnar extension Citus Columnar or TimescaleDB Hypercore for cold data
ClickHouse and want optimal storage? Tune codecs per column Profile with DESCRIBE TABLE ... SETTINGS describe_compact_output=0
RocksDB and need space savings? Per-level compression compression_per_level + bottommost_compression=kZSTD
SQLite and DB too large? sqlite-zstd extension Train dictionary per large text/JSON column
DynamoDB and high costs? Client-side GZIP Compress non-filterable attributes before PutItem
Need to choose one algorithm? ZSTD-3 Best general-purpose balance of ratio, speed, and ecosystem support
The One Rule: Measure with your data. Compression ratios published in benchmarks are averages across specific datasets. Your schema, cardinality, value distributions, and access patterns will produce different numbers. Always benchmark candidate configurations against a representative sample of your production data before committing to a compression strategy.