SQL & KV Data Compression

A Practitioner's Guide — 2025–2026 Era
Research Paper | Scope: PostgreSQL, MySQL, ClickHouse, SQLite, RocksDB, Redis, Cloudflare KV, DynamoDB, etcd | Collection: Code Condensation Whitepaper

Research Questions

How much do modern SQL databases compress data internally? What additional benefit comes from column-level codec selection? What about KV stores — how do RocksDB, LMDB, and Cloudflare KV handle compression, and which algorithms work best for different access patterns?

1. SQL Database Compression

1.1 PostgreSQL: TOAST Compression (pglz vs LZ4)

PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to compress, and store out of line, values exceeding roughly 2 KB. Since PostgreSQL 14, two algorithms are available:

Metric pglz (default) LZ4
Compression ratio ~2.23x ~2.07x
Compression speed Baseline 5x faster (~20% of pglz time)
Compression speed under parallel load Baseline ~60–70% faster
Query time impact Baseline 20% faster queries
Full benchmark speedup 37.32% faster end-to-end
CPU overhead 2x that of LZ4 Baseline
Critical finding under parallel load: pglz performance degrades below that of uncompressed data as parallel queries approach the available CPU core count. LZ4 stays at or above the uncompressed baseline even at saturation. For OLTP workloads, LZ4 is the clear default choice.

Configuration: ALTER TABLE t ALTER COLUMN c SET COMPRESSION lz4; or globally via default_toast_compression = lz4

Columnar Extensions: Citus and Hydra

  • Citus Columnar: 3–10x compression ratios; benchmark data shows 5.4x for a typical mixed-type table vs heap storage.
  • Hydra Columnar: Open-source; in ClickBench comparisons, Hydra outperformed Citus, TimescaleDB, Aurora, and standard PostgreSQL on the 42-query benchmark.
  • TimescaleDB Hypercore: Achieves up to 95% compression on time-series data using dictionary compression for high-repetition JSONB columns.

1.2 MySQL / InnoDB: Page Compression and Table Compression

Method Algorithm Compression Ratio Performance Overhead
Barracuda Table Compression (ROW_FORMAT=COMPRESSED) zlib 50–75% size reduction ~60% for write-heavy workloads
Transparent Page Compression (InnoDB page) zlib or LZ4 Variable 10–17% (dramatically lower)

Guidance: For new deployments, transparent page compression with LZ4 provides the best latency-to-savings tradeoff. Reserve Barracuda table compression for cold/archival tables where write overhead is acceptable.

1.3 ClickHouse: Per-Column Codec Selection

ClickHouse provides the most granular compression control of any production database. Each column can be assigned a pipeline of pre-compression codecs followed by a general-purpose algorithm.

Codec Best For Mechanism
Delta Monotonic sequences, timestamps Stores difference between consecutive values
DoubleDelta Periodic timestamps, increasing counters Delta-of-deltas; exceptional for constant-interval series
Gorilla Slowly changing floats, gauge metrics XOR between consecutive values
T64 Sparse integers, values small relative to type range 64x64 bit matrix transpose + bit truncation
FPC Float64 columns Two-prediction finite context method
LZ4 General purpose, speed-critical Default; fast compression and decompression
ZSTD(level) General purpose, ratio-critical 30% better ratio than LZ4; higher CPU
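
The Delta and DoubleDelta transforms are easy to sketch in plain Python (a toy illustration, not ClickHouse's actual implementation): constant-interval timestamps become a run of zeros after delta-of-deltas, which any general-purpose codec then collapses.

```python
import struct
import zlib

def delta(values):
    # Keep the first value; store each subsequent value as a difference.
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def pack(values):
    # Serialize as little-endian int64, one value per 8 bytes.
    return struct.pack(f"<{len(values)}q", *values)

# 10,000 timestamps at a constant 1-second interval.
timestamps = list(range(1_700_000_000, 1_700_010_000))

raw = zlib.compress(pack(timestamps), 6)
d1  = zlib.compress(pack(delta(timestamps)), 6)         # Delta
d2  = zlib.compress(pack(delta(delta(timestamps))), 6)  # DoubleDelta

# The transformed streams compress far smaller than the raw encoding.
print(len(raw), len(d1), len(d2))
```

zlib stands in for the LZ4/ZSTD stage here; the point is that the pre-compression transform, not the general-purpose codec, does most of the work.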

Real-World Benchmarks (NOAA Weather Dataset)

Schema Stage Compressed Size Uncompressed Size Ratio
Naive (all String) 4.07 GiB 131.58 GiB 32.36:1
Type-optimized 3.83 GiB 71.38 GiB 18.63:1
String/type refined 3.81 GiB 35.34 GiB 9.28:1
Optimal codecs 1.42 GiB 36.05 GiB 25.36:1

Moving from naive types to optimal codecs reduced compressed storage by 65% (4.07 GiB to 1.42 GiB). Standout: Date column with DoubleDelta reduced 2.24 GiB to 24.28 MiB (~99% reduction).

1.4 SQLite: Page-Level Compression Options

SQLite has no built-in compression. External strategies fill the gap:

Solution Mechanism Size Reduction Query Impact
ZIPVFS (official, commercial) Per-page compression Variable Low overhead
sqlite-zstd (Rust extension) Row-level zstd with dictionary ~85% (442 MB to 67 MB typical) Sequential reads ~22% slower; random reads ~34% faster
sqlite_zstd_vfs Page-level VFS streaming 45–65% -25% query throughput
ZFS/Btrfs filesystem Transparent block-level LZ4: ~67%, zstd: up to 90% Minimal to none
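
The row-level approach can be approximated with the stdlib alone, storing compressed blobs and decompressing on read (zlib stands in for zstd here; real sqlite-zstd also trains a shared dictionary and hides this behind transparent row functions):

```python
import json
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

# Repetitive JSON, typical of rows that compress well.
doc = json.dumps([{"event": "page_view", "path": "/home"}] * 200)

# Compress on write...
blob = zlib.compress(doc.encode(), 6)
conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (1, blob))

# ...decompress on read.
row = conn.execute("SELECT body FROM docs WHERE id = 1").fetchone()
assert zlib.decompress(row[0]).decode() == doc
```

The tradeoff mirrors the table above: extra CPU per row in exchange for a much smaller database file.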

1.5 Cross-Engine Compression Ratio Summary

Engine Strategy Typical Ratio Notes
PostgreSQL (heap) TOAST LZ4 2–2.2x Row-level only
PostgreSQL (columnar) Citus/Hydra 3–10x Analytical tables
MySQL InnoDB Transparent page (LZ4) 2–4x 10–17% overhead
ClickHouse LZ4 default 5–15x Column-oriented
ClickHouse Optimal codecs + ZSTD 15–50x Per-column tuning
SQLite sqlite-zstd + dictionary 4–6x Extension required
TimescaleDB Hypercore columnar 10–20x Time-series optimized

2. KV Store Compression

2.1 RocksDB: Per-Level Compression Strategy

RocksDB (LSM-tree) provides the most sophisticated KV compression architecture, with per-level algorithm selection.

Facebook/Meta Production Configuration

compression_per_level = kNoCompression      # L0
                       kNoCompression      # L1
                       kLZ4Compression     # L2
                       kLZ4Compression     # L3
                       kLZ4Compression     # L4
                       kZSTD               # L5
                       kZSTD               # L6
bottommost_compression = kZSTD

Rationale: L0 and L1 are frequently compacted; compression adds CPU load with minimal space benefit. The bottommost level holds the vast majority of data — ZSTD's higher ratio pays off there.

Algorithm Compression Speed Decompression Speed Ratio Recommendation
None N/A N/A 1.0x L0–L1 (hot levels)
LZ4 ~750 MB/s ~4 GB/s ~1.12x Middle levels
LZ4HC Slower than LZ4 Same as LZ4 ~1.3x Read-heavy middle levels
ZSTD-1 ~1 GB/s ~1 GB/s ~1.5–2x Good speed/ratio balance
ZSTD-3 (default) ~800 MB/s ~1 GB/s ~2–3x Bottommost level default
ZSTD-19 ~50 MB/s ~1 GB/s ~3–4x Cold/archival only
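
The speed/ratio ladder in this table can be reproduced in miniature with zlib levels standing in for the LZ4-to-ZSTD spectrum (a rough stdlib analogue; absolute figures will not match real LZ4 or ZSTD):

```python
import time
import zlib

# A compressible sample: repeated structured log lines.
sample = b'{"level":"info","msg":"compaction done","files":42}\n' * 20_000

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(sample, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: ratio {len(sample) / len(out):5.1f}x, {elapsed * 1000:.1f} ms")
```

Running this shows the same shape as the RocksDB table: each step up in level costs disproportionately more CPU for a diminishing ratio gain.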

2.2 LMDB: No Built-In Compression

LMDB deliberately excludes compression from its B+tree engine. External compression strategies:

Strategy Implementation Performance
Application-level LZ4 Compress values before mdb_put() Best read speed
Application-level Snappy Compress values before mdb_put() Best write speed, best space savings
Filesystem-level (ZFS/Btrfs) Transparent to LMDB Zero application changes

When LMDB is appropriate: Single-writer/many-reader workloads where zero-copy reads matter more than storage density. If you need built-in compression, use RocksDB instead.
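
A minimal sketch of the application-level pattern, with zlib in place of LZ4/Snappy and a plain dict standing in for an open LMDB environment (mdb_put/mdb_get would receive the same opaque bytes):

```python
import zlib

store = {}  # stand-in for an LMDB environment; values are opaque bytes

def put(key: bytes, value: bytes) -> None:
    # Compress before the store ever sees the value.
    store[key] = zlib.compress(value, 1)  # level 1 favors write speed

def get(key: bytes) -> bytes:
    return zlib.decompress(store[key])

put(b"user:42", b'{"name": "Ada", "plan": "pro"}' * 50)
assert get(b"user:42").startswith(b'{"name": "Ada"')
assert len(store[b"user:42"]) < 50 * 30  # stored form is smaller than the raw value
```

Note that this sacrifices LMDB's zero-copy reads: every get now allocates and decompresses, which is exactly the tradeoff the paragraph above describes.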

2.3 Redis: Memory-Level Compression Encodings

Redis compresses at the data-structure level rather than the page or block level:

Encoding Data Structure Memory Savings Threshold
ziplist (< Redis 7) Hashes, sorted sets, lists 5–10x vs pointer-based hash-max-ziplist-entries: 512
listpack (Redis 7+) Replacement for ziplist Similar to ziplist, safer hash-max-listpack-entries: 128
quicklist Lists Ziplist nodes in doubly-linked list list-max-listpack-size: -2 (8 KB)
intset Sets of integers Compact integer array set-max-intset-entries: 512

Application-level compression benchmarks (LogicMonitor study):

  • LZ4 on values: 60% memory savings (39,739 bytes to 16,208 bytes)
  • Float16 + Blosc + ZLib + Base64 pipeline: 85% savings (80,000 bytes to 11,383 bytes)
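
The second pipeline can be approximated with the stdlib: pack gauges as 16-bit floats (struct's 'e' format), deflate, then Base64 for a string-safe Redis value (Blosc is omitted, so the savings will differ from the study's figures):

```python
import base64
import struct
import zlib

# 10,000 gauge readings, slowly drifting: a typical metrics payload.
readings = [20.0 + 0.01 * (i % 100) for i in range(10_000)]

raw = struct.pack(f"<{len(readings)}d", *readings)   # 8 bytes per value
half = struct.pack(f"<{len(readings)}e", *readings)  # 2 bytes per value, lossy
packed = base64.b64encode(zlib.compress(half, 6))    # string-safe for Redis

print(len(raw), len(packed))
```

The float16 step is lossy (about 3 decimal digits of precision), which is usually acceptable for gauge metrics but not for financial values.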

2.4 Cloudflare Workers KV: Edge Compression

Cloudflare Workers KV is an eventually-consistent global KV store optimized for read-heavy, write-infrequent workloads.

  • 2025 redesign: Hybrid storage — distributed database for small values, R2 object storage for large values
  • p99 read latency improved from 200 ms to under 5 ms after the 2025 rearchitecture
  • Internal compression is opaque; Cloudflare handles storage optimization at the infrastructure level

Best practice: Pre-compress with CompressionStream (Web Streams API) in the Worker itself, using gzip (broadest decompression support) or raw deflate.

2.5 DynamoDB: Client-Side Attribute Compression

DynamoDB provides no server-side compression. All compression is application-layer.

Algorithm Best For WCU/RCU Savings
GZIP Text-heavy items Up to 70% cost savings
LZO Faster compression, lower ratio 40–50%
Snappy Fast, moderate ratio 30–45%
Important constraint: Compressed attributes cannot be meaningfully used in filter expressions, key conditions, or projection expressions. Design partition/sort keys and GSI attributes to remain uncompressed.
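
Client-side, the pattern is to gzip the large attribute and keep key attributes untouched; a sketch (the dict mirrors the shape passed to a put_item call, which is not made here, so no AWS SDK is needed):

```python
import gzip
import json

def compress_attr(value: str) -> bytes:
    return gzip.compress(value.encode(), compresslevel=6)

def decompress_attr(blob: bytes) -> str:
    return gzip.decompress(blob).decode()

payload = json.dumps({"history": [{"ts": i, "status": "ok"} for i in range(500)]})

item = {
    "pk": "user#42",  # keys stay uncompressed: they must remain queryable
    "sk": "profile",
    "payload": compress_attr(payload),  # large, non-filterable attribute
}

assert decompress_attr(item["payload"]) == payload
assert len(item["payload"]) < len(payload)  # fewer bytes means fewer WCUs/RCUs
```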

2.6 etcd / bbolt: Compaction, Not Compression

etcd uses bbolt (B+tree store with 4 KB memory-mapped pages). bbolt has no compression capability.

What etcd calls "compaction" is revision pruning: dropping superseded MVCC revisions to reclaim logical space. bbolt does not release freed pages back to the OS — only etcd defrag physically reclaims disk space.

Practical guidance:

  • Run etcd compaction regularly to prune old revisions
  • Follow with etcd defrag to reclaim disk space
  • For storage-constrained environments, reduce --snapshot-count and enable auto-compaction

3. Column-Oriented vs Row-Oriented Compression Ratios

The fundamental difference: columns contain homogeneous data types, enabling specialized codecs that exploit type-specific patterns. Rows mix types, limiting compression to general-purpose algorithms.

ClickBench Comparison (100M rows)

System Type Storage Size Relative
ClickHouse Columnar 9.26 GiB 1x
TimescaleDB (compressed) Hybrid columnar ~20 GiB ~2x
PostgreSQL (heap) Row-oriented ~100 GiB ~10x

Why Columnar Compresses Better

  1. Run-Length Encoding (RLE): Sorted columns with repeated values compress to (value, count) pairs.
  2. Dictionary Encoding: Low-cardinality string columns replace strings with integer indices.
  3. Delta Encoding: Monotonic sequences store only differences. If timestamps increment by exactly 1 second, the delta-of-deltas are all zero and cost roughly 1 bit per value.
  4. Bit-Packing / Frame-of-Reference: Integer values clustered in a narrow range are stored with minimal bits per value.
  5. Type-Specific Codecs: Gorilla for floats, DoubleDelta for timestamps — impossible in row stores that see each row as an opaque tuple.
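
The first two primitives are easy to show concretely; a toy run-length and dictionary encoder over a sorted, low-cardinality column (illustrative only, not any engine's on-disk format):

```python
from itertools import groupby

column = ["DE"] * 4000 + ["FR"] * 3000 + ["US"] * 3000  # sorted, low-cardinality

# Run-length encoding: the column collapses to (value, count) pairs.
rle = [(value, len(list(group))) for value, group in groupby(column)]

# Dictionary encoding: strings become small integer indices.
dictionary = {value: index for index, value in enumerate(sorted(set(column)))}
encoded = [dictionary[value] for value in column]

print(rle)  # [('DE', 4000), ('FR', 3000), ('US', 3000)]
print(len(dictionary), "distinct values for", len(encoded), "rows")
```

Ten thousand strings reduce to three RLE pairs, and a row store can never apply this because the country codes are interleaved with every other column in each tuple.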

Quantified Advantage

  • Row-oriented databases (PostgreSQL, MySQL): 2–4x compression on typical OLTP data
  • Columnar databases (ClickHouse, DuckDB, Redshift): 10–50x compression on analytical data
  • Hybrid approaches (TimescaleDB Hypercore, Citus Columnar): 5–20x, bridging the gap

The 10x storage difference between ClickHouse and PostgreSQL on the same 100M-row dataset represents real production cost: at S3 pricing ($0.023/GB/month), that is the difference between $0.21/month and $2.30/month per dataset — scaling to thousands of dollars at enterprise data volumes.

4. Compression Codec Selection by Data Type

Data Type Primary Codec Secondary Codec Expected Ratio Notes
Timestamps (periodic) DoubleDelta +LZ4 64:1 to 800:1 Constant intervals compress to ~1 bit/value
Timestamps (irregular) Delta +ZSTD 4–10x Variable deltas need general-purpose follow-up
Auto-increment integers Delta +LZ4 100–800:1 Monotonic sequences are ideal for delta
Gauge integers (metrics) T64 +ZSTD 3–8x T64 truncates unused high bits
Float64 (slowly changing) Gorilla +LZ4 2–8x XOR encoding exploits bit-level similarity
Float64 (volatile) None ZSTD 1.5–3x Gorilla provides no benefit on random floats
Low-cardinality strings Dictionary encoding +LZ4 5–20x Replace strings with integer indices
JSON / JSONB Dictionary (if repetitive) ZSTD 3–10x Key names repeat; dictionary training excels
UUIDs (binary 16-byte) None None 1x (already compact) Store as UUID/BINARY(16), not TEXT
Booleans Bit-packing / RLE +LZ4 8–64x 1 bit vs 1 byte per value

Key Principles

  1. Smallest type first: Use Int16 instead of Int64 when range permits.
  2. Pre-compression codecs before general-purpose: Delta, DoubleDelta, Gorilla, and T64 transform data into more compressible representations before LZ4/ZSTD processes it.
  3. Measure per column: Compression effectiveness varies dramatically across columns in the same table. Always benchmark with real data.
  4. UUIDs are compression-hostile: Their uniform distribution defeats all algorithms. Store as binary (16 bytes) rather than text (36 bytes) for a guaranteed 2.25x savings without compression.
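
Both UUID claims are checkable in a few lines: the binary form is exactly 2.25x smaller than text, and general-purpose compression gains essentially nothing on the random bytes (the measured ratio will hover near 1.0, varying slightly per run):

```python
import uuid
import zlib

ids = [uuid.uuid4() for _ in range(1000)]

as_text = "".join(str(u) for u in ids).encode()  # 36 bytes each
as_binary = b"".join(u.bytes for u in ids)       # 16 bytes each

assert len(as_text) == 36_000 and len(as_binary) == 16_000  # 2.25x, no codec needed

# General-purpose compression barely dents the random binary form.
ratio = len(as_binary) / len(zlib.compress(as_binary, 9))
print(f"zlib ratio on binary UUIDs: {ratio:.2f}x")
```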

5. Dictionary Training on Schema-Specific Data

How ZSTD Dictionary Training Works

ZSTD dictionaries capture recurring byte patterns from a training corpus. During compression, each input block is encoded using references to dictionary fragments rather than raw literals. The dictionary is typically small (32–100 KiB) and is shared across all compressions of similar data.

Data Category Without Dictionary With Dictionary Improvement
Small JSON objects (<1 KB) 1.5–2x 5–8x 3–4x better
Database pages (4–16 KB) 2–3x 4–6x ~2x better
SSTable blocks (Cassandra) 2–3x 6–10x 3–4x better
Log lines (structured) 2–4x 6–12x 3x better
Large blobs (>100 KB) 3–5x 3.5–5.5x Marginal
Key insight: Dictionary gains are inversely proportional to input size. For data under 1 KB, dictionaries are transformational. For data over 100 KB, the input itself provides enough context that a dictionary adds little.

Dictionary Training Guidelines

  • Corpus size: 100x the target dictionary size minimum (e.g., 10 MB of samples for a 100 KB dictionary)
  • Dictionary size: 32–100 KiB (larger dictionaries offer diminishing returns)
  • Retraining frequency: When data distribution shifts significantly (schema changes, new value patterns)
  • Storage overhead: Dictionary must be available at decompression time; embed it alongside compressed data or store in metadata
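
ZSTD's dictionary API requires a third-party package in Python, but the stdlib's zlib supports the same idea through its zdict parameter, which is enough to demonstrate the small-payload effect (a stdlib stand-in, not real ZSTD training; the "dictionary" here is just repeated representative samples rather than a trained artifact):

```python
import zlib

# Representative samples play the role of a trained dictionary.
sample = b'{"event": "page_view", "user_id": 1234, "path": "/home", "referrer": "https://example.com"}'
zdict = sample * 10

msg = b'{"event": "page_view", "user_id": 8841, "path": "/pricing", "referrer": "https://news.ycombinator.com"}'

def compress(data: bytes, dictionary: bytes = b"") -> bytes:
    if dictionary:
        comp = zlib.compressobj(9, zlib.DEFLATED, 15, 9,
                                zlib.Z_DEFAULT_STRATEGY, dictionary)
    else:
        comp = zlib.compressobj(9)
    return comp.compress(data) + comp.flush()

plain = compress(msg)
with_dict = compress(msg, zdict)

# Decompression needs the same dictionary available.
assert zlib.decompressobj(zdict=zdict).decompress(with_dict) == msg
print(len(msg), len(plain), len(with_dict))  # dictionary wins on sub-1 KB payloads
```

The mechanism matches the description above: shared byte patterns are encoded as references into the dictionary instead of literals, which is why the gain is largest when the payload itself is too small to supply that context.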

Production Implementations

  • RocksDB: Supports ZSTD dictionary training at the bottommost level during compaction. Enable with CompressionOptions::zstd_max_train_bytes and CompressionOptions::max_dict_bytes.
  • Cassandra (CEP-54): Proposed per-table dictionary support for SSTable compression. Expected 3–4x improvement over non-dictionary ZSTD.
  • sqlite-zstd: Trains dictionaries per-column at the SQLite table level. Benchmark: a 2.0 GB database compressed to 528 MB (a 74% reduction) with a dictionary vs 1.63 GB (an 18% reduction) without, roughly a 3x improvement from dictionary training alone.

6. Write Amplification Tradeoffs

In LSM-tree databases (RocksDB, LevelDB, Cassandra, CockroachDB), every logical write is physically written multiple times as data moves through compaction levels. Compression adds CPU cost to each of these writes.

Compression Level vs Write Throughput

Compression Compaction CPU Cost Bytes Written Net Write Throughput Best For
None Lowest Highest Highest ingest rate Burst ingestion, L0–L1
LZ4 Very low (+5–10%) Moderate Near-maximum Hot levels, mixed workloads
ZSTD-1 Low (+15–20%) Low Good balance Middle levels
ZSTD-3 Moderate (+30–40%) Low Good for read-heavy Bottommost level default
ZSTD-9 High (+100–200%) Very low Write-limited Archival, cold tiers
ZSTD-19 Very high (+500%+) Lowest Severely limited Offline compaction only

Decision Framework

If write_throughput_critical AND ssd_storage:
    L0-L1: None, L2-L4: LZ4, Lmax: ZSTD-1
If storage_cost_critical AND write_moderate:
    L0: None, L1-L4: LZ4, Lmax: ZSTD-3
If archival AND write_infrequent:
    All levels: ZSTD-9 or ZSTD-19
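
The framework reads naturally as a lookup function; a sketch (the parameter names such as write_critical are this sketch's own, not RocksDB options):

```python
def per_level_compression(levels: int, *, write_critical: bool,
                          storage_critical: bool, archival: bool) -> list[str]:
    """Map the decision framework above onto a per-level algorithm list."""
    if archival:
        return ["zstd-9"] * levels
    if write_critical:
        # L0-L1 uncompressed, LZ4 in the middle, ZSTD-1 at the bottom.
        return ["none", "none"] + ["lz4"] * (levels - 3) + ["zstd-1"]
    if storage_critical:
        # Only L0 uncompressed; ZSTD-3 at the bottom for ratio.
        return ["none"] + ["lz4"] * (levels - 2) + ["zstd-3"]
    return ["none", "none"] + ["lz4"] * (levels - 3) + ["zstd-3"]

print(per_level_compression(7, write_critical=True,
                            storage_critical=False, archival=False))
# ['none', 'none', 'lz4', 'lz4', 'lz4', 'lz4', 'zstd-1']
```

The resulting list maps one-to-one onto a compression_per_level configuration like the Facebook/Meta example in section 2.1.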

7. Algorithm Showdown: ZSTD vs LZ4 vs Snappy vs PPMd

Metric LZ4 Snappy ZSTD-1 ZSTD-3 ZSTD-9 PPMd
Compression speed (per core) ~750 MB/s ~550 MB/s ~530 MB/s ~350 MB/s ~115 MB/s ~20 MB/s
Decompression speed (per core) ~4.5 GB/s ~1.8 GB/s ~1.7 GB/s ~1.5 GB/s ~1.2 GB/s ~20 MB/s
Ratio (structured data) 1.5–2.5x 1.4–2.2x 2–3x 2.5–4x 3–5x 4–8x
Ratio (text data) 2–3x 1.8–2.5x 3–5x 4–6x 5–8x 6–12x
Dictionary support No No Yes Yes Yes Yes

Cost-at-Scale Comparison (500 TB/month, S3 eu-central-1 at $0.0235/GB)

Algorithm Effective Size Monthly Cost Savings vs None
None 500 TB $11,750
LZ4 ~333 TB (1.5x) $7,833 $3,917/mo
ZSTD-3 ~167 TB (3x) $3,917 $7,833/mo
ZSTD-19 ~125 TB (4x) $2,938 $8,812/mo

ZSTD-3 delivers the sweet spot: 2x the savings of LZ4 with practical CPU overhead.
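
The table's arithmetic generalizes to a one-line helper (the S3 price is the one quoted above; the raw size and ratios are whatever you measure for your own data):

```python
def monthly_cost(raw_tb: float, ratio: float, usd_per_gb: float = 0.0235) -> float:
    """Storage cost after compression, in USD per month."""
    return raw_tb * 1000 / ratio * usd_per_gb

for name, ratio in [("none", 1.0), ("lz4", 1.5), ("zstd-3", 3.0), ("zstd-19", 4.0)]:
    print(f"{name:8s} ${monthly_cost(500, ratio):,.0f}/mo")
```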

8. Practical Pipeline Recommendations

Use Case Database Compression Strategy Expected Ratio Priority
OLTP web app PostgreSQL TOAST LZ4 2x Latency
Time-series metrics ClickHouse DoubleDelta+LZ4 (time), Gorilla+LZ4 (values) 15–50x Storage + query speed
Log analytics ClickHouse Delta+ZSTD (timestamps), ZSTD (text) 10–30x Storage
Embedded mobile DB SQLite sqlite-zstd with dictionary 4–6x Storage + random read
KV cache (hot) Redis Listpack encoding + app-level LZ4 2–5x Memory cost
KV persistent store RocksDB ZSTD with dictionary (bottom), LZ4 (mid) 3–6x Storage
Global edge KV Cloudflare KV Client-side gzip/ZSTD pre-compression 2–4x Egress cost
Serverless KV DynamoDB Client-side GZIP on large attributes 2–4x WCU/RCU cost
Analytical warehouse ClickHouse/DuckDB Full codec optimization per column 15–50x Query + storage

Decision Flowchart

START
  |
  v
Is latency the primary concern?
  |-- YES --> Use LZ4 everywhere. Measure. Stop.
  |-- NO
  v
Is this columnar / analytical data?
  |-- YES --> Use type-specific codecs (Delta, DoubleDelta, Gorilla, T64) + ZSTD
  |-- NO
  v
Is this an LSM-tree KV store?
  |-- YES --> No compression L0-L1, LZ4 L2-L4, ZSTD-3 Lmax
  |-- NO
  v
Is this a row-oriented RDBMS?
  |-- YES --> PostgreSQL: TOAST LZ4 | MySQL: Transparent page compression LZ4
  |-- NO
  v
Is this a managed KV service (DynamoDB, Workers KV)?
  |-- YES --> Client-side compression (GZIP for broadest compat, ZSTD for best ratio)
  |-- NO
  v
Default: ZSTD-3 at the application layer. Always measure.

9. Sources and Citations

Appendix: Practical Database Compression Choices (2025–2026 Era)

What Changed Recently

  1. ZSTD is now the de facto standard. In 2020, Snappy and zlib dominated. By 2026, ZSTD has replaced both in most production systems.
  2. Dictionary compression matured. RocksDB's dictionary support is production-ready. sqlite-zstd demonstrates 3x improvement from dictionary training alone.
  3. LZ4 won the speed tier. Snappy has no remaining advantage. PostgreSQL 14+ made LZ4 TOAST a first-class option.
  4. Columnar compression in row databases. TimescaleDB Hypercore, Citus Columnar, and Hydra bring 10–20x compression to PostgreSQL without leaving the ecosystem.
  5. Edge KV stores are opaque. Cloudflare Workers KV and similar edge stores handle compression internally. The optimization lever for users is client-side pre-compression and efficient serialization.

Quick-Reference Decision Table

Question Answer Action
PostgreSQL and need compression? Use LZ4 TOAST SET default_toast_compression = lz4;
PostgreSQL and need 10x+ compression? Add columnar extension Citus Columnar or TimescaleDB Hypercore for cold data
ClickHouse and want optimal storage? Tune codecs per column Profile with DESCRIBE TABLE ... SETTINGS describe_compact_output=0
RocksDB and need space savings? Per-level compression compression_per_level + bottommost_compression=kZSTD
SQLite and DB too large? sqlite-zstd extension Train dictionary per large text/JSON column
DynamoDB and high costs? Client-side GZIP Compress non-filterable attributes before PutItem
Need to choose one algorithm? ZSTD-3 Best general-purpose balance of ratio, speed, and ecosystem support
The One Rule: Measure with your data. Compression ratios published in benchmarks are averages across specific datasets. Your schema, cardinality, value distributions, and access patterns will produce different numbers. Always benchmark candidate configurations against a representative sample of your production data before committing to a compression strategy.