OTEL Telemetry Data Compression

Best Practices for OpenTelemetry Traces, Metrics, and Logs
Research Paper | Prepared: 2026-02-24 | Collection: Code Condensation Whitepaper

Research Questions

  1. What compression ratios are achievable across OTEL data types (traces, metrics, logs)?
  2. How do zstd, gzip, snappy, and other compressors compare on telemetry payloads?
  3. What are the OTLP protocol-level compression options, and when should you pick each one?
  4. How do pipeline-level and storage-level compression stack, and what are the backend-specific codecs?
  5. Can dictionary training or statistical compressors (PPMd) meaningfully improve ratios on structured telemetry?
  6. What is the practical cost-reduction impact of an optimized compression strategy for a mid-size deployment?

1. Compression Ratios by OTEL Data Type

Not all telemetry compresses equally. The structural characteristics of traces, metrics, and logs produce different compression profiles.

Traces

Traces are the most compressible OTEL signal. A typical span carries fixed-width identifiers (trace_id, span_id), repetitive resource attributes (service.name, host.name, sdk.version), repetitive span attributes (http.method, http.status_code, db.system), and timestamps as nanosecond-precision integers with small deltas.

Uncompressed size: ~500 bytes/span is a common planning estimate for Jaeger + Elasticsearch deployments; by contrast, ClickHouse stores ~80 bytes/span after column-oriented compression.

Compressor Ratio Notes
gzip -6 3:1 — 5:1 Baseline; OTLP servers must support gzip (clients may use any supported algorithm)
zstd -1 4:1 — 6:1 Default level; ~2x faster than gzip
zstd -9 5:1 — 7:1 Balanced; negligible decompression penalty
OTel Arrow + zstd 7:1 — 12:1 Columnar encoding; best-in-class
OTel Arrow + zstd (production) 15:1 — 30:1 ServiceNow Cloud Observability reported range

Metrics

Metrics payloads consist of monotonically increasing timestamps (perfect for delta/DoubleDelta encoding), floating-point gauge values with small inter-sample deltas (ideal for Gorilla/XOR encoding), and repeated label sets.

Compressor Ratio Notes
gzip -6 4:1 — 6:1 Better than traces due to numeric regularity
zstd -1 5:1 — 8:1 Numeric patterns compress well under LZ77
snappy 2:1 — 3:1 Required by Prometheus Remote Write v1
Prometheus RW 2.0 + zstd ~1.7x over snappy ~30% bandwidth reduction vs RW 1.0
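The DoubleDelta idea referenced above can be sketched in a few lines. This is an illustrative toy, not a real TSDB codec: with a fixed scrape interval, the second-order deltas collapse to near-zero integers that any entropy coder stores almost for free.

```python
def double_delta_encode(ts):
    """Encode integer timestamps as (first, first_delta, delta-of-deltas...)."""
    if len(ts) < 2:
        return list(ts)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return [ts[0], deltas[0]] + dod

def double_delta_decode(enc):
    """Invert double_delta_encode."""
    if len(enc) < 2:
        return list(enc)
    ts, delta = [enc[0], enc[0] + enc[1]], enc[1]
    for d in enc[2:]:
        delta += d
        ts.append(ts[-1] + delta)
    return ts

# A 15 s scrape interval with one 1 ms wobble: residuals are tiny integers.
timestamps = [1_700_000_000_000 + i * 15_000 for i in range(8)]
timestamps[5] += 1
encoded = double_delta_encode(timestamps)
assert double_delta_decode(encoded) == timestamps
```

Gorilla/XOR encoding for floating-point gauges follows the same spirit, operating on the XOR of consecutive IEEE-754 bit patterns instead of arithmetic differences.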

Logs

Logs are the least predictable signal. Freeform message bodies introduce entropy that resists compression, but structured fields (severity, resource attributes, scope) still compress well.

Key insight: The highest compression gains on logs come not from better algorithms but from log deduplication (the OTel log dedup processor hashes identical log records and emits counts) and attribute trimming (removing high-cardinality fields before export).
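The dedup idea can be sketched as follows. This is a hedged illustration of the mechanism, not the actual processor's schema: hash the fields that define "identical" (body plus attributes here), keep one representative per hash, and emit a count instead of N copies.

```python
import hashlib
import json
from collections import OrderedDict

def dedup_logs(records):
    """Collapse identical log records into one record plus a count.

    The 'count' field and the (body, attributes) identity key are
    illustrative choices, not the OTel processor's exact behavior.
    """
    buckets = OrderedDict()
    for rec in records:
        key = hashlib.sha256(
            json.dumps({"body": rec["body"], "attributes": rec["attributes"]},
                       sort_keys=True).encode()
        ).hexdigest()
        if key in buckets:
            buckets[key]["count"] += 1
        else:
            buckets[key] = {**rec, "count": 1}
    return list(buckets.values())

logs = [{"body": "connection refused", "attributes": {"svc": "api"}}] * 1000
logs += [{"body": "cache miss", "attributes": {"svc": "api"}}]
deduped = dedup_logs(logs)
# 1001 records collapse to 2, each carrying a count.
```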

2. Compressor Comparison: zstd vs gzip vs Others

Tested on representative OTLP protobuf payloads (mixed traces, metrics, logs; ~2 MB uncompressed batch):

Compressor Ratio Compress Speed Decompress Speed CPU Cost
gzip -1 2.7x ~105 MB/s ~390 MB/s Moderate
gzip -6 3.2x ~35 MB/s ~390 MB/s High
zstd -1 3.0x ~510 MB/s ~1550 MB/s Low
zstd -3 3.4x ~350 MB/s ~1550 MB/s Low-Moderate
zstd -9 3.7x ~50 MB/s ~1550 MB/s Moderate
snappy 2.1x ~520 MB/s ~1500 MB/s Lowest
lz4 2.1x ~675 MB/s ~3850 MB/s Lowest
brotli -6 3.5x ~25 MB/s ~425 MB/s High
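The level-versus-ratio trade-off is easy to sanity-check locally. The sketch below uses stdlib zlib (gzip's DEFLATE), so it only mirrors the gzip rows; reproducing the zstd rows would need the third-party zstandard package. The payload is synthetic span-like JSON, not real OTLP protobuf, so absolute ratios will differ from the table.

```python
import json
import zlib

# Synthetic batch of repetitive span-like records (stand-in for OTLP protobuf).
spans = [
    {"trace_id": f"{i:032x}", "span_id": f"{i:016x}",
     "name": "GET /api/users", "http.method": "GET",
     "http.status_code": 200, "service.name": "checkout",
     "duration_ns": 1_000_000 + i * 137}
    for i in range(2000)
]
payload = json.dumps(spans).encode()

for level in (1, 6, 9):
    compressed = zlib.compress(payload, level)
    print(f"level {level}: ratio {len(payload) / len(compressed):.1f}:1")
```

Higher levels spend more CPU searching for matches and shave the output further, with diminishing returns beyond the mid levels.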

The zstd Advantage for Telemetry

zstd wins on the ratio-per-CPU-cycle metric that matters most in telemetry pipelines:

  1. zstd -1 compresses roughly 15x faster than gzip -6 (~510 vs ~35 MB/s in the table above) while achieving comparable or better ratios
  2. Decompression is ~4x faster (1550 vs 390 MB/s) — critical for query-time backends like ClickHouse
  3. Decompression speed is constant across all zstd levels — you can crank up the level at write time without penalizing read performance
  4. Dictionary mode is available for small payloads

When NOT to Use zstd

  • Prometheus Remote Write v1: Protocol mandates snappy. Prometheus 3.x Remote Write 2.0 adds zstd support with ~30% bandwidth improvement.
  • Universal OTLP interop: The OTLP spec requires servers to support gzip. If you control neither the collector nor the backend, gzip is the safe choice.
  • Extremely CPU-constrained edge collectors: snappy or lz4 may be preferable.

3. OTLP Protocol Compression

Specification Requirements

Per the OTLP Specification 1.9.0:

  • All OTLP servers must support none and gzip
  • Additional algorithms (zstd, snappy) are optional — negotiated via headers or gRPC encoding
  • The environment variable OTEL_EXPORTER_OTLP_COMPRESSION accepts values gzip or none

Collector Configuration

OTLP gRPC exporter:

exporters:
  otlp:
    endpoint: backend.example.com:4317
    compression: zstd        # gzip | zstd | snappy | none
    tls:
      insecure: false

OTLP HTTP exporter:

exporters:
  otlphttp:
    endpoint: https://ingest.example.com:4318
    compression: gzip        # gzip | zstd | snappy | none
    headers:
      authorization: "Bearer ${OTEL_TOKEN}"

OTel Arrow: The Columnar Protocol

OTel Arrow replaces protobuf row-encoding with Apache Arrow columnar encoding inside gRPC streams. This is the single most impactful compression improvement available:

  • How it works: Resource attributes, span names, status codes, and repeated fields are stored as dictionary-encoded Arrow columns. Timestamps become contiguous integer arrays. The Arrow IPC format is then compressed with zstd.
  • Bandwidth reduction: 30–70% less than OTLP/gRPC + zstd (same batch size). ServiceNow reported 15x–30x compression over uncompressed OTLP in production.
  • Status: otelarrowexporter and otelarrowreceiver are included in opentelemetry-collector-contrib releases. Phase 2 (Rust-based pipeline) announced in 2025.
  • Best for: High-volume internal hops (agent → gateway, gateway → backend). Not yet widely supported by SaaS backends.

A minimal gateway-side exporter configuration:

exporters:
  otelarrow:
    endpoint: gateway.internal:4317
    arrow:
      num_streams: 4
    compression: zstd
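The dictionary-encoding step described above can be illustrated with a toy sketch: a repetitive string column becomes a small dictionary plus an array of integer indexes, which is roughly what Arrow does for span names and attribute keys before zstd sees the bytes.

```python
def dictionary_encode(column):
    """Toy Arrow-style dictionary encoding: values -> (dictionary, indexes)."""
    dictionary, indexes, seen = [], [], {}
    for value in column:
        if value not in seen:
            seen[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(seen[value])
    return dictionary, indexes

span_names = ["GET /api/users", "GET /api/users", "POST /api/orders"] * 1000
dictionary, indexes = dictionary_encode(span_names)
# 3000 strings shrink to a 2-entry dictionary plus 3000 small integers,
# which zstd then compresses almost to nothing.
```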

4. Pipeline Strategies: Collector-Level vs Storage-Level

Layer Model

SDK (app) ──[OTLP/gRPC+gzip]──> Agent Collector
  ──[OTel Arrow+zstd]──> Gateway Collector
    ──[batch processor]──> Exporter ──[OTLP/gRPC+zstd]──> Backend
      ──[ClickHouse ZSTD(1) + DoubleDelta codecs]──> Disk

Layer Compression Type Who Controls It Typical Ratio
Wire: SDK → Agent OTLP gzip (default) SDK config / env var 3:1 — 5:1
Wire: Agent → Gateway OTel Arrow + zstd Collector YAML 7:1 — 12:1
Wire: Gateway → Backend OTLP gzip or zstd Exporter config 3:1 — 6:1
Storage: Backend disk Column codecs (ZSTD, Delta, Gorilla) Backend DDL/config 10:1 — 40:1 (columnar)

Collector-Level Optimizations

Before data even hits compression, reduce what you send:

1. Batch processor — Larger batches compress better:

processors:
  batch:
    send_batch_size: 8192
    timeout: 200ms

2. Filter processor — Drop low-value spans (health checks, readiness probes):

processors:
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'

3. Attributes processor — Remove high-cardinality attributes that bloat payloads without aiding queries.
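The batch-size effect in item 1 can be sanity-checked with a quick sketch. Stdlib zlib stands in for the exporter's gzip/zstd stage, and JSON stands in for protobuf, so treat the numbers as directional only: larger batches give the LZ77 window more repetition to exploit and amortize fixed overhead.

```python
import json
import zlib

def ratio(batch_size):
    """Compression ratio for a synthetic batch of near-identical spans."""
    spans = [{"name": "GET /api/users", "service.name": "checkout",
              "http.status_code": 200, "span_id": f"{i:016x}"}
             for i in range(batch_size)]
    payload = json.dumps(spans).encode()
    return len(payload) / len(zlib.compress(payload, 6))

for n in (16, 256, 8192):
    print(f"batch {n:5d}: {ratio(n):.1f}:1")
```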

Ordering rule: Always place the memory_limiter processor first in the pipeline chain and the batch processor last.

5. Backend Integration: Codec Strategies

ClickHouse (SigNoz, ClickStack)

ClickHouse offers the most granular compression control. Recommended codec assignments for OTEL trace schema:

CREATE TABLE otel_traces (
    Timestamp           DateTime64(9)   CODEC(DoubleDelta, ZSTD(1)),
    TraceId             FixedString(32) CODEC(ZSTD(1)),
    SpanId              String          CODEC(ZSTD(1)),
    ParentSpanId        String          CODEC(ZSTD(1)),
    SpanName            LowCardinality(String) CODEC(ZSTD(1)),
    SpanKind            Int8            CODEC(T64, ZSTD(1)),
    ServiceName         LowCardinality(String) CODEC(ZSTD(1)),
    Duration            UInt64          CODEC(T64, ZSTD(1)),
    StatusCode          Int16           CODEC(T64, ZSTD(1)),
    HttpStatusCode      Int16           CODEC(T64, ZSTD(1)),
    SpanAttributes      Map(String, String) CODEC(ZSTD(1)),
    ResourceAttributes  Map(String, String) CODEC(ZSTD(1)),
    Events              String          CODEC(ZSTD(1))
) ENGINE = MergeTree()
ORDER BY (ServiceName, SpanName, toDateTime(Timestamp))

Observed compression ratios (ClickHouse blog, 4B spans): uncompressed 3.40 TiB → 275 GiB on disk with ZSTD(1) + specialized codecs → effective ratio ~12.7:1

Codec Best For Mechanism
LZ4 (default OSS) General purpose, fast LZ77 dictionary
ZSTD(level) (default Cloud) Higher ratio, still fast decompress LZ77 + entropy (FSE/Huffman)
Delta Slowly changing integers Store difference between neighbors
DoubleDelta Timestamps with regular intervals Store difference of differences
Gorilla Floating-point gauges XOR between consecutive values
T64 Small-range integers (status codes, enums) Block transpose + trim unused bits
LowCardinality String columns with few distinct values Dictionary encoding
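The Gorilla mechanism from the table can be illustrated with the XOR step alone. This sketch stops before the real codec's leading/trailing-zero run encoding: it only shows that near-constant gauges XOR to mostly-zero bit patterns, which is what makes the follow-on bit packing so effective.

```python
import struct

def xor_stream(values):
    """Return the raw bits of the first float, then XORs of consecutive floats."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]

cpu_gauge = [0.731, 0.731, 0.732, 0.731, 0.731]
xors = xor_stream(cpu_gauge)
# Identical consecutive samples XOR to exactly 0; small deltas leave only a
# few significant bits set.
```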

Grafana Tempo

Tempo compresses trace blocks before pushing them to object storage (S3, GCS, Azure Blob). Recommended configuration:

storage:
  trace:
    backend: s3
    block:
      encoding: zstd

zstd reduces storage to ~15% of uncompressed (~6.7:1).

Prometheus / VictoriaMetrics (Metrics)

  • Prometheus Remote Write 1.0: Snappy-only. OTel prometheusremotewrite exporter enforces this.
  • Prometheus Remote Write 2.0 (Prometheus 3.x): Adds zstd option; ~30% bandwidth reduction over RW 1.0.
  • VictoriaMetrics: Uses zstd compression for its own remote write protocol; 40–60% additional bandwidth savings over Prometheus RW with snappy.

6. Repetitive Attribute Impact on Compression

Resource Attributes

Resource attributes (service.name, service.version, host.name, cloud.region) are identical across every span/metric/log from a single process. In OTLP protobuf, they are sent once per ResourceSpans per batch — already efficient.

Span Attributes from Semantic Conventions

Semantic conventions produce highly repetitive keys (http.method, http.route, http.status_code, db.system, rpc.method) and often repetitive values (GET, POST, 200, 500, mysql, grpc). A batch of 1000 spans from the same HTTP service might have only 5–10 unique attribute key sets and 20–50 unique value combinations.

zstd's much larger default window (hundreds of kilobytes and up, versus gzip's fixed 32 KB) captures longer-range repetitions.

Columnar Advantage (OTel Arrow / ClickHouse)

When data is pivoted from row-oriented to column-oriented:

  • String columns become arrays of repeated values → dictionary encoding reduces to integer indexes
  • Numeric columns become sorted/semi-sorted integer sequences → delta/DoubleDelta encoding
  • Timestamp columns become monotonically increasing nanosecond values → DoubleDelta yields near-zero residuals

This is why OTel Arrow + zstd achieves 7:1 to 12:1 while plain protobuf + zstd achieves 4:1 to 6:1.
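The row-versus-column pivot can be demonstrated directly. This sketch serializes the same spans row-by-row and as per-field columns, compressing both with stdlib zlib as a zstd stand-in; grouping like values together typically yields the smaller output, though exact numbers depend on the data.

```python
import json
import zlib

# Synthetic spans: repetitive names/statuses, monotonic durations.
spans = [{"name": ("GET /users", "POST /orders")[i % 2],
          "status": (200, 200, 200, 500)[i % 4],
          "duration_ns": 1_000_000 + i * 137}
         for i in range(4000)]

row_bytes = json.dumps(spans).encode()                      # row-oriented
columns = {k: [s[k] for s in spans] for k in spans[0]}      # pivot to columns
col_bytes = json.dumps(columns).encode()                    # column-oriented

row_c = zlib.compress(row_bytes, 6)
col_c = zlib.compress(col_bytes, 6)
print(f"row-oriented:    {len(row_bytes)} -> {len(row_c)} bytes")
print(f"column-oriented: {len(col_bytes)} -> {len(col_c)} bytes")
```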

7. Dictionary Training on Telemetry Schemas

Zstandard supports dictionary compression — a pre-trained dictionary of common byte patterns that seeds the compressor before processing each payload. For small data (<64 KB), a dictionary can improve ratios by 2x–5x.

# Collect 10K sample OTLP protobuf payloads
zstd --train samples/*.pb -o otel-traces.dict

# Compress with dictionary
zstd -D otel-traces.dict payload.pb

# Decompress with dictionary
zstd -D otel-traces.dict -d payload.pb.zst

Scenario Dictionary Benefit Rationale
Small batches (<100 spans) High (2–5x improvement) Little "past" for LZ77 to learn from; dictionary fills the gap
Large batches (1000+ spans) Low (<10% improvement) Batch itself provides enough context for LZ77 patterns
Single-service homogeneous spans Medium Attribute keys/values are repetitive but batch already captures this

Recommendation: Dictionary training is most valuable for edge/IoT collectors sending small, frequent batches over constrained links. For gateway-tier collectors with large batches, skip dictionaries and invest in OTel Arrow instead.
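The shell workflow above uses zstd's trained dictionaries; the same principle can be illustrated with stdlib zlib's preset-dictionary support (zdict). For a payload too small to contain its own repetition, seeding the compressor with known-common bytes cuts the output substantially. The "trained" dictionary below is hand-made for the demo, not produced by a real training run.

```python
import json
import zlib

# Pretend this dictionary came from training on past payloads.
shared_dict = json.dumps(
    {"trace_id": "", "span_id": "", "name": "GET /api/users",
     "service.name": "checkout", "http.status_code": 200}
).encode()

payload = json.dumps(
    {"trace_id": "a3f1" * 8, "span_id": "b2c4" * 4, "name": "GET /api/users",
     "service.name": "checkout", "http.status_code": 200}
).encode()

def compress(data, zdict=b""):
    c = zlib.compressobj(9, zlib.DEFLATED, 15, 9, zlib.Z_DEFAULT_STRATEGY, zdict)
    return c.compress(data) + c.flush()

plain = compress(payload)
with_dict = compress(payload, shared_dict)
print(len(payload), len(plain), len(with_dict))

# The receiver needs the same dictionary to decompress.
d = zlib.decompressobj(zdict=shared_dict)
assert d.decompress(with_dict) + d.flush() == payload
```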

8. PPMd and Statistical Compressors on Structured Telemetry

Compressor Ratio on JSON OTLP (1 MB) Compress Speed Decompress Speed
gzip -6 5.2:1 ~35 MB/s ~390 MB/s
zstd -9 5.8:1 ~50 MB/s ~1550 MB/s
brotli -6 6.0:1 ~25 MB/s ~425 MB/s
PPMd (order 8, 64 MB) 6.5:1 — 7.0:1 ~5 MB/s ~5 MB/s
LZMA2 (7z ultra) 6.8:1 — 7.5:1 ~3 MB/s ~200 MB/s

Verdict: PPMd and LZMA2 are archival-only choices for telemetry. Decompression at 5 MB/s is ~300x slower than zstd — completely impractical for real-time pipeline hops or query-time storage backends. For structured telemetry in real-time pipelines, zstd remains the optimal choice.

9. Practical Pipeline Recommendations

Decision Matrix

Scenario Wire Compression Storage Compression Expected Overall Ratio
Small team, single backend OTLP/gRPC + gzip Backend default (LZ4/ZSTD) 8:1 — 15:1
Mid-size, SigNoz/ClickHouse OTLP/gRPC + zstd ZSTD(1) + DoubleDelta + T64 12:1 — 25:1
Large-scale, gateway tier OTel Arrow + zstd ZSTD(1) + per-column codecs 20:1 — 40:1
Prometheus metrics Remote Write snappy (v1) or zstd (v2) TSDB / VictoriaMetrics zstd 10:1 — 20:1
Edge/IoT, small batches OTLP/HTTP + zstd + dictionary N/A (forwarded to gateway) 5:1 — 10:1
Cold archival N/A LZMA2 / 7z 15:1 — 30:1

Cost Impact Estimates

For a mid-size deployment generating 50 GB/day of raw (uncompressed) telemetry:

Strategy Stored Size Monthly Storage Cost (30d) Annual Savings vs No Compression
No compression 50 GB/day = 1.5 TB/mo ~$34.50/mo — (baseline)
gzip default ~12.5 GB/day ~$8.63/mo ~$310/yr
zstd + ClickHouse codecs (12:1) ~4.2 GB/day ~$2.88/mo ~$379/yr
OTel Arrow + zstd + CH codecs (25:1) ~2 GB/day ~$1.38/mo ~$397/yr

Where compression really matters: Network egress and ingestion API costs at SaaS observability vendors ($0.10–$3.00/GB ingested) are where compression delivers 10–100x more savings than storage. At $1.50/GB ingest, going from 50 GB/day to 2 GB/day with OTel Arrow saves $26,280/yr.
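A back-of-envelope check of that ingest-cost figure, under the stated assumptions (50 GB/day raw, ~25:1 effective reduction, $1.50/GB ingest):

```python
raw_gb_per_day = 50
compressed_gb_per_day = 2            # ~25:1 with OTel Arrow + zstd
ingest_price_per_gb = 1.50

# Savings come from the gigabytes that never reach the vendor's ingest API.
annual_savings = (raw_gb_per_day - compressed_gb_per_day) * ingest_price_per_gb * 365
print(f"${annual_savings:,.0f}/yr")  # matches the $26,280 figure in the text
```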

Quick-Start Collector Config

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
        - 'attributes["http.route"] == "/readyz"'

  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: http.request.header.cookie
        action: delete

  batch:
    send_batch_size: 8192
    timeout: 200ms

exporters:
  otlp/backend:
    endpoint: signoz.internal:4317
    compression: zstd
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter, attributes, batch]
      exporters: [otlp/backend]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/backend]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, attributes, batch]
      exporters: [otlp/backend]

Appendix: Practical OTEL Compression Choices (2025–2026 Era)

A.1 The "Just Ship It" Default

  • Wire: compression: gzip on all OTLP exporters
  • Backend: ClickHouse with default ZSTD(1) codec
  • Expected ratio: 8:1 — 15:1 end-to-end
  • Effort: Minimal; gzip is universally supported, ZSTD(1) is ClickHouse Cloud default

A.2 The Optimized Mid-Tier

  • Wire: compression: zstd on OTLP/gRPC exporters
  • Collectors: batch size 8192, filter processor for health checks, attributes processor to strip PII
  • Backend: ClickHouse with per-column codecs: DoubleDelta, ZSTD(1) on timestamps, T64, ZSTD(1) on integers, LowCardinality + ZSTD(1) on enum-like strings
  • Expected ratio: 15:1 — 25:1 end-to-end

A.3 The Maximum Compression Frontier

  • Wire: OTel Arrow + zstd between gateway collectors
  • Collectors: Aggressive filtering (tail-based sampling, log dedup processor, attribute trimming)
  • Backend: ClickHouse with full codec optimization + tiered storage (hot SSD / cold S3)
  • Cold archival: Export historical data to Parquet, compress with zstd -19 or LZMA2
  • Expected ratio: 25:1 — 40:1 end-to-end (hot), 50:1+ (cold archive)

A.4 Key Numbers to Remember

Metric Value
OTLP + gzip ratio (traces) 3:1 — 5:1
OTLP + zstd ratio (traces) 4:1 — 6:1
OTel Arrow + zstd ratio (traces) 7:1 — 12:1 (up to 30:1 in production)
ClickHouse ZSTD(1) on-disk ratio 10:1 — 13:1
Tempo zstd storage reduction ~85% (to 15% of original)
zstd -1 compress speed ~510 MB/s
zstd decompression speed (any level) ~1550 MB/s
gzip -6 compress speed ~35 MB/s
Bytes per span (ClickHouse, compressed) ~80 bytes
Bytes per span (Jaeger + ES, indexed) ~500 bytes

Document prepared 2026-02-24. Compression ratios are approximate and vary with workload characteristics, attribute cardinality, and batch sizes. Always benchmark with representative production data before committing to a compression strategy.