
Adding Compression and Deduplication logic #287

Open
shardulnegi-boop wants to merge 6 commits into salesforce:master from shardulnegi-boop:master

@shardulnegi-boop (Contributor) commented Apr 23, 2026

Add payload deduplication and zstd compression for Sloop storage

Two opt-in features that reduce Sloop's Badger storage footprint:

  1. Payload deduplication: a hash-based check skips K8s watch writes when the spec hasn't changed (volatile fields are stripped before hashing). A full snapshot is still written every 30m by default to preserve timeline continuity.
  2. zstd compression: payloads are compressed before being written to Badger and auto-detected on read via the zstd magic header.

Both are off by default and enabled via flags.
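For the zstd feature, a minimal standard-library sketch of the read-side auto-detection: a stored value is treated as compressed if it starts with the zstd frame magic number (0xFD2FB528, stored little-endian per RFC 8878); anything else is returned unchanged. The `readPayload` and `decodeFunc` names are hypothetical, and the decoder is left pluggable; a library such as `github.com/klauspost/compress/zstd` (via `Decoder.DecodeAll`) could fill that role, but the PR does not say which library it uses.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// zstdMagic is the zstd frame magic number 0xFD2FB528 in little-endian byte order.
var zstdMagic = []byte{0x28, 0xB5, 0x2F, 0xFD}

// isZstd reports whether a stored value begins with a zstd frame header.
func isZstd(value []byte) bool {
	return bytes.HasPrefix(value, zstdMagic)
}

// decodeFunc is a hypothetical stand-in for a real zstd decoder.
type decodeFunc func([]byte) ([]byte, error)

// readPayload returns the logical payload for a stored value: compressed
// values are decoded, uncompressed values pass through untouched.
func readPayload(value []byte, decode decodeFunc) ([]byte, error) {
	if !isZstd(value) {
		return value, nil
	}
	if decode == nil {
		return nil, errors.New("value is zstd-compressed but no decoder is configured")
	}
	return decode(value)
}

func main() {
	plain := []byte(`{"kind":"Pod"}`)
	out, _ := readPayload(plain, nil)
	fmt.Println(string(out)) // {"kind":"Pod"}

	framed := []byte{0x28, 0xB5, 0x2F, 0xFD, 0x00}
	fmt.Println(isZstd(framed)) // true
}
```

This only works if uncompressed values can never begin with those four bytes; that holds, for example, for a JSON object, whose first byte is `{`.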

Results:

Storage (disk bytes on host):

┌──────────┬─────────┬─────────────┐
│ Variant  │  Disk   │ vs baseline │
├──────────┼─────────┼─────────────┤
│ baseline │ 7.62 GB │           — │
├──────────┼─────────┼─────────────┤
│ dedup    │ 6.61 GB │        -13% │
├──────────┼─────────┼─────────────┤
│ zstd     │ 2.30 GB │        -70% │
├──────────┼─────────┼─────────────┤
│ both     │ 1.63 GB │        -79% │
└──────────┴─────────┴─────────────┘
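The "vs baseline" column is consistent with the relative change in disk bytes, rounded to a whole percent (the symbols below are introduced here for illustration); for the combined variant:

$$
\Delta = \frac{D_{\text{variant}} - D_{\text{baseline}}}{D_{\text{baseline}}},
\qquad
\Delta_{\text{both}} = \frac{1.63 - 7.62}{7.62} \approx -0.786 \approx -79\%
$$

Equivalently, enabling both features stores roughly 4.7× less data than the baseline (7.62 / 1.63 ≈ 4.7).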

Read latency (cold p99, 20 iterations, medium queries):

┌──────────┬────────┐
│ Variant  │  p99   │
├──────────┼────────┤
│ baseline │ 0.522s │
├──────────┼────────┤
│ dedup    │ 0.575s │
├──────────┼────────┤
│ zstd     │ 1.749s │
├──────────┼────────┤
│ both     │ 1.256s │
└──────────┴────────┘

Setup: Docker Compose, run for 5 hours.

Flags

  • --enable-payload-dedup (default false)
  • --dedup-snapshot-interval (default 30m)
  • --enable-zstd-compression (default false)

Tests

17 new unit tests (hash stability, dedup logic, round-trip compression). All existing tests pass.

Commits:

  • Adding compression and Deduplication logic for sloop
  • fixing pipeline issues