Adding Compression and Deduplication logic #287
Open
shardulnegi-boop wants to merge 6 commits into salesforce:master
Adding compression and Deduplication logic for sloop
fixing pipeline issues
Add payload deduplication and zstd compression for Sloop storage

Two opt-in features that reduce Sloop's Badger storage footprint:

1. Payload deduplication: a hash-based skip for K8s watch writes where the spec hasn't changed (volatile fields are stripped before hashing). A snapshot is written every 30m to preserve timeline continuity.
2. zstd compression: compresses payloads before the Badger write and auto-detects compressed values on read via a magic header.

Both default off, managed by flags.

Results:

Storage (disk bytes on host):
┌──────────┬─────────┬─────────────┐
│ Variant  │ Disk    │ vs baseline │
├──────────┼─────────┼─────────────┤
│ baseline │ 7.62 GB │ n/a         │
├──────────┼─────────┼─────────────┤
│ dedup    │ 6.61 GB │ -13%        │
├──────────┼─────────┼─────────────┤
│ zstd     │ 2.30 GB │ -70%        │
├──────────┼─────────┼─────────────┤
│ both     │ 1.63 GB │ -79%        │
└──────────┴─────────┴─────────────┘

Read latency (cold p99, 20 iterations, medium queries):
┌──────────┬────────┐
│ Variant  │ p99    │
├──────────┼────────┤
│ baseline │ 0.522s │
├──────────┼────────┤
│ dedup    │ 0.575s │
├──────────┼────────┤
│ zstd     │ 1.749s │
├──────────┼────────┤
│ both     │ 1.256s │
└──────────┴────────┘

Test setup: Docker Compose, 5-hour run.

Flags
- --enable-payload-dedup (default false)
- --dedup-snapshot-interval (default 30m)
- --enable-zstd-compression (default false)

Tests
17 new unit tests (hash stability, dedup logic, round-trip compression). All existing tests pass.
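The deduplication rule described above (hash the payload with volatile fields removed, skip the write when the hash is unchanged, but still write a snapshot once the interval has elapsed) can be sketched as follows. This is a minimal illustration in Python, not the PR's Go code: the names `stable_hash` and `DedupWriter` and the list of volatile fields are assumptions, since the description does not enumerate which fields are stripped.

```python
import copy
import hashlib
import json

# Assumption: the PR does not list its volatile fields; these two are
# metadata fields that commonly change without a spec change.
VOLATILE_METADATA_FIELDS = ("resourceVersion", "managedFields")

# Mirrors the 30m default of --dedup-snapshot-interval.
SNAPSHOT_INTERVAL_SECONDS = 30 * 60


def stable_hash(payload: dict) -> str:
    """Hash a payload after dropping the fields assumed to be volatile."""
    stripped = copy.deepcopy(payload)  # never mutate the caller's object
    metadata = stripped.get("metadata", {})
    for field in VOLATILE_METADATA_FIELDS:
        metadata.pop(field, None)
    # Sorted keys and fixed separators keep the encoding stable across runs.
    canonical = json.dumps(stripped, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupWriter:
    """Decides whether a watch payload needs to be written (hypothetical helper)."""

    def __init__(self, snapshot_interval: float = SNAPSHOT_INTERVAL_SECONDS):
        self.snapshot_interval = snapshot_interval
        self._last = {}  # key -> (hash, timestamp of the last actual write)

    def should_write(self, key: str, payload: dict, now: float) -> bool:
        digest = stable_hash(payload)
        previous = self._last.get(key)
        if previous is not None:
            last_digest, last_written = previous
            unchanged = digest == last_digest
            snapshot_due = now - last_written >= self.snapshot_interval
            if unchanged and not snapshot_due:
                return False  # duplicate within the interval: skip the write
        self._last[key] = (digest, now)
        return True
```

In this sketch the snapshot timer is measured from the last write that actually happened, so a resource that never changes is still recorded once per interval, which is one way to keep the timeline continuous.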
Add payload deduplication and zstd compression for Sloop storage
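Auto-detection on read means that values written before compression was enabled stay readable alongside compressed ones. The sketch below assumes the "magic header" is the standard zstd frame magic number (0xFD2FB528, stored little-endian); the PR may use a custom header instead. The decompressor is passed in as a callable so the example stays standard-library only; `is_zstd_frame` and `read_payload` are hypothetical names.

```python
from typing import Callable

# Standard zstd frame magic number 0xFD2FB528 in little-endian byte order.
# Assumption: the PR detects this header rather than a custom one.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd_frame(value: bytes) -> bool:
    """Return True when the stored value starts with the zstd frame magic."""
    return value.startswith(ZSTD_MAGIC)


def read_payload(value: bytes, decompress: Callable[[bytes], bytes]) -> bytes:
    """Return the original payload, decompressing only when the header matches."""
    if is_zstd_frame(value):
        return decompress(value)
    return value  # legacy or uncompressed value: hand it back unchanged
```

Any zstd binding's decompress function can be supplied as `decompress`. Because uncompressed values are returned untouched, the compression flag can be switched on or off without rewriting existing data.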
updating failed test case
updating failed test case
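As a quick cross-check of the results tables, the "vs baseline" percentages can be re-derived from the reported disk sizes, and the p99 figures can be expressed as a slowdown factor. The numbers are copied from the tables; the helper names `percent_change` and `slowdown` are only illustrative.

```python
# Figures copied from the storage and read-latency tables.
disk_gb = {"baseline": 7.62, "dedup": 6.61, "zstd": 2.30, "both": 1.63}
p99_s = {"baseline": 0.522, "dedup": 0.575, "zstd": 1.749, "both": 1.256}


def percent_change(value: float, baseline: float) -> int:
    """Relative change versus baseline, rounded to a whole percent."""
    return round((value - baseline) / baseline * 100)


def slowdown(value: float, baseline: float) -> float:
    """How many times slower than baseline, rounded to two decimals."""
    return round(value / baseline, 2)


for variant in ("dedup", "zstd", "both"):
    print(
        variant,
        f"{percent_change(disk_gb[variant], disk_gb['baseline'])}% disk,",
        f"{slowdown(p99_s[variant], p99_s['baseline'])}x cold p99",
    )
```

The disk percentages reproduce the table (-13%, -70%, -79%). On latency, zstd alone comes out at roughly 3.35x the baseline cold p99 and both features together at roughly 2.41x, so the storage savings are paid for with slower cold reads.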