feat(cache): cache-aware dedup with preserve_cache_prefix option#59

Merged
Siddhant-K-code merged 3 commits into main from feat/50-cache-aware-dedup
May 2, 2026

Conversation

@Siddhant-K-code
Owner

Closes #50

What

Adds preserve_cache_prefix to /v1/dedupe so that the dedup pipeline cannot reorder or remove chunks up to and including the last cache_control breakpoint. Without this option, Distill can silently invalidate Anthropic prompt cache prefixes while improving context quality.

Changes

pkg/cache/prefix.go (new)

  • PartitionForCacheAwareDedup(chunks) — detects cache_control markers in chunk metadata, returns PrefixPartition with Prefix (frozen), Suffix (dedup-eligible), PrefixHash, FrozenPrefixTokens, MarkerCount
  • hasCacheControl handles string, map, bool, and nil marker values
  • PrefixAwareStats type for response stats
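
The partition step listed above can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/cache/prefix.go: the Chunk type, the "cache_control" metadata key, the truncated SHA-256 hash, and the characters-divided-by-four token estimate are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk is a hypothetical stand-in for the pipeline's chunk type.
type Chunk struct {
	ID       string
	Text     string
	Metadata map[string]any
}

// PrefixPartition carries the fields named in the PR description.
type PrefixPartition struct {
	Prefix             []Chunk // frozen, passed through unchanged
	Suffix             []Chunk // eligible for dedup
	PrefixHash         string
	FrozenPrefixTokens int
	MarkerCount        int
}

// hasCacheControl accepts string, map, bool, and nil marker values.
// Treating empty strings and empty maps as "no marker" is an assumption.
func hasCacheControl(v any) bool {
	switch m := v.(type) {
	case nil:
		return false
	case string:
		return m != ""
	case bool:
		return m
	case map[string]any:
		return len(m) > 0
	default:
		return false
	}
}

// PartitionForCacheAwareDedup splits chunks after the last marked chunk.
func PartitionForCacheAwareDedup(chunks []Chunk) PrefixPartition {
	last, markers := -1, 0
	for i, c := range chunks {
		if hasCacheControl(c.Metadata["cache_control"]) {
			last = i
			markers++
		}
	}
	if last < 0 {
		// No marker: everything stays dedup-eligible.
		return PrefixPartition{Suffix: chunks}
	}
	p := PrefixPartition{
		Prefix:      chunks[:last+1],
		Suffix:      chunks[last+1:],
		MarkerCount: markers,
	}
	h := sha256.New()
	for _, c := range p.Prefix {
		// Zero bytes separate fields so ("ab","c") and ("a","bc") hash differently.
		h.Write([]byte(c.ID))
		h.Write([]byte{0})
		h.Write([]byte(c.Text))
		h.Write([]byte{0})
		p.FrozenPrefixTokens += len(c.Text) / 4 // crude estimate, assumption
	}
	p.PrefixHash = hex.EncodeToString(h.Sum(nil))[:16]
	return p
}

func main() {
	chunks := []Chunk{
		{ID: "sys", Text: "You are a helpful assistant.", Metadata: map[string]any{"cache_control": "ephemeral"}},
		{ID: "msg1", Text: "What is 2+2?"},
		{ID: "msg2", Text: "What is 2 plus 2?"},
	}
	p := PartitionForCacheAwareDedup(chunks)
	fmt.Println(len(p.Prefix), len(p.Suffix), p.MarkerCount, p.PrefixHash)
}
```

Splitting at the last marker rather than the first keeps every breakpoint inside the frozen region, which matches the commit message's description of the split point.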

cmd/api.go

  • DedupeChunk gains cache_control string field
  • DedupeOptions type with preserve_cache_prefix bool
  • DedupeRequest gains Options DedupeOptions
  • DedupeStats gains cache_prefix_frozen, cache_prefix_tokens, cache_prefix_hash, suffix_input_count, suffix_output_count
  • handleDedupe: partitions chunks when preserve_cache_prefix=true, runs cluster/select/MMR only on the suffix, prepends frozen prefix to output
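
The handler flow described above (partition, dedup the suffix only, prepend the frozen prefix) might look roughly like this. It is a sketch under assumptions: dedupeWithFrozenPrefix and dropExactDuplicates are hypothetical names, the exact-text dedup stands in for the real cluster/select/MMR stages, and only a subset of the stats fields is shown.

```go
package main

import "fmt"

// Request and stats types with the JSON field names from the PR description.
type DedupeChunk struct {
	ID           string `json:"id"`
	Text         string `json:"text"`
	CacheControl string `json:"cache_control,omitempty"`
}

type DedupeOptions struct {
	PreserveCachePrefix bool `json:"preserve_cache_prefix"`
}

type DedupeStats struct {
	CachePrefixFrozen bool `json:"cache_prefix_frozen"`
	SuffixInputCount  int  `json:"suffix_input_count"`
	SuffixOutputCount int  `json:"suffix_output_count"`
}

// dropExactDuplicates is a toy stand-in for the real dedup stages.
func dropExactDuplicates(in []DedupeChunk) []DedupeChunk {
	seen := make(map[string]bool)
	out := make([]DedupeChunk, 0, len(in))
	for _, c := range in {
		if seen[c.Text] {
			continue
		}
		seen[c.Text] = true
		out = append(out, c)
	}
	return out
}

// dedupeWithFrozenPrefix (hypothetical helper) runs dedupe on the suffix only
// and prepends the untouched prefix to the result.
func dedupeWithFrozenPrefix(
	chunks []DedupeChunk,
	opts DedupeOptions,
	dedupe func([]DedupeChunk) []DedupeChunk,
) ([]DedupeChunk, DedupeStats) {
	if !opts.PreserveCachePrefix {
		// Default path: existing behaviour, no prefix stats (assumption).
		return dedupe(chunks), DedupeStats{}
	}
	split := 0 // index just past the last chunk carrying cache_control
	for i, c := range chunks {
		if c.CacheControl != "" {
			split = i + 1
		}
	}
	prefix, suffix := chunks[:split], chunks[split:]
	kept := dedupe(suffix)
	out := make([]DedupeChunk, 0, len(prefix)+len(kept))
	out = append(out, prefix...)
	out = append(out, kept...)
	return out, DedupeStats{
		CachePrefixFrozen: split > 0,
		SuffixInputCount:  len(suffix),
		SuffixOutputCount: len(kept),
	}
}

func main() {
	chunks := []DedupeChunk{
		{ID: "sys", Text: "You are a helpful assistant.", CacheControl: "ephemeral"},
		{ID: "a", Text: "hello"},
		{ID: "b", Text: "hello"},
	}
	out, stats := dedupeWithFrozenPrefix(chunks, DedupeOptions{PreserveCachePrefix: true}, dropExactDuplicates)
	fmt.Println(len(out), stats.SuffixInputCount, stats.SuffixOutputCount)
}
```

Passing the dedup stage in as a function keeps the freeze logic independent of the pipeline, so the same helper could serve both the regular and streaming handlers.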

pkg/cache/prefix_test.go (new) — 6 tests covering no-marker passthrough, single/multiple markers, marker at end, hash stability, hash change detection

API

POST /v1/dedupe
{
  "chunks": [
    {"id": "sys", "text": "You are a helpful assistant.", "cache_control": "ephemeral"},
    {"id": "msg1", "text": "What is 2+2?"},
    {"id": "msg2", "text": "What is 2 plus 2?"}
  ],
  "options": {"preserve_cache_prefix": true}
}

Response:

{
  "chunks": [{"id": "sys", ...}, {"id": "msg1", ...}],
  "stats": {
    "cache_prefix_frozen": true,
    "cache_prefix_tokens": 64,
    "cache_prefix_hash": "a3f2c1d4e5b6f7a8",
    "suffix_input_count": 2,
    "suffix_output_count": 1
  }
}

Why opt-in

Not all workloads use prompt caching. The current full-reorder behaviour produces better context quality when caching is not a concern. preserve_cache_prefix=false (default) leaves existing behaviour unchanged.

Siddhant-K-code and others added 2 commits May 2, 2026 13:11
Add PrefixPartition to pkg/cache that splits a chunk slice at the last
cache_control marker. Wire into /v1/dedupe via options.preserve_cache_prefix:
the frozen prefix is passed through unchanged; the dedup pipeline runs
only on the suffix. Response stats include cache_prefix_frozen,
cache_prefix_tokens, cache_prefix_hash, suffix_input_count,
suffix_output_count.

DedupeChunk gains cache_control field. DedupeOptions type added.

Co-authored-by: Ona <no-reply@ona.com>

Add StabilityValidator to pkg/cache. Tracks prefix hashes per call site
across requests and reports StabilityIssue when the prefix stability rate
drops below UnstableThreshold (default 0.8). Includes:

- Runtime Check(callSite, chunks): records hash, detects changes after
  WarmupChecks (default 3), diagnoses likely cause from DynamicPatterns
- Static ValidateText(text): one-shot scan for dynamic interpolation
  patterns (request id, timestamp, uuid, random, etc.)
- Stats/AllStats/Reset for observability

Co-authored-by: Ona <no-reply@ona.com>

handleDedupeStream was missing the partition logic added to handleDedupe.
Both handlers now freeze the cache prefix, run dedup only on the suffix,
and report cache_prefix_* fields in stats.

Co-authored-by: Ona <no-reply@ona.com>
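
The StabilityValidator from the second commit message could be approximated as in the sketch below. Several things here are assumptions rather than the pkg/cache implementation: Check takes a precomputed prefix hash instead of chunks, the rate is defined as the fraction of checks whose hash matched the previous check, the two regular expressions are illustrative stand-ins for DynamicPatterns, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"regexp"
)

type siteStats struct {
	lastHash string
	checks   int // total checks recorded for this call site
	stable   int // checks whose hash equalled the previous one
}

type StabilityValidator struct {
	UnstableThreshold float64
	WarmupChecks      int
	sites             map[string]*siteStats
}

// NewStabilityValidator uses the defaults stated in the commit message.
func NewStabilityValidator() *StabilityValidator {
	return &StabilityValidator{
		UnstableThreshold: 0.8,
		WarmupChecks:      3,
		sites:             make(map[string]*siteStats),
	}
}

// Check records prefixHash for callSite and returns the stability rate
// plus whether it is below the threshold once warmup has passed.
func (v *StabilityValidator) Check(callSite, prefixHash string) (float64, bool) {
	s, ok := v.sites[callSite]
	if !ok {
		s = &siteStats{}
		v.sites[callSite] = s
	}
	if s.checks > 0 && s.lastHash == prefixHash {
		s.stable++
	}
	s.lastHash = prefixHash
	s.checks++
	if s.checks <= v.WarmupChecks {
		return 1.0, false // still warming up: never report an issue
	}
	rate := float64(s.stable) / float64(s.checks-1)
	return rate, rate < v.UnstableThreshold
}

// dynamicPatterns is an illustrative subset, not the real DynamicPatterns.
var dynamicPatterns = []struct {
	name string
	re   *regexp.Regexp
}{
	{"uuid", regexp.MustCompile(`(?i)\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b`)},
	{"timestamp", regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}`)},
}

// ValidateText returns the names of dynamic patterns found in text.
func ValidateText(text string) []string {
	var found []string
	for _, p := range dynamicPatterns {
		if p.re.MatchString(text) {
			found = append(found, p.name)
		}
	}
	return found
}

func main() {
	v := NewStabilityValidator()
	for _, h := range []string{"h1", "h2", "h3", "h4"} {
		rate, unstable := v.Check("chat-handler", h)
		fmt.Println(h, rate, unstable)
	}
	fmt.Println(ValidateText("Request started at 2026-05-02T13:11:00Z"))
}
```

The warmup window avoids flagging a call site on its first few requests, when there is too little history for the rate to be meaningful.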
@Siddhant-K-code Siddhant-K-code merged commit 0917f08 into main May 2, 2026
2 checks passed
@Siddhant-K-code Siddhant-K-code deleted the feat/50-cache-aware-dedup branch May 2, 2026 14:01
Development

Successfully merging this pull request may close these issues.

[Feature] Cache-aware dedup — preserve prefix structure when deduplicating
