feat(cache): cache-aware dedup and prefix stability validator (#59)
Closes #50, closes #48
preserve_cache_prefix option freezes chunks before the last cache_control
marker so the dedup pipeline cannot reorder them. Applied to both
handleDedupe and handleDedupeStream.
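The partition step can be sketched in Go as follows. This assumes a simplified `Chunk` type with a `CacheControl` string field. `Chunk`, `partitionPrefix`, and `prefixHash` are hypothetical names, not Distill's `PrefixPartition` API. The sketch also assumes the marked chunk belongs to the frozen prefix, as in Anthropic's prompt caching, where a breakpoint closes the cached span.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk is a hypothetical stand-in for a Distill request chunk; only the
// fields needed for partitioning are modeled.
type Chunk struct {
	ID           string
	Text         string
	CacheControl string // e.g. "ephemeral"; empty when no breakpoint is set
}

// partitionPrefix splits chunks at the LAST cache_control marker.
// The marked chunk stays in the frozen prefix (assumed inclusive).
// With no marker at all, every chunk is dedup-eligible.
func partitionPrefix(chunks []Chunk) (prefix, suffix []Chunk) {
	last := -1
	for i, c := range chunks {
		if c.CacheControl != "" {
			last = i
		}
	}
	return chunks[:last+1], chunks[last+1:]
}

// prefixHash hashes the prefix in order, so reordering or editing any
// chunk inside it yields a different value.
func prefixHash(prefix []Chunk) string {
	h := sha256.New()
	for _, c := range prefix {
		h.Write([]byte(c.Text))
		h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" differ
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	chunks := []Chunk{
		{ID: "sys", Text: "You are a helpful assistant.", CacheControl: "ephemeral"},
		{ID: "doc1", Text: "Retrieved passage A"},
		{ID: "doc2", Text: "Retrieved passage B"},
	}
	prefix, suffix := partitionPrefix(chunks)
	// Only suffix would go through dedup/reordering; the response is
	// prefix followed by the processed suffix.
	fmt.Println(len(prefix), len(suffix), prefixHash(prefix)[:12])
}
```

Because the hash is order-sensitive, the same chunks in a different order produce a different prefix hash. That difference is the cache miss the option is meant to prevent.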
StabilityValidator tracks prefix hashes per call site, reports instability
when rate drops below threshold, and provides static ValidateText for
pre-flight pattern scanning.
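Per-call-site stability tracking can be sketched as follows. The sketch assumes that "rate" means the fraction of consecutive requests at a call site whose prefix hash matched the previous one. It also adds a hypothetical `minSamples` guard so that a single early change is not reported. The type and method names are illustrative, not Distill's `StabilityValidator` API.

```go
package main

import (
	"fmt"
	"sync"
)

// stabilityTracker is a hypothetical, simplified model: it keeps the last
// prefix hash per call site and a running stability rate.
type stabilityTracker struct {
	mu         sync.Mutex
	threshold  float64 // minimum acceptable stability rate, e.g. 0.9
	minSamples int     // comparisons required before reporting instability
	sites      map[string]*siteStats
}

type siteStats struct {
	lastHash string
	seen     int // comparisons made (every request after the first)
	stable   int // comparisons where the hash matched the previous one
}

func newStabilityTracker(threshold float64, minSamples int) *stabilityTracker {
	return &stabilityTracker{threshold: threshold, minSamples: minSamples, sites: map[string]*siteStats{}}
}

// Observe records prefixHash for callSite and returns the current
// stability rate and whether the site should be reported as unstable.
func (t *stabilityTracker) Observe(callSite, prefixHash string) (rate float64, unstable bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s, ok := t.sites[callSite]
	if !ok {
		t.sites[callSite] = &siteStats{lastHash: prefixHash}
		return 1, false // nothing to compare against yet
	}
	s.seen++
	if prefixHash == s.lastHash {
		s.stable++
	}
	s.lastHash = prefixHash
	rate = float64(s.stable) / float64(s.seen)
	return rate, s.seen >= t.minSamples && rate < t.threshold
}

func main() {
	t := newStabilityTracker(0.9, 3)
	for _, h := range []string{"h1", "h1", "h2", "h3", "h4"} {
		rate, unstable := t.Observe("chat.summarize", h)
		fmt.Printf("hash=%s rate=%.2f unstable=%v\n", h, rate, unstable)
	}
}
```

With a 0.9 threshold, this tracker tolerates one changed hash in ten comparisons. A timestamp embedded in the prefix changes the hash on nearly every request and drives the rate toward zero.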
Co-authored-by: Ona <no-reply@ona.com>
- **MemoryCache** - In-memory LRU with TTL, configurable size limits (entries and bytes), background cleanup
- **PatternDetector** - Identifies cacheable content and emits `CacheAnnotation` per chunk. Use `AnnotateChunksForCache` to get a `CacheControlPlan` — up to 4 `cache_control` markers (Anthropic's limit) placed at the highest-token-count stable chunks. Auto-placement is skipped when the caller has already set markers manually.
- **PrefixPartition** - Splits a chunk slice into a frozen cache prefix and a dedup-eligible suffix. Used by the `preserve_cache_prefix` dedup option to prevent Distill from reordering chunks that appear before a `cache_control` breakpoint.
- **StabilityValidator** - Tracks prefix hashes across requests and detects dynamic content bleeding into cached prefixes. Reports instability with a likely cause and supports static text analysis for pre-flight checks.
- **RedisCache** - Interface for distributed deployments (requires external Redis)
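The auto-placement rule described for `PatternDetector` can be sketched as follows. If any marker is already present, the function does nothing. Otherwise it picks up to four stable chunks with the highest token counts. `annotatedChunk` and `planMarkers` are hypothetical, and the real `CacheAnnotation` and `CacheControlPlan` types may carry different fields. The sketch selects purely by token count, as the description states, and does not consider chunk position.

```go
package main

import (
	"fmt"
	"sort"
)

// maxBreakpoints is Anthropic's per-request limit on cache_control markers.
const maxBreakpoints = 4

// annotatedChunk is a hypothetical stand-in for a chunk plus its
// per-chunk cache annotation.
type annotatedChunk struct {
	ID        string
	Tokens    int
	Stable    bool // no dynamic content detected
	HasMarker bool // caller already set cache_control on this chunk
}

// planMarkers returns the IDs of chunks that should receive cache_control,
// or nil when the caller placed markers manually.
func planMarkers(chunks []annotatedChunk) []string {
	candidates := make([]annotatedChunk, 0, len(chunks))
	for _, c := range chunks {
		if c.HasMarker {
			return nil // respect manual placement: no auto markers
		}
		if c.Stable {
			candidates = append(candidates, c)
		}
	}
	// Highest token count first; SliceStable keeps input order on ties.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].Tokens > candidates[j].Tokens
	})
	if len(candidates) > maxBreakpoints {
		candidates = candidates[:maxBreakpoints]
	}
	ids := make([]string, len(candidates))
	for i, c := range candidates {
		ids[i] = c.ID
	}
	return ids
}

func main() {
	chunks := []annotatedChunk{
		{ID: "sys", Tokens: 1200, Stable: true},
		{ID: "tools", Tokens: 3000, Stable: true},
		{ID: "ts", Tokens: 20, Stable: false},
		{ID: "docs", Tokens: 5000, Stable: true},
		{ID: "faq", Tokens: 800, Stable: true},
		{ID: "style", Tokens: 100, Stable: true},
	}
	fmt.Println(planMarkers(chunks)) // [docs tools sys faq]
}
```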
#### Cache-aware dedup (`preserve_cache_prefix`)
Distill's dedup pipeline can reorder chunks to improve context quality. When prompt caching is active, reordering chunks before the `cache_control` breakpoint changes the prefix hash and causes a cache miss. Use `preserve_cache_prefix` to freeze the prefix:
```json
POST /v1/dedupe
{
"chunks": [
{"id": "sys", "text": "You are a helpful assistant.", "cache_control": "ephemeral"},
// found = ["request id", "timestamp"] if dynamic patterns detected
```
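The `found` list in the snippet above is the result of a pre-flight text scan. The following regex-based sketch shows one way to implement that idea. It uses three illustrative patterns for timestamps, UUIDs, and request IDs, the categories named in the roadmap. `scanForDynamicContent` and its regexes are hypothetical and are not the patterns `ValidateText` actually uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// dynamicPattern pairs a human-readable label with a regex.
// These patterns are illustrative assumptions only.
type dynamicPattern struct {
	label string
	re    *regexp.Regexp
}

var dynamicPatterns = []dynamicPattern{
	{"timestamp", regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?`)},
	{"uuid", regexp.MustCompile(`(?i)\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b`)},
	{"request id", regexp.MustCompile(`(?i)\brequest[-_ ]?id\s*[:=]\s*\S+`)},
}

// scanForDynamicContent lists which patterns appear in text intended for a
// cached prefix. An empty result means no known pattern matched; it does
// not guarantee the text is stable.
func scanForDynamicContent(text string) []string {
	var found []string
	for _, p := range dynamicPatterns {
		if p.re.MatchString(text) {
			found = append(found, p.label)
		}
	}
	return found
}

func main() {
	sys := "You are a helpful assistant. request_id: abc123. Current time: 2024-05-01T10:00:00Z"
	fmt.Println(scanForDynamicContent(sys))
}
```

A static scan like this only catches known shapes. It therefore complements, rather than replaces, comparing prefix hashes across real requests.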
#### Automatic cache_control placement
```go
| **Session-aware cache boundary manager** | [#51](https://github.com/Siddhant-K-code/distill/issues/51) | Shipped | Auto-advances `cache_control` placement as sessions grow. Stable entries (present ≥ 2 turns unmodified) are included in the cached prefix; boundary retreats when content changes. |
| **Cache write cost accounting** | [#52](https://github.com/Siddhant-K-code/distill/issues/52) | Shipped | 9 new Prometheus metrics covering Anthropic prompt cache token usage, hit rate, write efficiency, and boundary position. Feed API response usage via `RecordCacheUsage`. |
| **Memory decay lifecycle events** | [#54](https://github.com/Siddhant-K-code/distill/issues/54) | Shipped | `DecayWorker` emits `EventCompressed` and `EventEvicted` on each transition. `RecallResult` includes a `CacheBoundaryHint` for high-relevance entries. |
| **Cache-aware dedup** | [#50](https://github.com/Siddhant-K-code/distill/issues/50) | Shipped | `preserve_cache_prefix` option freezes chunks before the last `cache_control` marker so dedup cannot reorder them. Prefix hash and token count reported in stats. |
| **Prefix stability validator** | [#48](https://github.com/Siddhant-K-code/distill/issues/48) | Shipped | `StabilityValidator` tracks prefix hashes across requests and detects dynamic content (timestamps, request IDs, UUIDs) bleeding into cached prefixes. |
// Generate embeddings if needed (only for the dedup-eligible suffix).
if needsEmbedding {
	if s.embedder == nil {
		http.Error(w, "Embeddings required but no embedding provider configured. Either provide embeddings in request or configure OPENAI_API_KEY.", http.StatusBadRequest)
		_ = sw.SendError(sse.StageEmbedding, "Embeddings required but no embedding provider configured. Either provide embeddings in request or configure OPENAI_API_KEY.")