README.md: 19 additions, 19 deletions
@@ -10,13 +10,13 @@

 **Open-source context preprocessing for LLM applications.**

-Distill sits between your application and any LLM. It cleans up context before it's sent — deduplicating semantically redundant chunks, compressing conversation history as it ages, and placing cache markers on stable content so Anthropic's prompt cache actually fires.
+Distill sits between your application and any LLM. It cleans up context before it's sent: deduplicating semantically redundant chunks, compressing conversation history as it ages, and placing cache markers on stable content so Anthropic's prompt cache actually fires.

 The result: fewer tokens sent, lower cost per request, and context windows that don't fill up with noise.

 **[Learn more →](https://distill.siddhantkhare.com)**

-> 📖 Distill implements the 4-layer context engineering stack described in **[The Agentic Engineering Guide](https://agents.siddhantkhare.com/05-context-engineering-stack/)** — a free, open book on AI agent infrastructure.
+> 📖 Distill implements the 4-layer context engineering stack described in **[The Agentic Engineering Guide](https://agents.siddhantkhare.com/05-context-engineering-stack/)**, a free open book on AI agent infrastructure.

 ```
 RAG / tools / memory / docs
@@ -29,7 +29,7 @@ RAG / tools / memory / docs

 ## The Problem

-30–40% of context assembled from multiple sources is semantically redundant. The same information arrives from docs, code, memory, and tool outputs — competing for attention in the same prompt.
+30-40% of context assembled from multiple sources is semantically redundant. The same information arrives from docs, code, memory, and tool outputs, all competing for attention in the same prompt.

 This causes non-deterministic outputs, confused reasoning, and failures that only show up at scale. Better prompts don't fix it. The context going in needs to be clean.
 | `distill_cache_hit_rate` | Gauge | Rolling hit rate: `cache_read / (cache_read + cache_creation + input)`|
-| `distill_cache_write_efficiency` | Gauge | Reads/writes ratio — values < 1.0 mean cache writes that expire before being read |
+| `distill_cache_write_efficiency` | Gauge | Reads/writes ratio. Values below 1.0 mean cache writes that expire before being read |

 **Per-call-site hit rate tracking**

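Both gauges are plain ratios over token counts. The Go sketch below is a minimal illustration, assuming per-request counts that mirror Anthropic's usage fields (`cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens`) and assuming that "reads/writes" means cache-read tokens divided by cache-creation tokens. The `Usage` and `Totals` types and their methods are hypothetical, not Distill's API, and the totals are cumulative rather than a true rolling window.

```go
package main

import "fmt"

// Usage holds token counts for one request. Field names are hypothetical;
// they mirror Anthropic's cache_read_input_tokens,
// cache_creation_input_tokens and input_tokens usage fields.
type Usage struct {
	CacheRead     int
	CacheCreation int
	Input         int
}

// Totals accumulates usage across requests. This is a cumulative
// approximation; a real rolling gauge would also expire old samples.
type Totals struct {
	CacheRead, CacheCreation, Input int
}

// Add folds one request's usage into the running totals.
func (t *Totals) Add(u Usage) {
	t.CacheRead += u.CacheRead
	t.CacheCreation += u.CacheCreation
	t.Input += u.Input
}

// HitRate computes cache_read / (cache_read + cache_creation + input).
func (t Totals) HitRate() float64 {
	denom := t.CacheRead + t.CacheCreation + t.Input
	if denom == 0 {
		return 0
	}
	return float64(t.CacheRead) / float64(denom)
}

// WriteEfficiency computes cache_read / cache_creation (assumed meaning of
// "reads/writes"). Below 1.0, more tokens were written than read back.
// It is undefined before any write and is reported as 0 here.
func (t Totals) WriteEfficiency() float64 {
	if t.CacheCreation == 0 {
		return 0
	}
	return float64(t.CacheRead) / float64(t.CacheCreation)
}

func main() {
	var tot Totals
	// First request writes a 4000-token prefix into the cache.
	tot.Add(Usage{CacheRead: 0, CacheCreation: 4000, Input: 500})
	// Second request reads the same prefix back.
	tot.Add(Usage{CacheRead: 4000, CacheCreation: 0, Input: 500})
	fmt.Printf("hit rate: %.3f\n", tot.HitRate())
	fmt.Printf("write efficiency: %.2f\n", tot.WriteEfficiency())
}
```

Running it prints `hit rate: 0.444` and `write efficiency: 1.00`: every token written to the cache was read back exactly once.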
@@ -792,8 +792,8 @@ The `DecayWorker` emits typed events on every state transition so that cache bou

 | Event | When | Cache boundary action |
 |-------|------|-----------------------|
-| `EventCompressed` | Entry compressed to summary or keywords | Retreat boundary — cached prefix is now stale |
-| `EventEvicted` | Entry removed from store | Retreat boundary — entry no longer exists |
+| `EventCompressed` | Entry compressed to summary or keywords | Retreat boundary: cached prefix is now stale |
+| `EventEvicted` | Entry removed from store | Retreat boundary: entry no longer exists |
 | `EventStabilized` | Entry promoted to stable | Advance boundary to include entry |

 Register a handler on any `Store`:
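The boundary actions in the event table can be modeled as a counter over an ordered list of entries, where the first `Cached` entries form the prefix in front of the cache marker. This Go sketch is a hypothetical illustration, not Distill's handler API: the event constants reuse the names from the table, while `Event`, `Boundary`, and `Handle` are invented for the example. It also assumes a stabilized entry only advances the boundary when it sits immediately after it, since a cached prefix has to be contiguous.

```go
package main

import "fmt"

// EventType reuses the event names from the table; the type is hypothetical.
type EventType int

const (
	EventCompressed EventType = iota
	EventEvicted
	EventStabilized
)

// Event is a hypothetical payload naming the entry that changed.
type Event struct {
	Type    EventType
	EntryID string
}

// Boundary keeps entries in prompt order; Entries[:Cached] is the cached prefix.
type Boundary struct {
	Entries []string
	Cached  int
}

func (b *Boundary) indexOf(id string) int {
	for i, e := range b.Entries {
		if e == id {
			return i
		}
	}
	return -1
}

// Handle applies the table's boundary action for one event.
func (b *Boundary) Handle(ev Event) {
	i := b.indexOf(ev.EntryID)
	if i < 0 {
		return // unknown entry: nothing to do
	}
	switch ev.Type {
	case EventCompressed:
		// Content changed in place, so the prefix from i onward is stale.
		if i < b.Cached {
			b.Cached = i
		}
	case EventEvicted:
		// Entry no longer exists: retreat, then drop it from the ordering.
		if i < b.Cached {
			b.Cached = i
		}
		b.Entries = append(b.Entries[:i], b.Entries[i+1:]...)
	case EventStabilized:
		// Advance only when the entry directly follows the boundary.
		if i == b.Cached {
			b.Cached++
		}
	}
}

func main() {
	b := &Boundary{Entries: []string{"sys", "tools", "turn1", "turn2"}, Cached: 2}
	b.Handle(Event{Type: EventStabilized, EntryID: "turn1"})
	fmt.Println(b.Cached) // 3
	b.Handle(Event{Type: EventCompressed, EntryID: "tools"})
	fmt.Println(b.Cached) // 1
}
```

Compressing `tools` pulls the boundary back to 1 even though `turn1` had just been stabilized, because everything after a changed entry is no longer a byte-identical prefix.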
@@ -813,8 +813,8 @@ Multiple handlers can be registered; they are called in registration order. Hand
 ```go
 result, _ := store.Recall(ctx, req)
 if result.CacheHint != nil {
-// result.CacheHint.StableEntryIDs — IDs likely stable this turn
-// result.CacheHint.ConfidenceScore — mean relevance of returned entries
+// result.CacheHint.StableEntryIDs - IDs likely stable this turn
+// result.CacheHint.ConfidenceScore - mean relevance of returned entries
 }
 ```

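One way the two `CacheHint` fields could be derived from a recall result, sketched in Go under stated assumptions: `RecalledEntry` and `buildHint` are hypothetical, stability is taken as a precomputed per-entry flag (the README does not say how Distill decides it), and returning `nil` for an empty result is an assumption that happens to fit the `nil` check in the snippet above.

```go
package main

import "fmt"

// RecalledEntry is a hypothetical stand-in for an entry returned by Recall.
type RecalledEntry struct {
	ID        string
	Relevance float64
	Stable    bool // assumption: stability is already known per entry
}

// CacheHint mirrors the two documented fields.
type CacheHint struct {
	StableEntryIDs  []string
	ConfidenceScore float64
}

// buildHint collects stable IDs and averages relevance over all returned
// entries. It returns nil when nothing was recalled (an assumption).
func buildHint(entries []RecalledEntry) *CacheHint {
	if len(entries) == 0 {
		return nil
	}
	h := &CacheHint{}
	var sum float64
	for _, e := range entries {
		sum += e.Relevance
		if e.Stable {
			h.StableEntryIDs = append(h.StableEntryIDs, e.ID)
		}
	}
	h.ConfidenceScore = sum / float64(len(entries))
	return h
}

func main() {
	hint := buildHint([]RecalledEntry{
		{ID: "a", Relevance: 0.75, Stable: true},
		{ID: "b", Relevance: 0.25, Stable: false},
	})
	fmt.Println(hint.StableEntryIDs, hint.ConfidenceScore) // [a] 0.5
}
```

The mean covers every returned entry, stable or not, which matches "mean relevance of returned entries" rather than averaging only the stable subset.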
@@ -861,7 +861,7 @@ session:
 KV cache for repeated context patterns (system prompts, tool definitions, boilerplate). Sub-millisecond retrieval for cache hits.

 - **MemoryCache** - In-memory LRU with TTL, configurable size limits (entries and bytes), background cleanup
-- **PatternDetector** - Identifies cacheable content and emits `CacheAnnotation` per chunk. Use `AnnotateChunksForCache` to get a `CacheControlPlan` — up to 4 `cache_control` markers (Anthropic's limit) placed at the highest-token-count stable chunks. Auto-placement is skipped when the caller has already set markers manually.
+- **PatternDetector** - Identifies cacheable content and emits `CacheAnnotation` per chunk. Use `AnnotateChunksForCache` to get a `CacheControlPlan` with up to 4 `cache_control` markers (Anthropic's limit) placed at the highest-token-count stable chunks. Auto-placement is skipped when the caller has already set markers manually.
 - **PrefixPartition** - Splits a chunk slice into a frozen cache prefix and a dedup-eligible suffix. Used by the `preserve_cache_prefix` dedup option to prevent Distill from reordering chunks that appear before a `cache_control` breakpoint.
 - **StabilityValidator** - Tracks prefix hashes across requests and detects dynamic content bleeding into cached prefixes. Reports instability with a likely cause and supports static text analysis for pre-flight checks.
 - **RedisCache** - Interface for distributed deployments (requires external Redis)
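The placement rule described for `AnnotateChunksForCache` (at most 4 markers, on the stable chunks with the most tokens, skipped entirely if the caller already set markers) can be sketched in Go as follows. `Chunk`, `planMarkers`, and their fields are hypothetical stand-ins, not Distill's `CacheAnnotation` or `CacheControlPlan` types, and the tie-break on equal token counts (earlier chunk wins) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// maxMarkers is Anthropic's per-request limit on cache_control breakpoints.
const maxMarkers = 4

// Chunk is a hypothetical stand-in for a context chunk.
type Chunk struct {
	ID              string
	Tokens          int
	Stable          bool
	HasCacheControl bool // caller already placed a marker here
}

// planMarkers returns indices of chunks that should carry a cache_control
// marker, in document order. It returns nil if the caller set any marker
// manually, so manual placement always wins.
func planMarkers(chunks []Chunk) []int {
	var stable []int
	for i, c := range chunks {
		if c.HasCacheControl {
			return nil
		}
		if c.Stable {
			stable = append(stable, i)
		}
	}
	// Largest stable chunks first; SliceStable keeps earlier chunks ahead on ties.
	sort.SliceStable(stable, func(a, b int) bool {
		return chunks[stable[a]].Tokens > chunks[stable[b]].Tokens
	})
	if len(stable) > maxMarkers {
		stable = stable[:maxMarkers]
	}
	// Restore prompt order for the final plan.
	sort.Ints(stable)
	return stable
}

func main() {
	chunks := []Chunk{
		{ID: "system", Tokens: 1200, Stable: true},
		{ID: "tools", Tokens: 3000, Stable: true},
		{ID: "docs-a", Tokens: 800, Stable: true},
		{ID: "docs-b", Tokens: 2500, Stable: true},
		{ID: "memory", Tokens: 400, Stable: true},
		{ID: "user-turn", Tokens: 150, Stable: false},
	}
	// Prints markers on system, tools, docs-a and docs-b.
	for _, i := range planMarkers(chunks) {
		fmt.Println("cache_control on", chunks[i].ID)
	}
}
```

Here `memory` is the one stable chunk left unmarked because it is the smallest, and `user-turn` is never a candidate because it is not stable.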