- Update description to reflect context intelligence layer positioning
- Add FAQ entries for conflict detection, sensitivity classification, expiry/supersession
- Update memory description with conflict and sensitivity details
- Update MCP tools list with memory_expire and memory_supersede
- Update embedding provider answers (Ollama/Cohere now shipped)
- Update roadmap section with all shipped v0.9.0/v0.9.1 features
Co-authored-by: Ona <no-reply@ona.com>
FAQ.md: 39 additions & 9 deletions
@@ -4,7 +4,7 @@
 
 ### What does Distill do?
 
-Distill is a post-retrieval processing layer for RAG pipelines. When you fetch chunks from a vector database, 30-40% are typically redundant - same information phrased differently. Distill clusters semantically similar chunks, picks the best representative from each cluster, compresses verbose content, and re-ranks for diversity. Total overhead is ~12ms. No LLM calls.
+Distill is a context intelligence layer for LLM agents. It gives agents persistent, deduplicated memory that survives across sessions, deduplicates semantically similar context chunks, compresses verbose content, and re-ranks for diversity. It also detects conflicting information, classifies sensitive content, and manages token-budgeted context windows. Total overhead is ~12ms. No LLM calls.
 
 ### Why not just fetch fewer results from the vector DB?
 
@@ -22,7 +22,7 @@ LLMs are non-deterministic. The same input can produce different compressed outp
 
 ### What is Context Memory?
 
-Persistent memory that accumulates knowledge across agent sessions. Store context once, recall it later by semantic similarity + recency. Memories are deduplicated on write and compressed over time through hierarchical decay (full text → summary → keywords → evicted). Enable with `--memory` on the `api` or `mcp` commands.
+Persistent memory that accumulates knowledge across agent sessions. Store context once, recall it later by semantic similarity + recency. Memories are deduplicated on write, compressed over time through hierarchical decay (full text → summary → keywords → evicted), and automatically classified for sensitivity (PII, credentials, internal IPs). On store, conflicting memories (cosine distance 0.15–0.35) are flagged. On recall, results can be boosted by tags and task context. Enable with `--memory` on the `api` or `mcp` commands.
 
 ### What are Sessions?
 
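
Reviewer note: the updated Context Memory entry says memories are automatically classified for sensitivity (PII, credentials, internal IPs). The FAQ does not show how Distill detects these, so the following is only a minimal sketch of one plausible approach using regular expressions. The `PATTERNS` table, the category names, and the `classify` function are all hypothetical and are not Distill's API.

```python
import re

# Hypothetical rule table; Distill's real detection rules are not shown in the FAQ.
PATTERNS = {
    "pii": [
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email address
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN layout
    ],
    "credential": [
        # key/value style secrets such as "password=..." or "api_key: ..."
        re.compile(r"\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+", re.I),
    ],
    "internal": [
        # RFC 1918 private IPv4 ranges: 10/8, 172.16/12, 192.168/16
        re.compile(
            r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))"
            r"\.\d{1,3}\.\d{1,3}\b"
        ),
    ],
}


def classify(text):
    """Return the sorted list of sensitivity categories found in text."""
    found = set()
    for category, patterns in PATTERNS.items():
        if any(p.search(text) for p in patterns):
            found.add(category)
    return sorted(found)
```

A pattern-based scan like this needs no LLM call, which fits the FAQ's "No LLM calls" claim, but it will miss anything that does not match a known layout.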
@@ -32,6 +32,20 @@ Token-budgeted context windows for long-running agent tasks. Push context increm
 
 Memory is cross-session: knowledge persists after a session ends and can be recalled in future sessions. Sessions are within-task: a bounded context window that tracks what the agent has seen during a single task, enforcing a token budget. Use memory for long-term knowledge, sessions for working context.
 
+### How does conflict detection work?
+
+When storing a memory, Distill checks existing entries by cosine distance. Entries below 0.15 are duplicates (skipped). Entries between 0.15 and 0.35 are flagged as conflicts - semantically related but different enough to be contradictory. The conflicts are returned in the store response so the agent can decide which version to keep, or supersede the old one.
+
+### What is sensitivity classification?
+
+Distill can automatically scan memory content for PII (emails, phone numbers, SSNs), credentials (API keys, tokens, passwords), and internal infrastructure (private IPs, internal domains). Enable with `auto_classify: true` on store. Recall results include `max_sensitivity` and a list of `sensitive_chunks` so agents can handle sensitive data appropriately.
+
+### How do expiry and supersession work?
+
+**Expire** soft-deletes a memory - it stays in the database but is excluded from recall by default. Useful for marking outdated information without losing it. **Supersede** links an old memory to its replacement - the old entry is expired and tagged with the new entry's ID. This preserves the audit trail while ensuring only current information is recalled.
+
+---
+
 ## Algorithms
 
 ### Why agglomerative clustering instead of K-Means?
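
Reviewer note: the conflict-detection entry added in this hunk describes two cosine-distance thresholds. A minimal sketch of that banding follows, assuming plain Python lists as embeddings. Only the 0.15 and 0.35 values come from the FAQ text. The function names, the `"unrelated"` label, and the treatment of values exactly on a boundary are assumptions.

```python
import math

DUPLICATE_MAX = 0.15  # FAQ: entries below 0.15 are duplicates (skipped)
CONFLICT_MAX = 0.35   # FAQ: entries between 0.15 and 0.35 are conflicts


def cosine_distance(a, b):
    """1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def classify_pair(distance):
    """Map a distance to a band; boundary handling is an assumption."""
    if distance < DUPLICATE_MAX:
        return "duplicate"
    if distance <= CONFLICT_MAX:
        return "conflict"
    return "unrelated"


def check_on_store(new_vec, existing):
    """existing maps memory id -> vector.

    Returns (is_duplicate, conflict_ids). A duplicate short-circuits,
    mirroring the FAQ's statement that duplicates are skipped.
    """
    conflicts = []
    for mem_id, vec in existing.items():
        band = classify_pair(cosine_distance(new_vec, vec))
        if band == "duplicate":
            return True, []
        if band == "conflict":
            conflicts.append(mem_id)
    return False, conflicts
```

Returning the conflicting IDs rather than resolving them matches the FAQ's description: the agent, not the store, decides which version to keep.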
@@ -122,11 +136,21 @@ LangChain's `search_type="mmr"` applies MMR at the vector DB level - a single re
 
 ### What MCP tools does Distill expose?
 
-The base MCP server exposes `deduplicate_context` and `analyze_redundancy`. With `--memory`, it adds `store_memory`, `recall_memory`, `forget_memory`, `memory_stats`. With `--session`, it adds `create_session`, `push_session`, `session_context`, `delete_session`. Enable both with `distill mcp --memory --session`.
+The base MCP server exposes `deduplicate_context` and `analyze_redundancy`. With `--memory`, it adds `store_memory`, `recall_memory`, `forget_memory`, `memory_expire`, `memory_supersede`, `memory_stats`. With `--session`, it adds `create_session`, `push_session`, `session_context`, `delete_session`. Enable both with `distill mcp --memory --session`.
 
 ### Can I use Distill with local models (Ollama, vLLM)?
 
-The dedup pipeline itself doesn't call any LLM - it's pure math (cosine distance, clustering). The only external dependency is for embedding generation when you send text without pre-computed embeddings. Multi-provider embedding support (Ollama, Azure, Cohere, HuggingFace) is planned in [#33](https://github.com/Siddhant-K-code/distill/issues/33).
+Yes. The dedup pipeline itself doesn't call any LLM - it's pure math (cosine distance, clustering). For embeddings, Distill supports OpenAI, Ollama, and Cohere via `--embedding-provider`:
+
+```bash
+# Use Ollama locally (no API key needed)
+distill api --embedding-provider ollama --embedding-base-url http://localhost:11434
+
+# Use Cohere
+distill api --embedding-provider cohere
+```
+
+You can also send chunks with pre-computed embeddings to skip embedding generation entirely.
 
 ---
 
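
Reviewer note: the `memory_expire` and `memory_supersede` tools added to the MCP list in this hunk correspond to the expire and supersede semantics described in the new FAQ entry. The sketch below models those semantics with an in-memory store. The `Memory` and `MemoryStore` classes, their field names, and the `include_expired` parameter are hypothetical illustrations, not Distill's data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Memory:
    id: str
    text: str
    expired: bool = False                 # soft-delete flag
    superseded_by: Optional[str] = None   # ID of the replacement, if any


class MemoryStore:
    """Toy in-memory store illustrating soft delete and supersession."""

    def __init__(self) -> None:
        self._items: Dict[str, Memory] = {}

    def store(self, mem_id: str, text: str) -> None:
        self._items[mem_id] = Memory(mem_id, text)

    def expire(self, mem_id: str) -> None:
        # The entry stays in the store but is hidden from default recall.
        self._items[mem_id].expired = True

    def supersede(self, old_id: str, new_id: str) -> None:
        # Expire the old entry and tag it with its replacement's ID.
        old = self._items[old_id]
        old.expired = True
        old.superseded_by = new_id

    def recall(self, include_expired: bool = False) -> List[Memory]:
        return [
            m for m in self._items.values()
            if include_expired or not m.expired
        ]
```

Because nothing is physically deleted, the old entry and its link to the replacement remain available for auditing, which is the trade-off the FAQ entry describes.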
@@ -146,7 +170,7 @@ The agglomerative clustering is O(N²) for the distance matrix. For N=50, this i
 
 ### What if chunks don't have embeddings?
 
-If you send text-only chunks to the API, Distill calls OpenAI's `text-embedding-3-small` to generate embeddings on the fly. Set `OPENAI_API_KEY` to enable this. If you send chunks with pre-computed embeddings (e.g., from your vector DB retrieval), no OpenAI call is needed.
+If you send text-only chunks to the API, Distill generates embeddings on the fly using the configured provider (OpenAI by default, or Ollama/Cohere via `--embedding-provider`). If you send chunks with pre-computed embeddings (e.g., from your vector DB retrieval), no embedding call is needed.
 
 ---
 
@@ -197,9 +221,15 @@ Yes, MIT. The full pipeline, CLI, API server, MCP server, and all algorithms are
 ### What's on the roadmap?
 
 **Shipped:**
-- **Context Memory** - Persistent deduplicated memory across sessions with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29))
-- **Session Management** - Token-budgeted context windows with compression and eviction ([#31](https://github.com/Siddhant-K-code/distill/issues/31))
+- **Context Memory** - persistent deduplicated memory with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29))
+- **Session Management** - token-budgeted context windows with compression and eviction ([#31](https://github.com/Siddhant-K-code/distill/issues/31))