Commit 48a816c

docs: update FAQ for v0.9.0 features (#94)
- Update description to reflect context intelligence layer positioning
- Add FAQ entries for conflict detection, sensitivity classification, expiry/supersession
- Update memory description with conflict and sensitivity details
- Update MCP tools list with memory_expire and memory_supersede
- Update embedding provider answers (Ollama/Cohere now shipped)
- Update roadmap section with all shipped v0.9.0/v0.9.1 features

Co-authored-by: Ona <no-reply@ona.com>
1 parent 940cadc commit 48a816c

1 file changed

Lines changed: 39 additions & 9 deletions

FAQ.md

@@ -4,7 +4,7 @@

 ### What does Distill do?

-Distill is a post-retrieval processing layer for RAG pipelines. When you fetch chunks from a vector database, 30-40% are typically redundant - same information phrased differently. Distill clusters semantically similar chunks, picks the best representative from each cluster, compresses verbose content, and re-ranks for diversity. Total overhead is ~12ms. No LLM calls.
+Distill is a context intelligence layer for LLM agents. It gives agents persistent, deduplicated memory that survives across sessions, deduplicates semantically similar context chunks, compresses verbose content, and re-ranks for diversity. It also detects conflicting information, classifies sensitive content, and manages token-budgeted context windows. Total overhead is ~12ms. No LLM calls.

 ### Why not just fetch fewer results from the vector DB?

@@ -22,7 +22,7 @@ LLMs are non-deterministic. The same input can produce different compressed outp

 ### What is Context Memory?

-Persistent memory that accumulates knowledge across agent sessions. Store context once, recall it later by semantic similarity + recency. Memories are deduplicated on write and compressed over time through hierarchical decay (full text → summary → keywords → evicted). Enable with `--memory` on the `api` or `mcp` commands.
+Persistent memory that accumulates knowledge across agent sessions. Store context once, recall it later by semantic similarity + recency. Memories are deduplicated on write, compressed over time through hierarchical decay (full text → summary → keywords → evicted), and automatically classified for sensitivity (PII, credentials, internal IPs). On store, conflicting memories (cosine distance 0.15–0.35) are flagged. On recall, results can be boosted by tags and task context. Enable with `--memory` on the `api` or `mcp` commands.

 ### What are Sessions?

@@ -32,6 +32,20 @@ Token-budgeted context windows for long-running agent tasks. Push context increm

 Memory is cross-session: knowledge persists after a session ends and can be recalled in future sessions. Sessions are within-task: a bounded context window that tracks what the agent has seen during a single task, enforcing a token budget. Use memory for long-term knowledge, sessions for working context.

+### How does conflict detection work?
+
+When storing a memory, Distill checks existing entries by cosine distance. Entries below 0.15 are duplicates (skipped). Entries between 0.15 and 0.35 are flagged as conflicts — semantically related but different enough to be contradictory. The conflicts are returned in the store response so the agent can decide which version to keep, or supersede the old one.
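Review note: the thresholds in the conflict-detection answer can be illustrated with a minimal Python sketch. It is not Distill's implementation. The function names (`cosine_distance`, `classify`, `check_on_store`), the in-memory dict of embeddings, and the treatment of the exact boundary values 0.15 and 0.35 are all assumptions made for illustration; only the two threshold numbers come from the FAQ text.

```python
import math

# Thresholds quoted in the FAQ entry.
DUPLICATE_MAX = 0.15  # below this distance the new entry is a duplicate
CONFLICT_MAX = 0.35   # from 0.15 up to this distance it is flagged as a conflict


def cosine_distance(a, b):
    """1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def classify(distance):
    """Map a distance to a label. Boundary inclusivity is an assumption."""
    if distance < DUPLICATE_MAX:
        return "duplicate"
    if distance <= CONFLICT_MAX:
        return "conflict"
    return "distinct"


def check_on_store(new_embedding, existing):
    """existing: hypothetical dict of memory id -> embedding.

    Returns (is_duplicate, conflict_ids). A duplicate short-circuits,
    mirroring the "skipped" behaviour described in the FAQ.
    """
    conflicts = []
    for memory_id, embedding in existing.items():
        label = classify(cosine_distance(new_embedding, embedding))
        if label == "duplicate":
            return True, []
        if label == "conflict":
            conflicts.append(memory_id)
    return False, conflicts
```

In this sketch the caller receives the conflicting ids and decides what to do with them, which matches the FAQ's description of returning conflicts in the store response rather than resolving them automatically.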
+
+### What is sensitivity classification?
+
+Distill can automatically scan memory content for PII (emails, phone numbers, SSNs), credentials (API keys, tokens, passwords), and internal infrastructure (private IPs, internal domains). Enable with `auto_classify: true` on store. Recall results include `max_sensitivity` and a list of `sensitive_chunks` so agents can handle sensitive data appropriately.
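Review note: a rough Python sketch of what pattern-based sensitivity scanning could look like. The FAQ does not say how Distill detects these categories, so the regular expressions, the category names, the severity ordering in `LEVELS`, and the shape of each `sensitive_chunks` entry are assumptions. Only the field names `max_sensitivity` and `sensitive_chunks` are taken from the FAQ text.

```python
import re

# Hypothetical detectors; deliberately simple and far from exhaustive.
PATTERNS = {
    "pii": [
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email address
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # US SSN layout
    ],
    "credentials": [
        re.compile(r"(?i)\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+"),
    ],
    "internal": [
        # RFC 1918 private IPv4 ranges.
        re.compile(
            r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))"
            r"\.\d{1,3}\.\d{1,3}\b"
        ),
    ],
}

# Assumed ordering from least to most sensitive.
LEVELS = ["none", "internal", "pii", "credentials"]


def classify_chunk(text):
    """Return the categories whose patterns match the text."""
    return [
        category
        for category, patterns in PATTERNS.items()
        if any(p.search(text) for p in patterns)
    ]


def summarize(chunks):
    """Build a recall-style summary over a list of chunk strings."""
    sensitive_chunks = []
    max_level = "none"
    for index, text in enumerate(chunks):
        categories = classify_chunk(text)
        if not categories:
            continue
        sensitive_chunks.append({"index": index, "categories": categories})
        top = max(categories, key=LEVELS.index)
        if LEVELS.index(top) > LEVELS.index(max_level):
            max_level = top
    return {"max_sensitivity": max_level, "sensitive_chunks": sensitive_chunks}
```

Regex matching like this fits the FAQ's "No LLM calls" claim, but real detectors would need broader coverage (phone numbers, internal domains, provider-specific key formats) than this sketch attempts.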
+
+### How do expiry and supersession work?
+
+**Expire** soft-deletes a memory — it stays in the database but is excluded from recall by default. Useful for marking outdated information without losing it. **Supersede** links an old memory to its replacement — the old entry is expired and tagged with the new entry's ID. This preserves the audit trail while ensuring only current information is recalled.
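Review note: the expire and supersede semantics described here can be modelled with a small in-memory store. This is a sketch under stated assumptions, not Distill's storage layer: the `Memory` and `MemoryStore` classes, the `superseded_by` field name, and the `include_expired` flag are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Memory:
    id: str
    text: str
    expired: bool = False                 # soft-delete marker
    superseded_by: Optional[str] = None   # id of the replacement, if any


class MemoryStore:
    """Toy store showing soft-delete and supersession links."""

    def __init__(self) -> None:
        self._items: Dict[str, Memory] = {}

    def store(self, memory_id: str, text: str) -> None:
        self._items[memory_id] = Memory(memory_id, text)

    def expire(self, memory_id: str) -> None:
        # The row is kept; it is only hidden from default recall.
        self._items[memory_id].expired = True

    def supersede(self, old_id: str, new_id: str) -> None:
        if new_id not in self._items:
            raise KeyError(new_id)
        old = self._items[old_id]
        old.expired = True
        old.superseded_by = new_id  # keeps the audit trail

    def recall(self, include_expired: bool = False) -> List[Memory]:
        return [
            m for m in self._items.values()
            if include_expired or not m.expired
        ]
```

For example, after `store("a", ...)`, `store("b", ...)`, and `supersede("a", "b")`, a default `recall()` returns only `b`, while `recall(include_expired=True)` still returns `a` with its link to `b`.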
+
+---
+
 ## Algorithms

 ### Why agglomerative clustering instead of K-Means?
@@ -122,11 +136,21 @@ LangChain's `search_type="mmr"` applies MMR at the vector DB level - a single re

 ### What MCP tools does Distill expose?

-The base MCP server exposes `deduplicate_context` and `analyze_redundancy`. With `--memory`, it adds `store_memory`, `recall_memory`, `forget_memory`, `memory_stats`. With `--session`, it adds `create_session`, `push_session`, `session_context`, `delete_session`. Enable both with `distill mcp --memory --session`.
+The base MCP server exposes `deduplicate_context` and `analyze_redundancy`. With `--memory`, it adds `store_memory`, `recall_memory`, `forget_memory`, `memory_expire`, `memory_supersede`, `memory_stats`. With `--session`, it adds `create_session`, `push_session`, `session_context`, `delete_session`. Enable both with `distill mcp --memory --session`.

 ### Can I use Distill with local models (Ollama, vLLM)?

-The dedup pipeline itself doesn't call any LLM - it's pure math (cosine distance, clustering). The only external dependency is for embedding generation when you send text without pre-computed embeddings. Multi-provider embedding support (Ollama, Azure, Cohere, HuggingFace) is planned in [#33](https://github.com/Siddhant-K-code/distill/issues/33).
+Yes. The dedup pipeline itself doesn't call any LLM - it's pure math (cosine distance, clustering). For embeddings, Distill supports OpenAI, Ollama, and Cohere via `--embedding-provider`:
+
+```bash
+# Use Ollama locally (no API key needed)
+distill api --embedding-provider ollama --embedding-base-url http://localhost:11434
+
+# Use Cohere
+distill api --embedding-provider cohere
+```
+
+You can also send chunks with pre-computed embeddings to skip embedding generation entirely.

 ---

@@ -146,7 +170,7 @@ The agglomerative clustering is O(N²) for the distance matrix. For N=50, this i

 ### What if chunks don't have embeddings?

-If you send text-only chunks to the API, Distill calls OpenAI's `text-embedding-3-small` to generate embeddings on the fly. Set `OPENAI_API_KEY` to enable this. If you send chunks with pre-computed embeddings (e.g., from your vector DB retrieval), no OpenAI call is needed.
+If you send text-only chunks to the API, Distill generates embeddings on the fly using the configured provider (OpenAI by default, or Ollama/Cohere via `--embedding-provider`). If you send chunks with pre-computed embeddings (e.g., from your vector DB retrieval), no embedding call is needed.

 ---

@@ -197,9 +221,15 @@ Yes, MIT. The full pipeline, CLI, API server, MCP server, and all algorithms are
 ### What's on the roadmap?

 **Shipped:**
-- **Context Memory** - Persistent deduplicated memory across sessions with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29))
-- **Session Management** - Token-budgeted context windows with compression and eviction ([#31](https://github.com/Siddhant-K-code/distill/issues/31))
+- **Context Memory** — persistent deduplicated memory with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29))
+- **Session Management** — token-budgeted context windows with compression and eviction ([#31](https://github.com/Siddhant-K-code/distill/issues/31))
+- **Memory Intelligence (v0.9.0)** — conflict detection ([#77](https://github.com/Siddhant-K-code/distill/issues/77)), task-relevance ranking ([#78](https://github.com/Siddhant-K-code/distill/issues/78)), expiry/supersession ([#79](https://github.com/Siddhant-K-code/distill/issues/79)), sensitivity classification ([#82](https://github.com/Siddhant-K-code/distill/issues/82))
+- **Multi-provider embeddings** — OpenAI, Ollama, Cohere via `--embedding-provider` ([#25](https://github.com/Siddhant-K-code/distill/issues/25), [#33](https://github.com/Siddhant-K-code/distill/issues/33))
+- **OpenAPI spec & Swagger UI (v0.9.1)** — interactive docs at `/docs` ([#23](https://github.com/Siddhant-K-code/distill/issues/23))
+- **v2.0 Documentation** — guides, API reference, examples ([#8](https://github.com/Siddhant-K-code/distill/issues/8))
+- **Code Intelligence** — dependency graphs, blast radius, semantic commit analysis ([#30](https://github.com/Siddhant-K-code/distill/issues/30), [#32](https://github.com/Siddhant-K-code/distill/issues/32))
+- **Batch API** — async job queue with progress polling ([#11](https://github.com/Siddhant-K-code/distill/issues/11))

 **Upcoming:**
-1. **Code Intelligence** - Dependency graphs, co-change patterns, blast radius analysis ([#30](https://github.com/Siddhant-K-code/distill/issues/30), [#32](https://github.com/Siddhant-K-code/distill/issues/32))
-2. **Platform** - Python SDK, multi-provider embeddings, batch API ([#5](https://github.com/Siddhant-K-code/distill/issues/5), [#33](https://github.com/Siddhant-K-code/distill/issues/33), [#11](https://github.com/Siddhant-K-code/distill/issues/11))
+1. **Python SDK** `pip install distill-ai` with LangChain/LlamaIndex integrations ([#5](https://github.com/Siddhant-K-code/distill/issues/5))
+2. **Postgres memory backend** — pgvector-backed memory store for production deployments ([#74](https://github.com/Siddhant-K-code/distill/issues/74))
