Commit 3287d85

docs: add session and memory docs across all doc files
- CHANGELOG: add [Unreleased] section with session management
- FAQ: add memory, session, and MCP tools Q&As; update roadmap status
- examples/session_api.sh: full API usage example
- mcp/README: add memory + session tool docs, config examples, integration patterns (session tracking, cross-session memory)
- mcp/claude_desktop_config_full.example.json: config with all features

Co-authored-by: Ona <no-reply@ona.com>
1 parent 6302608 commit 3287d85

5 files changed

Lines changed: 255 additions & 8 deletions

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -2,6 +2,23 @@
 
 All notable changes to Distill are documented here.
 
+## [Unreleased]
+
+### Added
+
+- **Session-based context window management** (`pkg/session`) - Token-budgeted context windows for long-running agent sessions. Entries are deduplicated on push, compressed through hierarchical levels (full text → summary → sentence → keywords), and evicted when the budget is exceeded. Lowest-importance entries are compressed first. ([#38](https://github.com/Siddhant-K-code/distill/pull/38), closes [#31](https://github.com/Siddhant-K-code/distill/issues/31))
+- **Session CLI** - `distill session create/push/context/delete` commands. ([#38](https://github.com/Siddhant-K-code/distill/pull/38))
+- **Session HTTP API** - `/v1/session/create`, `/push`, `/context`, `/delete`, `/get` endpoints. Opt-in via `--session` flag. ([#38](https://github.com/Siddhant-K-code/distill/pull/38))
+- **Session MCP tools** - `create_session`, `push_session`, `session_context`, `delete_session` for Claude Desktop, Cursor, and Amp. Opt-in via `--session` flag. ([#38](https://github.com/Siddhant-K-code/distill/pull/38))
+
+### Stats
+
+- 9 files changed, 1,928 insertions, 6 deletions
+- 1 new package: `pkg/session`
+- 13 new tests
+
+---
+
 ## [v0.3.0] - 2026-02-23
 
 Feature release adding persistent context memory, SSE streaming, OpenTelemetry tracing, and project documentation.
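
The budget behaviour described in the `[Unreleased]` entry can be pictured with a toy model. The sketch below is not the `pkg/session` implementation: the `Entry` and `Session` classes, the word-count token estimate, the `KEEP` ratios, and the exact-match deduplication are all assumptions made for illustration. It only shows the order of operations the changelog names: deduplicate on push, compress the lowest-importance entry one level at a time, and evict an entry once it is already at the keyword level.

```python
from dataclasses import dataclass

# Compression levels named in the changelog, least to most compressed.
LEVELS = ["full", "summary", "sentence", "keywords"]
# Hypothetical fraction of words kept at each level (not Distill's real ratios).
KEEP = {"full": 1.0, "summary": 0.5, "sentence": 0.25, "keywords": 0.1}


@dataclass
class Entry:
    content: str
    importance: float
    level: str = "full"

    def tokens(self) -> int:
        # Crude estimate: one token per whitespace-separated word, scaled by level.
        words = len(self.content.split())
        return max(1, int(words * KEEP[self.level]))


class Session:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.entries = []

    def total_tokens(self) -> int:
        return sum(e.tokens() for e in self.entries)

    def push(self, content: str, importance: float) -> bool:
        # Exact-match dedup stands in for similarity-based dedup.
        if any(e.content == content for e in self.entries):
            return False
        self.entries.append(Entry(content, importance))
        self._enforce_budget()
        return True

    def _enforce_budget(self) -> None:
        while self.entries and self.total_tokens() > self.max_tokens:
            # The lowest-importance entry is compressed first.
            victim = min(self.entries, key=lambda e: e.importance)
            idx = LEVELS.index(victim.level)
            if idx + 1 < len(LEVELS):
                victim.level = LEVELS[idx + 1]  # compress one level
            else:
                self.entries.remove(victim)  # already at keywords: evict
```

With a budget of 10 tokens, pushing two 8-word entries leaves the more important one at full text and compresses the other until the total fits.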

FAQ.md

Lines changed: 22 additions & 4 deletions
@@ -20,6 +20,18 @@ LLMs are non-deterministic. The same input can produce different compressed outp
 
 ---
 
+### What is Context Memory?
+
+Persistent memory that accumulates knowledge across agent sessions. Store context once, recall it later by semantic similarity + recency. Memories are deduplicated on write and compressed over time through hierarchical decay (full text → summary → keywords → evicted). Enable with `--memory` on the `api` or `mcp` commands.
+
+### What are Sessions?
+
+Token-budgeted context windows for long-running agent tasks. Push context incrementally as the agent works - Distill deduplicates entries, compresses aging ones, and evicts when the budget is exceeded. The `preserve_recent` setting keeps the N most recent entries at full fidelity. Enable with `--session` on the `api` or `mcp` commands.
+
+### How is Context Memory different from Sessions?
+
+Memory is cross-session: knowledge persists after a session ends and can be recalled in future sessions. Sessions are within-task: a bounded context window that tracks what the agent has seen during a single task, enforcing a token budget. Use memory for long-term knowledge, sessions for working context.
+
 ## Algorithms
 
 ### Why agglomerative clustering instead of K-Means?
@@ -108,6 +120,10 @@ Yes. The HTTP API is framework-agnostic. MCP works with any MCP-compatible clien
 
 LangChain's `search_type="mmr"` applies MMR at the vector DB level - a single re-ranking step. Distill runs a multi-stage pipeline: cache lookup, agglomerative clustering (groups similar chunks), representative selection (picks the best from each group), compression (reduces token count), then MMR (diversity re-ranking). The clustering step is the key difference - it understands group structure, not just pairwise similarity.
 
+### What MCP tools does Distill expose?
+
+The base MCP server exposes `deduplicate_context` and `analyze_redundancy`. With `--memory`, it adds `store_memory`, `recall_memory`, `forget_memory`, `memory_stats`. With `--session`, it adds `create_session`, `push_session`, `session_context`, `delete_session`. Enable both with `distill mcp --memory --session`.
+
 ### Can I use Distill with local models (Ollama, vLLM)?
 
 The dedup pipeline itself doesn't call any LLM - it's pure math (cosine distance, clustering). The only external dependency is for embedding generation when you send text without pre-computed embeddings. Multi-provider embedding support (Ollama, Azure, Cohere, HuggingFace) is planned in [#33](https://github.com/Siddhant-K-code/distill/issues/33).
@@ -180,8 +196,10 @@ Yes, AGPL-3.0. The full pipeline, CLI, API server, MCP server, and all algorithm
 
 ### What's on the roadmap?
 
-Three pillars:
+**Shipped:**
+- **Context Memory** - Persistent deduplicated memory across sessions with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29))
+- **Session Management** - Token-budgeted context windows with compression and eviction ([#31](https://github.com/Siddhant-K-code/distill/issues/31))
 
-1. **Context Memory** - Persistent deduplicated memory across agent sessions with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29), [#31](https://github.com/Siddhant-K-code/distill/issues/31))
-2. **Code Intelligence** - Dependency graphs, co-change patterns, blast radius analysis ([#30](https://github.com/Siddhant-K-code/distill/issues/30), [#32](https://github.com/Siddhant-K-code/distill/issues/32))
-3. **Platform** - Python SDK, multi-provider embeddings, batch API ([#5](https://github.com/Siddhant-K-code/distill/issues/5), [#33](https://github.com/Siddhant-K-code/distill/issues/33), [#11](https://github.com/Siddhant-K-code/distill/issues/11))
+**Upcoming:**
+1. **Code Intelligence** - Dependency graphs, co-change patterns, blast radius analysis ([#30](https://github.com/Siddhant-K-code/distill/issues/30), [#32](https://github.com/Siddhant-K-code/distill/issues/32))
+2. **Platform** - Python SDK, multi-provider embeddings, batch API ([#5](https://github.com/Siddhant-K-code/distill/issues/5), [#33](https://github.com/Siddhant-K-code/distill/issues/33), [#11](https://github.com/Siddhant-K-code/distill/issues/11))
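
The Context Memory answer in this FAQ says memories are recalled "by semantic similarity + recency" but does not say how the two are combined. One plausible reading is a weighted blend of cosine similarity and an exponential recency decay, sketched below. The function names, the one-day `half_life`, and the `recency_weight` of 0.3 are assumptions for illustration, not values taken from Distill.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_score(query_vec, memory_vec, age_seconds,
                 half_life=86400.0, recency_weight=0.3):
    """Blend similarity with a recency term that halves every `half_life` seconds."""
    similarity = cosine(query_vec, memory_vec)
    recency = 0.5 ** (age_seconds / half_life)
    return (1.0 - recency_weight) * similarity + recency_weight * recency


def recall(query_vec, memories, max_results=5):
    """Return the texts of the highest-scoring memories.

    Each memory is a dict with hypothetical keys "text", "vec", and "age".
    """
    ranked = sorted(
        memories,
        key=lambda m: recall_score(query_vec, m["vec"], m["age"]),
        reverse=True,
    )
    return [m["text"] for m in ranked[:max_results]]
```

Under this blend, two memories with identical embeddings are ordered by age, and a highly similar old memory can still outrank a recent but unrelated one.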

examples/session_api.sh

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+#!/bin/bash
+# Example: Session-based context window management via the API
+#
+# Start the server (in another terminal):
+#   distill api --port 8080 --session
+#
+# Sessions track context for long-running agent tasks with a token budget.
+# Entries are deduplicated on push, compressed as they age, and evicted
+# when the budget is exceeded.
+
+BASE="http://localhost:8080"
+
+echo "=== Create session ==="
+curl -s -X POST "$BASE/v1/session/create" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "demo-task",
+    "max_tokens": 50000
+  }' | jq .
+
+echo ""
+echo "=== Push context entries ==="
+curl -s -X POST "$BASE/v1/session/push" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "demo-task",
+    "entries": [
+      {
+        "role": "user",
+        "content": "Fix the JWT validation bug in the auth service",
+        "importance": 1.0
+      },
+      {
+        "role": "tool",
+        "content": "File: auth/jwt.go\n\npackage auth\n\nimport (\n\t\"crypto/rsa\"\n\t\"time\"\n)\n\nfunc ValidateToken(token string, key *rsa.PublicKey) error {\n\t// BUG: not checking expiry\n\treturn nil\n}",
+        "source": "file_read",
+        "importance": 0.8
+      },
+      {
+        "role": "tool",
+        "content": "File: auth/jwt_test.go\n\npackage auth\n\nimport \"testing\"\n\nfunc TestValidateToken(t *testing.T) {\n\t// No expiry test\n}",
+        "source": "file_read",
+        "importance": 0.6
+      },
+      {
+        "role": "assistant",
+        "content": "The ValidateToken function is missing expiry checks. I will add time.Now().After(claims.ExpiresAt) validation.",
+        "importance": 0.9
+      }
+    ]
+  }' | jq .
+
+echo ""
+echo "=== Read context window ==="
+curl -s -X POST "$BASE/v1/session/context" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "demo-task"
+  }' | jq .
+
+echo ""
+echo "=== Read only tool entries ==="
+curl -s -X POST "$BASE/v1/session/context" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "demo-task",
+    "role": "tool"
+  }' | jq .
+
+echo ""
+echo "=== Get session metadata ==="
+curl -s "$BASE/v1/session/get?session_id=demo-task" | jq .
+
+echo ""
+echo "=== Clean up ==="
+curl -s -X DELETE "$BASE/v1/session/delete" \
+  -H "Content-Type: application/json" \
+  -d '{"session_id": "demo-task"}' | jq .
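
The same calls can be made from a program instead of `curl`. The sketch below uses only the Python standard library and mirrors the endpoints, methods, and payload fields from the shell script above. The helper names (`build_request`, `send`, and so on) are made up for this example, and `send` assumes every endpoint answers with a JSON body, which the script's `jq .` pipe suggests but the commit does not state.

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # same address the shell script uses


def build_request(path, payload, method="POST", base=BASE):
    """Build (but do not send) a JSON request for a session endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(req, timeout=10):
    """Send a prepared request; assumes the server replies with JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_session(session_id, max_tokens):
    return build_request(
        "/v1/session/create",
        {"session_id": session_id, "max_tokens": max_tokens},
    )


def push_entries(session_id, entries):
    return build_request(
        "/v1/session/push",
        {"session_id": session_id, "entries": entries},
    )


def read_context(session_id, role=None):
    payload = {"session_id": session_id}
    if role is not None:
        payload["role"] = role  # optional filter, as in the "tool entries" step
    return build_request("/v1/session/context", payload)


def delete_session(session_id):
    return build_request(
        "/v1/session/delete", {"session_id": session_id}, method="DELETE"
    )
```

With `distill api --port 8080 --session` running, `send(create_session("demo-task", 50000))` would perform the first step of the script. Keeping request construction separate from `send` lets the payloads be inspected without a server.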

mcp/README.md

Lines changed: 120 additions & 4 deletions
@@ -24,8 +24,11 @@ It clusters semantically similar chunks, picks the best representative from each
 # Build
 go build -o distill .
 
-# Start MCP server
+# Start MCP server (dedup only)
 ./distill mcp
+
+# With memory and sessions enabled
+./distill mcp --memory --session
 ```
 
 ### Remote (HTTP) - Hosted deployment
@@ -34,6 +37,9 @@ go build -o distill .
 # Start HTTP server
 ./distill mcp --transport http --port 8081
 
+# With all features
+./distill mcp --transport http --port 8081 --memory --session
+
 # Or deploy to Fly.io
 fly deploy -c fly.mcp.toml
 ```
@@ -74,6 +80,78 @@ Query a vector database with automatic deduplication. Requires `--backend` flag.
 
 Analyze chunks for redundancy without removing any. Use to understand overlap before deduplicating.
 
+### `store_memory` (requires `--memory`)
+
+Store context that should persist across sessions. Memories are deduplicated on write.
+
+```json
+{
+  "text": "Auth service uses JWT with RS256 signing",
+  "tags": ["auth", "jwt"],
+  "source": "code_review"
+}
+```
+
+### `recall_memory` (requires `--memory`)
+
+Recall relevant memories by semantic similarity + recency.
+
+```json
+{
+  "query": "How does authentication work?",
+  "max_results": 5,
+  "tags": ["auth"]
+}
+```
+
+### `forget_memory` (requires `--memory`)
+
+Remove memories by tag or age.
+
+### `memory_stats` (requires `--memory`)
+
+Get memory store statistics (total count, by decay level, by source).
+
+### `create_session` (requires `--session`)
+
+Create a token-budgeted context window for a task.
+
+```json
+{
+  "session_id": "fix-auth-bug",
+  "max_tokens": 128000
+}
+```
+
+### `push_session` (requires `--session`)
+
+Push context entries to a session. Entries are deduplicated and the token budget is enforced via compression and eviction.
+
+```json
+{
+  "session_id": "fix-auth-bug",
+  "content": "File: auth/jwt.go\n...",
+  "role": "tool",
+  "source": "file_read",
+  "importance": 0.8
+}
+```
+
+### `session_context` (requires `--session`)
+
+Read the current context window. Returns entries in push order with compression levels and token counts.
+
+```json
+{
+  "session_id": "fix-auth-bug",
+  "max_tokens": 50000
+}
+```
+
+### `delete_session` (requires `--session`)
+
+Delete a session and all its entries.
+
 ## Resources
 
 ### `distill://system-prompt`
@@ -102,7 +180,7 @@ Arguments:
 
 Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
 
-**Local (stdio):**
+**Local (stdio) - dedup only:**
 ```json
 {
   "mcpServers": {
@@ -114,6 +192,21 @@ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
 }
 ```
 
+**With memory and sessions:**
+```json
+{
+  "mcpServers": {
+    "distill": {
+      "command": "/path/to/distill",
+      "args": ["mcp", "--memory", "--session"],
+      "env": {
+        "OPENAI_API_KEY": "your-openai-key"
+      }
+    }
+  }
+}
+```
+
 **Remote (HTTP):**
 ```json
 {
@@ -131,7 +224,7 @@ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
   "mcpServers": {
     "distill": {
       "command": "/path/to/distill",
-      "args": ["mcp", "--backend", "pinecone", "--index", "my-index"],
+      "args": ["mcp", "--backend", "pinecone", "--index", "my-index", "--memory", "--session"],
       "env": {
         "PINECONE_API_KEY": "your-api-key",
         "OPENAI_API_KEY": "your-openai-key"
@@ -219,7 +312,30 @@ AI: [calls analyze_redundancy]
 AI: "Found 40% redundancy across 3 clusters. Want me to deduplicate?"
 ```
 
-### Pattern 4: Direct Vector DB Query
+### Pattern 4: Session-Based Context Tracking
+
+Track context across a multi-step task:
+
+```
+1. AI creates a session: create_session("fix-auth-bug", 128000)
+2. AI reads files: push_session(role="tool", content=file, source="file_read")
+3. AI reads tests: push_session(role="tool", content=tests, source="file_read")
+4. Budget exceeded → oldest low-importance entries compressed automatically
+5. AI reads context: session_context() → deduplicated, budget-aware window
+6. Task done: delete_session()
+```
+
+### Pattern 5: Cross-Session Memory
+
+Persist knowledge that should survive across sessions:
+
+```
+1. AI discovers a pattern: store_memory("Auth uses JWT with RS256", tags=["auth"])
+2. Next session, different task: recall_memory("How does auth work?")
+3. AI gets relevant memories without re-reading files
+```
+
+### Pattern 6: Direct Vector DB Query
 
 If backend is configured, query with automatic deduplication:
 
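
On the wire, the Pattern 4 steps become MCP `tools/call` requests, which are JSON-RPC 2.0 messages carrying a tool name and an arguments object. The sketch below builds that sequence without sending anything. The argument names for `create_session`, `push_session`, and `session_context` follow the JSON examples in this README; passing `session_id` to `delete_session` is an assumption, since the README gives no arguments for that tool. The `tool_call` and `session_workflow` helpers are hypothetical.

```python
import json


def tool_call(request_id, name, arguments):
    """Wrap one tool invocation in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def session_workflow(session_id, file_contents, max_tokens=128000):
    """Return the ordered requests for: create, push each file, read, delete."""
    steps = [("create_session", {"session_id": session_id, "max_tokens": max_tokens})]
    for content in file_contents:
        steps.append((
            "push_session",
            {
                "session_id": session_id,
                "content": content,
                "role": "tool",
                "source": "file_read",
            },
        ))
    steps.append(("session_context", {"session_id": session_id}))
    # Assumption: delete_session identifies the session by session_id.
    steps.append(("delete_session", {"session_id": session_id}))
    return [tool_call(i, name, args) for i, (name, args) in enumerate(steps, start=1)]


def to_wire(requests):
    """Serialize each request as one JSON line, as a stdio transport would carry it."""
    return "\n".join(json.dumps(r) for r in requests)
```

The compression in step 4 of the pattern has no request of its own: per the README it happens inside `push_session` when the budget is exceeded, so the client-side sequence stays the same.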

mcp/claude_desktop_config_full.example.json

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "mcpServers": {
+    "distill": {
+      "command": "/path/to/distill",
+      "args": [
+        "mcp",
+        "--memory",
+        "--session",
+        "--backend", "pinecone",
+        "--index", "your-index-name"
+      ],
+      "env": {
+        "PINECONE_API_KEY": "your-pinecone-api-key",
+        "OPENAI_API_KEY": "your-openai-api-key"
+      }
+    }
+  }
+}
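
Because the dedup-only, memory-and-sessions, and full-feature configs differ only in their `args` and `env`, they can be generated from one function rather than maintained by hand. The sketch below is one way to do that. `distill_mcp_config` is a hypothetical helper, and it only emits flags that appear in this commit (`--memory`, `--session`, `--backend`, `--index`).

```python
import json


def distill_mcp_config(binary, memory=False, session=False,
                       backend=None, index=None, env=None):
    """Build a Claude Desktop config dict for the Distill MCP server."""
    args = ["mcp"]
    if memory:
        args.append("--memory")
    if session:
        args.append("--session")
    if backend:
        args += ["--backend", backend]
    if index:
        args += ["--index", index]
    return {
        "mcpServers": {
            "distill": {
                "command": binary,
                "args": args,
                "env": dict(env or {}),
            }
        }
    }


def render(config):
    """Pretty-print the config as JSON text ready to paste into the config file."""
    return json.dumps(config, indent=2)
```

Calling it with `memory=True`, `session=True`, `backend="pinecone"`, and `index="your-index-name"` reproduces the argument list of the example file above.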
