
Commit b978636

feat: add persistent context memory store with write-time dedup (#37)
* feat: add persistent context memory store with write-time dedup

Adds pkg/memory with SQLite-backed persistent storage for context that accumulates across agent sessions.

Core features:
- Write-time dedup via cosine distance on embeddings
- Tag-based recall with relevance + recency ranking
- Token budget support for recall
- Hierarchical decay: full text -> summary -> keywords -> evicted
- Background decay worker with configurable age thresholds

Integration:
- CLI: distill memory store/recall/forget/stats
- API: POST /v1/memory/store, /recall, /forget, GET /stats
- MCP: store_memory, recall_memory, forget_memory, memory_stats tools

Uses modernc.org/sqlite (pure Go, no CGO) for zero-dependency local storage.

Closes #29

Co-authored-by: Ona <no-reply@ona.com>

* fix: handle all error returns to satisfy errcheck linter

Co-authored-by: Ona <no-reply@ona.com>

* fix: handle Close() error returns in tests

Co-authored-by: Ona <no-reply@ona.com>

* refactor: address code review findings for memory store

- P1: Make touchMemories synchronous (no goroutine leak risk)
- P2: Remove unused MaxMemories from config
- P2: Use junction table (memory_tags) for exact tag matching
- P3: Make memory store opt-in via --memory flag in api/mcp
- P4: Replace local embeddingProvider with retriever.EmbeddingProvider
- P4: Reuse pkg/compress extractive scorer in decay extractSummary
- Fix scan-then-process pattern to avoid SQLite single-conn deadlocks

Co-authored-by: Ona <no-reply@ona.com>

* cleanup: address remaining review nits

- Add TODO to findDuplicate noting O(n) full table scan scaling limit
- Refactor Stats to scan-then-close each query (consistent with Recall)
- Move extractSummary compressor to package-level var (avoid per-call alloc)
- Add --memory-db flag to MCP command (was hardcoded)
- Move PRAGMA foreign_keys to NewSQLiteStore alongside other PRAGMAs
- Move isStopWord map to package-level var (avoid per-call alloc)
- Remove trailing blank line in memory.go

Co-authored-by: Ona <no-reply@ona.com>

* docs: add context memory section to README

- Add Context Memory section with CLI, API, and MCP usage examples
- Add memory endpoints to API Endpoints table
- Add memory command to CLI Commands list
- Update architecture diagram: Memory Store is shipped, not planned
- Update roadmap: mark Context Memory Store as shipped
- Update intro blurb to reflect memory is available

Co-authored-by: Ona <no-reply@ona.com>

---------

Co-authored-by: Ona <no-reply@ona.com>
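The commit message lists write-time dedup via cosine distance on embeddings, and a review note says `findDuplicate` does an O(n) full table scan. The Go sketch below shows what such a check might look like; it is not the `pkg/memory` code, which this page does not show. It assumes `[]float32` embeddings, treats a stored entry as a duplicate when its cosine distance (1 minus cosine similarity) is strictly below the threshold, and uses the README's default `dedup_threshold` of 0.15. The names `cosineDistance` and `nearestDuplicate` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cosine similarity of a and b.
// Mismatched lengths or zero-magnitude vectors are treated as maximally distant (1).
func cosineDistance(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 1
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// nearestDuplicate (hypothetical) scans every stored embedding, O(n), and returns
// the index of the closest one whose distance is strictly below threshold.
func nearestDuplicate(stored [][]float32, candidate []float32, threshold float64) (int, bool) {
	best, bestDist := -1, threshold
	for i, e := range stored {
		if d := cosineDistance(e, candidate); d < bestDist {
			best, bestDist = i, d
		}
	}
	return best, best >= 0
}

func main() {
	stored := [][]float32{{1, 0, 0}, {0, 1, 0}}
	idx, dup := nearestDuplicate(stored, []float32{0.99, 0.05, 0}, 0.15)
	fmt.Println(idx, dup)
	idx, dup = nearestDuplicate(stored, []float32{0, 0, 1}, 0.15)
	fmt.Println(idx, dup)
}
```

A linear scan is exact but costs one distance computation per stored memory on every write, which is the scaling limit the TODO refers to.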
1 parent e6f58f3 commit b978636

13 files changed

Lines changed: 2086 additions & 8 deletions


README.md

Lines changed: 81 additions & 7 deletions
@@ -10,7 +10,7 @@
 
 **Context intelligence layer for AI agents.**
 
-Deduplicates, compresses, and manages context across sessions - so your agents produce reliable, deterministic outputs. Today: a dedup pipeline with ~12ms overhead. Next: persistent context memory, code change impact graphs, and session-aware context windows.
+Deduplicates, compresses, and manages context across sessions - so your agents produce reliable, deterministic outputs. Includes a dedup pipeline with ~12ms overhead and persistent context memory with write-time dedup and hierarchical decay.
 
 Less redundant data. Lower costs. Faster responses. Deterministic results.
 
@@ -201,12 +201,82 @@ Add to Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_conf
 
 See [mcp/README.md](mcp/README.md) for more configuration options.
 
+## Context Memory
+
+Persistent memory that accumulates knowledge across agent sessions. Memories are deduplicated on write, ranked by relevance + recency on recall, and compressed over time through hierarchical decay.
+
+Enable with the `--memory` flag on `api` or `mcp` commands.
+
+### CLI
+
+```bash
+# Store a memory
+distill memory store --text "Auth uses JWT with RS256 signing" --tags auth --source docs
+
+# Recall relevant memories
+distill memory recall --query "How does authentication work?" --max-results 5
+
+# Remove outdated memories
+distill memory forget --tags deprecated
+
+# View statistics
+distill memory stats
+```
+
+### API
+
+```bash
+# Start API with memory enabled
+distill api --port 8080 --memory
+
+# Store
+curl -X POST http://localhost:8080/v1/memory/store \
+  -H "Content-Type: application/json" \
+  -d '{
+    "session_id": "session-1",
+    "entries": [{"text": "Auth uses JWT with RS256", "tags": ["auth"], "source": "docs"}]
+  }'
+
+# Recall
+curl -X POST http://localhost:8080/v1/memory/recall \
+  -H "Content-Type: application/json" \
+  -d '{"query": "How does auth work?", "max_results": 5}'
+```
+
+### MCP
+
+Memory tools are available in Claude Desktop, Cursor, and other MCP clients when `--memory` is enabled:
+
+```bash
+distill mcp --memory
+```
+
+Tools exposed: `store_memory`, `recall_memory`, `forget_memory`, `memory_stats`.
+
+### How Decay Works
+
+Memories compress over time based on access patterns:
+
+```
+Full text → Summary (~20%) → Keywords (~5%) → Evicted
+        (24h)          (7 days)         (30 days)
+```
+
+Accessing a memory resets its decay clock. Configure ages via `distill.yaml`:
+
+```yaml
+memory:
+  db_path: distill-memory.db
+  dedup_threshold: 0.15
+```
+
 ## CLI Commands
 
 ```bash
 distill api # Start standalone API server
 distill serve # Start server with vector DB connection
 distill mcp # Start MCP server for AI assistants
+distill memory # Store, recall, and manage persistent context memories
 distill analyze # Analyze a file for duplicates
 distill sync # Upload vectors to Pinecone with dedup
 distill query # Test a query from command line
@@ -220,6 +290,10 @@ distill config # Manage configuration files
 | POST | `/v1/dedupe` | Deduplicate chunks |
 | POST | `/v1/dedupe/stream` | SSE streaming dedup with per-stage progress |
 | POST | `/v1/retrieve` | Query vector DB with dedup (requires backend) |
+| POST | `/v1/memory/store` | Store memories with write-time dedup (requires `--memory`) |
+| POST | `/v1/memory/recall` | Recall memories by relevance + recency (requires `--memory`) |
+| POST | `/v1/memory/forget` | Remove memories by ID, tag, or age (requires `--memory`) |
+| GET | `/v1/memory/stats` | Memory store statistics (requires `--memory`) |
 | GET | `/health` | Health check |
 | GET | `/metrics` | Prometheus metrics |
 
@@ -489,10 +563,10 @@ KV cache for repeated context patterns (system prompts, tool definitions, boiler
 │ └─────────┘ └─────────┘ └─────────┘ └──────────┘ └─────────┘ │
 │ <1ms 6ms <1ms 2ms 3ms │
 │ │
-│ Context Intelligence (planned)
+│ Context Intelligence
 │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │
 │ │ Memory Store │ │ Impact Graph │ │ Session Context Windows │ │
-│ │ (#29) │ │ (#30) │ │ (#31) │ │
+│ │ (shipped) │ │ (#30) │ │ (#31) │ │
 │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │
 │ │
 │ ┌──────────────────────────────────────────────────────────────┐ │
@@ -527,10 +601,10 @@ Distill is evolving from a dedup utility into a context intelligence layer. Here
 
 ### Context Memory
 
-| Feature | Issue | Description |
-|---------|-------|-------------|
-| **Context Memory Store** | [#29](https://github.com/Siddhant-K-code/distill/issues/29) | Persistent, deduplicated memory across sessions. Write-time dedup, hierarchical decay (full text -> summary -> keywords -> evicted), token-budgeted recall. |
-| **Session Management** | [#31](https://github.com/Siddhant-K-code/distill/issues/31) | Stateful context windows for long-running agents. Push context incrementally, Distill keeps it deduplicated and within budget. |
+| Feature | Issue | Status | Description |
+|---------|-------|--------|-------------|
+| **Context Memory Store** | [#29](https://github.com/Siddhant-K-code/distill/issues/29) | Shipped | Persistent, deduplicated memory across sessions. Write-time dedup, hierarchical decay, token-budgeted recall. See [Context Memory](#context-memory). |
+| **Session Management** | [#31](https://github.com/Siddhant-K-code/distill/issues/31) | Planned | Stateful context windows for long-running agents. Push context incrementally, Distill keeps it deduplicated and within budget. |
 
 ### Code Intelligence
 

cmd/api.go

Lines changed: 23 additions & 0 deletions
@@ -45,6 +45,7 @@ func init() {
 	apiCmd.Flags().String("openai-key", "", "OpenAI API key for embeddings (or use OPENAI_API_KEY)")
 	apiCmd.Flags().String("embedding-model", "text-embedding-3-small", "OpenAI embedding model")
 	apiCmd.Flags().String("api-keys", "", "Comma-separated list of valid API keys (or use DISTILL_API_KEYS)")
+	apiCmd.Flags().Bool("memory", false, "Enable persistent memory store")
 
 	// Bind to viper for config file support
 	_ = viper.BindPFlag("server.port", apiCmd.Flags().Lookup("port"))
@@ -177,6 +178,27 @@ func runAPI(cmd *cobra.Command, args []string) error {
 	mux := http.NewServeMux()
 	mux.HandleFunc("/v1/dedupe", m.Middleware("/v1/dedupe", server.handleDedupe))
 	mux.HandleFunc("/v1/dedupe/stream", m.Middleware("/v1/dedupe/stream", server.handleDedupeStream))
+
+	// Setup memory store (opt-in)
+	enableMemory, _ := cmd.Flags().GetBool("memory")
+	if enableMemory {
+		memDBPath := viper.GetString("memory.db_path")
+		if memDBPath == "" {
+			memDBPath = "distill-memory.db"
+		}
+		memThreshold := viper.GetFloat64("memory.dedup_threshold")
+		if memThreshold == 0 {
+			memThreshold = 0.15
+		}
+		memStore, err := memoryStoreFromConfig(memDBPath, memThreshold)
+		if err != nil {
+			return fmt.Errorf("failed to create memory store: %w", err)
+		}
+		defer func() { _ = memStore.Close() }()
+
+		memAPI := &MemoryAPI{store: memStore, embedder: embedder}
+		memAPI.RegisterMemoryRoutes(mux, m.Middleware)
+	}
 	mux.HandleFunc("/health", server.handleHealth)
 	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
 		m.Handler().ServeHTTP(w, r)
@@ -218,6 +240,7 @@ func runAPI(cmd *cobra.Command, args []string) error {
 	fmt.Printf("Distill API server starting on %s\n", addr)
 	fmt.Printf(" Embeddings: %v\n", embedder != nil)
 	fmt.Printf(" Auth: %v (%d keys)\n", server.hasAuth, len(validKeys))
+	fmt.Printf(" Memory: %v\n", enableMemory)
 	fmt.Println()
 	fmt.Println("Endpoints:")
 	fmt.Printf(" POST http://%s/v1/dedupe\n", addr)

cmd/api_memory.go

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+package cmd
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"time"
+
+	"github.com/Siddhant-K-code/distill/pkg/memory"
+	"github.com/Siddhant-K-code/distill/pkg/retriever"
+)
+
+// MemoryAPI handles memory-related HTTP endpoints.
+type MemoryAPI struct {
+	store    *memory.SQLiteStore
+	embedder retriever.EmbeddingProvider
+}
+
+// RegisterMemoryRoutes adds memory endpoints to the given mux.
+func (m *MemoryAPI) RegisterMemoryRoutes(mux *http.ServeMux, mw func(string, http.HandlerFunc) http.HandlerFunc) {
+	mux.HandleFunc("/v1/memory/store", mw("/v1/memory/store", m.handleStore))
+	mux.HandleFunc("/v1/memory/recall", mw("/v1/memory/recall", m.handleRecall))
+	mux.HandleFunc("/v1/memory/forget", mw("/v1/memory/forget", m.handleForget))
+	mux.HandleFunc("/v1/memory/stats", mw("/v1/memory/stats", m.handleStats))
+}
+
+func (m *MemoryAPI) handleStore(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodPost {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+
+	var req memory.StoreRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		writeJSONError(w, "invalid request body", http.StatusBadRequest)
+		return
+	}
+
+	// Generate embeddings for entries that don't have them
+	if m.embedder != nil {
+		var textsToEmbed []string
+		var indices []int
+		for i, e := range req.Entries {
+			if len(e.Embedding) == 0 && e.Text != "" {
+				textsToEmbed = append(textsToEmbed, e.Text)
+				indices = append(indices, i)
+			}
+		}
+		if len(textsToEmbed) > 0 {
+			ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
+			defer cancel()
+			embeddings, err := m.embedder.EmbedBatch(ctx, textsToEmbed)
+			if err != nil {
+				writeJSONError(w, fmt.Sprintf("embedding error: %v", err), http.StatusInternalServerError)
+				return
+			}
+			for i, idx := range indices {
+				req.Entries[idx].Embedding = embeddings[i]
+			}
+		}
+	}
+
+	result, err := m.store.Store(r.Context(), req)
+	if err != nil {
+		writeJSONError(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	_ = json.NewEncoder(w).Encode(result)
+}
+
+func (m *MemoryAPI) handleRecall(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodPost {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+
+	var req memory.RecallRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		writeJSONError(w, "invalid request body", http.StatusBadRequest)
+		return
+	}
+
+	if req.Query == "" && len(req.QueryEmbedding) == 0 {
+		writeJSONError(w, "query or query_embedding is required", http.StatusBadRequest)
+		return
+	}
+
+	// Generate query embedding if not provided
+	if len(req.QueryEmbedding) == 0 && m.embedder != nil && req.Query != "" {
+		ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
+		defer cancel()
+		emb, err := m.embedder.Embed(ctx, req.Query)
+		if err != nil {
+			writeJSONError(w, fmt.Sprintf("embedding error: %v", err), http.StatusInternalServerError)
+			return
+		}
+		req.QueryEmbedding = emb
+	}
+
+	result, err := m.store.Recall(r.Context(), req)
+	if err != nil {
+		writeJSONError(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	_ = json.NewEncoder(w).Encode(result)
+}
+
+func (m *MemoryAPI) handleForget(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodDelete && r.Method != http.MethodPost {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+
+	var req memory.ForgetRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		writeJSONError(w, "invalid request body", http.StatusBadRequest)
+		return
+	}
+
+	result, err := m.store.Forget(r.Context(), req)
+	if err != nil {
+		writeJSONError(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	_ = json.NewEncoder(w).Encode(result)
+}
+
+func (m *MemoryAPI) handleStats(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodGet {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+
+	stats, err := m.store.Stats(r.Context())
+	if err != nil {
+		writeJSONError(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	_ = json.NewEncoder(w).Encode(stats)
+}
+
+func writeJSONError(w http.ResponseWriter, msg string, code int) {
+	w.Header().Set("Content-Type", "application/json")
+	w.WriteHeader(code)
+	_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
+}
