All notable changes to Distill are documented here.
- Multi-provider embedding support — Ollama and Cohere alongside OpenAI;
--embedding-providerflag on all commands (#83) - Memory expiry and supersession —
expireandsupersedeendpoints; TTL viaexpires_at;include_expiredon recall (#84) - Sensitivity tagging — pattern-based auto-classification for PII, credentials, and internal references;
max_sensitivityin recall response (#85) - Conflict detection on write — semantically similar but non-duplicate entries flagged in store response (#88)
- Task-relevance ranking —
boost_tags,task_context, andmin_relevanceon recall (#89) - MCP tools —
memory_expireandmemory_supersede
- README reframed around memory reliability (#86)
- Staticcheck QF1003 lint violation (#87)
- Session-based context window management (
pkg/session) — Token-budgeted context windows for long-running agent sessions. Entries are deduplicated on push, compressed through hierarchical levels (full text → summary → sentence → keywords), and evicted when the budget is exceeded. Lowest-importance entries are compressed first. (#38, closes #31) - Session CLI —
distill session create/push/context/deletecommands. (#38) - Session HTTP API —
/v1/session/create,/push,/context,/delete,/getendpoints. Opt-in via--sessionflag. (#38) - Session MCP tools —
create_session,push_session,session_context,delete_sessionfor Claude Desktop, Cursor, and Amp. Opt-in via--sessionflag. (#38)
- 9 files changed, 1,928 insertions, 6 deletions
- 1 new package:
pkg/session - 13 new tests
Feature release adding persistent context memory, SSE streaming, OpenTelemetry tracing, and project documentation.
- Persistent context memory store (
pkg/memory) — SQLite-backed memory that persists across agent sessions. Write-time deduplication via cosine similarity, recall ranked by(1-w)*similarity + w*recency, tag filtering via junction table, and token-budgeted results. Opt-in via--memoryflag onapiandmcpcommands. (#37, closes #29) - Hierarchical decay worker — Background compression of aging memories: full text → summary (extractive, ~20%) → keywords (~5%) → evicted. Configurable ages via
distill.yaml. Accessing a memory resets its decay clock. (#37) - Memory CLI —
distill memory store/recall/forget/statscommands for direct memory management. (#37) - Memory HTTP API —
POST /v1/memory/store,/recall,/forget,GET /statsendpoints. (#37) - Memory MCP tools —
store_memory,recall_memory,forget_memory,memory_statstools for Claude Desktop, Cursor, and Amp. (#37) - SSE streaming dedup (
pkg/sse) —POST /v1/dedupe/streamendpoint with per-stage progress events (embedding, clustering, selection, MMR). (#22) - OpenTelemetry tracing (
pkg/telemetry) — Distributed tracing with OTLP and stdout exporters. Each pipeline stage instrumented as a separate span. W3C Trace Context propagation. (#21) - FAQ (
FAQ.md) — 20 Q&As covering algorithms, integrations, deployment, and cost. (#36)
- README — Added Context Memory section with CLI/API/MCP examples, updated architecture diagram (Memory Store: shipped), updated roadmap status, expanded API endpoints table. (#37, #34)
- Added
modernc.org/sqlite— Pure Go SQLite driver (no CGO required)
- 21 files changed, 3,617 insertions, 73 deletions
- 3 new packages:
pkg/memory,pkg/sse,pkg/telemetry - 11 new tests for memory store
Major release adding four new modules: semantic compression, KV caching, Prometheus observability, and YAML configuration.
- Prometheus metrics endpoint (
pkg/metrics) —/metricsendpoint on bothapiandservecommands. Tracks request rate, latency percentiles, chunks processed, reduction ratio, active requests, and clusters formed. Includes HTTP middleware for automatic instrumentation. (#19) - Grafana dashboard (
grafana/dashboard.json) — 10-panel dashboard template covering request rate, error rate, P99 latency, latency percentiles, chunks processed, reduction ratio, clusters formed, and status code breakdown. (#19) - Configuration file support (
pkg/config) —distill.yamlconfig file with schema validation and${VAR:-default}environment variable interpolation. New CLI commands:distill config initanddistill config validate. Config priority: CLI flags > env vars > config file > defaults. (#18) - Semantic compression module (
pkg/compress) — Three compression strategies: extractive (sentence scoring), placeholder (JSON/XML/table summarization), and pruner (filler phrase removal). Chainable viacompress.Pipeline. (#13) - KV cache for repeated patterns (
pkg/cache) — In-memory LRU cache with TTL support for system prompts, tool definitions, and other repeated context. IncludesPatternDetectorfor automatic identification of cacheable content and a Redis interface for distributed deployments. (#17) - GitHub Sponsors funding configuration (5b11d6d)
- Architecture diagram updated to reflect the full pipeline: Cache check → Cluster dedup → Select → Compress → MMR re-rank
- README expanded with pipeline module docs (compression, cache), monitoring section, config file usage, and updated integrations table (#15)
- Viper config loading now searches for
distill.yaml(previously.distill), usesDISTILL_env prefix with key replacer (#18) serve.go—/metricsendpoint now serves Prometheus format instead of JSON config dump (#19)
- Docker workflow permissions — Added attestations and id-token write permissions for ghcr.io push (#14)
- 24 files changed, 3,572 insertions, 55 deletions
- 4 new packages:
pkg/compress,pkg/cache,pkg/config,pkg/metrics - 46 new tests across all packages
- Go Report Card badge and formatting
- gofmt simplifications applied
- Documentation em-dash replacements
- Hosted API link in README
Initial release.
- Core deduplication pipeline: agglomerative clustering + MMR re-ranking
- Standalone API server (
distill api) - Vector DB server (
distill serve) with Pinecone and Qdrant backends - MCP server for AI assistants (
distill mcp) - Bulk sync and analyze commands
- Docker image, Fly.io and Render deployment configs
- GoReleaser cross-platform builds