Languages: English · 简体中文 · Español · 日本語 · Русский
Qdrant-backed dual-memory for AI coding agents
Give Claude Code, Cursor, and OpenCode persistent semantic + structural memory across every project.
👋 Built by Dzmitry Sukhau — AI-native Solution / Software Architect / CTO
Available for consulting on AI products, integrating AI into existing products, and business-process automation.
If you're shipping LLM features, evaluating retrieval pipelines, hardening agentic systems, or building an AI-first product from scratch — let's talk.
supamem is a single-binary CLI that wires up a production-grade memory layer for any AI coding
assistant. Drop it into a fresh repo, run supamem init, and your agents instantly gain:
- 🔍 Semantic search over project notes, ADRs, decisions, and past conversations (hybrid sparse+dense retrieval)
- 🤖 MCP server that any compatible client (Claude Code, Cursor, OpenCode) can talk to
- 🪝 Per-client hooks that auto-load relevant memory at session start and on file edits
- 📊 Welford usage stats so you can see what memory is actually being recalled
- 🧪 Eval harness with a 33-query golden corpus to detect retrieval regressions
Battle-tested inside SoftChat (Phases 80.1–80.5) before being extracted into a standalone package every team can adopt.
The problem: Coding agents have no memory between sessions. Every time you open a new conversation in Claude Code / Cursor / OpenCode, the model has zero context about your codebase, past decisions, ADRs, known issues, or conventions. So either:
- You re-paste 5–15 KB of context at the start of every session (slow, error-prone, costly), or
- You let the agent flounder — it grep-walks the repo, asks redundant questions, forgets last week's decisions, and rediscovers the same gotchas you already documented six months ago.
The fix: A persistent semantic + structural memory layer that automatically retrieves the right 1–2 KB of context for the current prompt — no manual pasting, no re-explaining, no context blow-out.
Phase 80.1 bench (33 labeled goldens, real Claude Code sessions): −78.5% tokens vs naive whole-doc retrieval at the same recall, p95 73 ms end-to-end.
The full evaluation is the same one we ran inside SoftChat to lock the production pipeline. Methodology: 33 representative dev queries → 4 retrieval arms compared (baseline_union, tuned_current, tuned_hybrid, mem0_vector) → token count + recall CI + latency measured per arm.
Numbers below are per typical 30-turn Claude Code session assuming a real codebase with ~50 ADRs / insights / rules (≈ what SoftChat ships). YMMV — but the ratio between arms holds.
| Approach | Tokens/turn | Tokens/30-turn session | Notes |
|---|---|---|---|
| ❌ No memory layer | ≈ 0 auto-injected (you paste context manually) | 30,000–80,000 (manual paste, repeated) | You spend cognitive load on copying instead of building |
| 🟡 Naive whole-doc RAG | ~5,800 | ~174,000 | Bloated — recalls big files when you only needed a paragraph |
| ✅ supamem tuned_hybrid | ~1,250 | ~37,500 | Same recall, −78.5% tokens vs naive RAG |
Anthropic API list pricing (Mar 2026): Sonnet 4.6 = $3 / Mtok input · Opus 4.7 = $15 / Mtok input.
| Model | Tokens saved/session vs naive RAG | Cost saved/session | Monthly (110 sessions) |
|---|---|---|---|
| Sonnet 4.6 | 136,500 | $0.41 | ~$45/dev |
| Opus 4.7 | 136,500 | $2.05 | ~$225/dev |
A 10-engineer team running Opus saves ~$2,250/month on input tokens alone — without counting the cost of slower iteration, lost decisions, and time spent re-pasting context. Output token savings (less hallucination, fewer back-and-forth turns) compound on top.
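The savings figures are straight multiplication from the token counts and list prices quoted above; a few lines reproduce the whole cost table:

```python
# Reproduce the cost table: tokens saved per session × list price per input token.
SAVED_TOKENS = 174_000 - 37_500  # naive RAG minus tuned_hybrid = 136,500 tokens/session
PRICE_PER_MTOK = {"Sonnet 4.6": 3.00, "Opus 4.7": 15.00}  # $ per million input tokens
SESSIONS_PER_MONTH = 110

for model, price in PRICE_PER_MTOK.items():
    per_session = SAVED_TOKENS * price / 1_000_000
    monthly = per_session * SESSIONS_PER_MONTH
    print(f"{model}: ${per_session:.2f}/session, ~${monthly:.0f}/dev/month")
```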
| | No memory | Naive RAG | mem0 / atomic facts | supamem (tuned_hybrid) |
|---|---|---|---|---|
| Auto-inject on session start | ❌ | ✅ | ✅ | ✅ |
| Hybrid sparse+dense retrieval | ❌ | ❌ | ❌ | ✅ |
| Code-identifier preservation | ❌ | ✅ | ❌ (drops names) | ✅ |
| Locked schema + golden eval | ❌ | ❌ | ❌ | ✅ |
| Multi-client (Claude/Cursor/OpenCode) | ❌ | ❌ | ✅ | ✅ |
| p95 latency | n/a | ~120 ms | ~80 ms | 73 ms |
| Token bloat | High (manual) | Highest | Low but lossy | Lowest with full recall |
Why hybrid? BM25 catches exact identifiers (ChatService.generate, env-var names,
file paths) that dense embeddings smear. Dense catches semantic intent ("how do we
handle billing webhooks?") that BM25 misses. RRF fusion combines both rankings so you
get the best of each.
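The fusion step is plain reciprocal rank fusion. A minimal sketch — the function name, the `k = 60` constant, and the doc IDs are illustrative, not supamem's actual internals:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) across all lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well by BOTH arms beats one ranked first by only a single arm:
bm25 = ["auth.md#tokens", "billing.md#webhooks", "adr-12.md"]
dense = ["billing.md#webhooks", "adr-12.md", "auth.md#tokens"]
# rrf_fuse([bm25, dense])[0] == "billing.md#webhooks"
```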
Why not mem0? mem0's atomic-fact extraction loses code identifiers — recall on the 33-query bench was 0.015 (effectively zero). Great for personal CRM-style memory, not for code-aware retrieval.
```bash
# 1. Install (uv is the fastest path)
uv tool install supamem

# 2. Start Qdrant (one-time, ~30s)
docker run -d -p 6333:6333 -p 6334:6334 -v $HOME/.qdrant:/qdrant/storage qdrant/qdrant:latest

# 3. Bootstrap your project
cd your-project
supamem init

# 4. Wire it into your AI client
supamem install --client claude-code   # or cursor, opencode

# 5. Confirm everything is healthy
supamem doctor
```

That's it. Open Claude Code (or your preferred client) inside the project — the memory tool is already on the menu. ✨
Run supamem live in a side terminal to watch every retrieval call as it happens — perfect alongside Claude Code / Cursor / OpenCode for instant visibility into the silent PreToolUse-hook injections (which save tokens by NOT showing UI).
The SessionStart banner (v0.1.4+) also lands a one-line status in your AI client at session open: 🧠 supamem v0.1.4 · <collection> · <N> chunks · audit <path> — auto-detects Claude Code / Cursor / OpenCode via env vars.
🎬 Interactive demo: `supamem-live.cast` — drop it into asciinema.org/player or run locally with `asciinema play docs/media/supamem-live.cast`.
| Feature | Description |
|---|---|
| 🔍 Hybrid retrieval | Tuned sparse (BM25) + dense (MiniLM) fusion, locked schema D-25 |
| 📚 Markdown chunker | Header-aware, 200-token chunks with 250-token soft max (T-1) |
| 🤖 MCP server | stdio (default) and http transports, official mcp SDK |
| 🪝 Multi-client hooks | Claude Code session-start, OpenCode session-start, Cursor MDC |
| 🧰 One-command install | Atomic config patching with auto-backup and rollback |
| 🩺 supamem doctor | Probe Qdrant, resolve the config chain, surface version drift |
| 👀 supamem live | Rich-Live terminal dashboard tailing the audit JSONL — real-time visibility into retrieval calls (v0.1.4+) |
| 🎬 SessionStart banner | One-line cross-client banner injected at session open (Claude Code / Cursor / OpenCode), v0.1.4+ |
| 📊 Welford counters | Track recall rate, latency, query volume per project |
| 🧪 Eval harness | 33-query golden corpus + regression detection |
| 🔁 Brownfield migration | Detect existing dev_memory and migrate non-destructively |
| 🎨 Stylish CLI | Rich-powered spinners, panels, and color so you always see progress |
You only really need two things: Python 3.12+ and Qdrant. Everything else is optional.
🐍 Python 3.12+ · click to expand install commands
```bash
# macOS (Homebrew)
brew install python@3.12

# Linux (Ubuntu/Debian)
sudo apt install python3.12 python3.12-venv

# Windows (PowerShell)
winget install Python.Python.3.12
```

We strongly recommend installing uv — the fastest Python package manager:

```bash
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

🗄️ Qdrant 1.10+ · vector database (required)
The simplest path is Docker:
```bash
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v $HOME/.qdrant:/qdrant/storage \
  qdrant/qdrant:latest
```

Or with docker compose:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports: ["6333:6333", "6334:6334"]
    volumes: ["./qdrant_data:/qdrant/storage"]
    restart: unless-stopped
```

Don't have Docker? Run a managed cluster on Qdrant Cloud (free tier available) and point supamem at the URL via `supamem init`.
🤖 An MCP-compatible client · pick at least one
| Client | Install | Notes |
|---|---|---|
| Claude Code | `npm install -g @anthropic-ai/claude-code` | First-class MCP support |
| Cursor | Download from cursor.com | Uses MDC rules + MCP |
| OpenCode | `curl -fsSL https://opencode.ai/install \| bash` | Open-source TUI, MCP native |
```bash
# Recommended: uv (fastest, isolated)
uv tool install supamem

# Alternative: pipx (also isolated)
pipx install supamem

# Plain pip (in a venv)
pip install supamem
```

Verify:

```bash
supamem --version
```

You should see a colorful banner and the credit line. 🎨

Latest: v0.1.4 is published on PyPI. Released via Trusted Publisher OIDC — every wheel is provenance-attested.
| Command | Purpose |
|---|---|
| `supamem init` | Greenfield bootstrap — probes Qdrant, creates the collection, writes `.supamem/config.toml` |
| `supamem install --client <name>` | Patch a client config (claude-code, cursor, opencode) — atomic with backup. Defaults to `--scope project` (per-workspace files); pass `--scope user` for legacy global behavior. Pass `--enforce-search` (claude-code only) to wire the opt-in edit-gate hook. |
| `supamem repair` | 🩹 Migrate from a legacy global install to per-workspace files. Strips stale `mcpServers.supamem` from globals and re-installs at project scope from the current cwd. Idempotent. |
| `supamem index` | Embed dev memories into Qdrant using the locked tuned-hybrid pipeline (D-25) |
| `supamem mcp-server` | Run the MCP server (`--transport stdio` by default; `--transport http` for HTTP) |
| `supamem hook <client>` | Per-client session/edit hooks (called by the client itself) |
| `supamem doctor` | 🩺 Probe Qdrant, print the resolved config chain, report version drift |
| `supamem stats` | Welford schema-v2 usage counters from `.supamem/state/` |
| `supamem live` | 👀 Live dashboard tailing the audit JSONL — pipe-safe (plain JSONL when not a TTY); handles rotation, resize, Ctrl-C |
| `supamem migrate` | Brownfield migration from a pre-existing `dev_memory` collection |
| `supamem eval` | Run the regression harness against the bundled 33-query golden corpus |
| `supamem uninstall --client <name>` | Reverse `supamem install` cleanly — strips supamem from BOTH project and user scopes |
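The stats behind `supamem stats` use Welford's streaming algorithm, which tracks mean and variance in a single pass with O(1) state — no sample buffer. A generic sketch of the update rule (class name and fields are illustrative, not supamem's actual schema):

```python
class Welford:
    """Online mean/variance in O(1) memory — suited to per-project latency counters."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # note: uses the *updated* mean

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

w = Welford()
for latency_ms in (70, 73, 76):
    w.add(latency_ms)
# w.mean == 73.0, w.variance == 9.0
```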
| Var | Purpose |
|---|---|
| `SUPAMEM_PROJECT_ROOT` | Absolute path to the workspace. Honored first by `mcp-server` for project resolution; injected automatically by `supamem install --scope project` so MCP hosts that launch the subprocess from the wrong cwd still resolve the right collection. |
| `SUPAMEM_CONFIG` | Explicit TOML path overriding all discovery. Highest precedence. |
| `SUPAMEM_GATE_DISABLE=1` | Bypass the opt-in claude-code edit-gate for the current session (`--enforce-search` users only). |
| `SUPAMEM_ADVISORY_DISABLE=1` | Suppress the Cursor beforeSubmitPrompt advisory hook. |
| `SUPAMEM_NO_UPDATE_CHECK=1`, `NO_UPDATE_NOTIFIER=1`, `CI=1` | Suppress the GitHub Releases update probe. |
| `SUPAMEM_BANNER_DISABLE=1` | Suppress the SessionStart one-line banner entirely (no context injection, no user-visible status). |
| `SUPAMEM_BANNER_QUIET=1` | Suppress only the user-visible terminal status line; keep injecting the banner into Claude Code's additionalContext for the model. Use this when you want supamem context loaded but no per-session "SessionStart:supamem says: …" row in your terminal. |
Every supported client emits a one-line status at session open:
```
🧠 supamem ✓ v0.2.0 · supamem-myproject · 412 chunks · audit /home/me/.cache/supamem/audit.jsonl
   ^── health flag (✓ healthy / ⚠ misconfigured or Qdrant unreachable)
```
When a newer release is locally cached by the background update probe, an
update v0.X.Y available segment is appended. Healing is never automatic —
the banner only signals; run supamem repair to act.
Every long-running command shows a live spinner with elapsed time so you always know it's
working. Use --help on any subcommand for details.
supamem can index your Claude Code session history as Q+A drawer chunks alongside your
project's Markdown corpus, surfacing past decisions and tool-use traces in dual_memory_search.
Default-OFF — opt in with --transcripts.
```bash
# Index Claude Code transcripts from the default location (~/.claude/projects/)
supamem index --transcripts

# Or point at a specific directory
supamem index --transcripts /path/to/sessions/

# Skip the regular project corpus and only index transcripts
supamem index --transcripts --transcripts-only

# Limit to recent sessions (default: 180 days; --since 0 disables the filter)
supamem index --transcripts --since 30d
```

Configure under `[supamem.transcript]` in `.supamem/config.toml`:
```toml
[supamem.transcript]
default_root = "~/.claude/projects/"
since_days = 180
tool_payload_max_chars = 2000
chunk_soft_max_tokens = 600
include_paths_glob = []
exclude_paths_glob = []  # exclude sensitive sessions, e.g. ["**/banking-*.jsonl"]
```

⚠ Transcripts may contain secrets. API keys, tokens, and other credentials sometimes end up pasted into Claude Code sessions. v0.2.2a1 ships no redaction — review your `~/.cache/supamem` Qdrant collection before sharing it. Hand-exclude sensitive sessions via `exclude_paths_glob`. Redaction is tracked for v0.3 via a future `supamem.redactor` plugin group.

Currently supported transcript formats: Claude Code JSONL (Cursor SQLite and ChatGPT export are deferred to follow-on plugins).
Filter retrieval by coding-shaped category via the where parameter on
dual_memory_search (and the qdrant_find alias):
```python
# Only chunks classified as backend code
dual_memory_search(query="auth flow", where={"room": "backend"})

# OR across rooms (Qdrant MatchAny)
dual_memory_search(query="rate limit", where={"room": ["backend", "tests"]})
```

Every indexed chunk carries `payload.room` — one of backend, frontend, tests, docs, scripts, config, migrations, types, or null. Classification is exact path-component equality (split on `/`) — a file at `data/chest_xray/img.png` is NEVER classified as tests. Multiple keys in `where` are ANDed; list values within a key are ORed.
Override the default keyword map in .supamem/config.toml:
```toml
[supamem.classifier.rooms]
tests = ["tests", "test", "__tests__"]
backend = ["src", "backend", "api"]
frontend = ["frontend", "web", "client", "components"]
# Priority is encoded by key order — first match wins.
# Putting `tests` before `backend` makes tests/backend/api_test.py classify as `tests`.
```

`supamem doctor` surfaces the active rooms map with `[source: ...]` provenance, the stored `classifier_hash`, and a per-room histogram (including a null bucket).
Changing [supamem.classifier.rooms] triggers a one-time re-classify sweep on
the next supamem index — Qdrant set_payload per-room, zero re-embedding cost.
Pre-v0.2.3 collections auto-migrate on first post-upgrade index invocation.
Transcript chunks (chunker == transcript) classify to room = null by construction —
filter them via the existing payload.chunker key.
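Since `where` also accepts other payload keys, selecting only transcript-derived memories is one call — same `dual_memory_search` syntax as in the room examples:

```python
# Only Q+A drawer chunks derived from past Claude Code sessions
dual_memory_search(query="past decision on webhook retries", where={"chunker": "transcript"})
```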
Claude Code
```bash
supamem install --client claude-code                   # default: --scope project (per-workspace .mcp.json)
supamem install --client claude-code --scope user      # legacy global install in ~/.claude.json
supamem install --client claude-code --enforce-search  # also register the opt-in edit-gate
```

The default writes `<repo>/.mcp.json` (project scope, committable; takes precedence over user scope per Anthropic's MCP docs). It always registers the SessionStart banner + injection hook in `~/.claude/settings.json`. With `--enforce-search`, it also registers a PreToolUse gate that DENIES `Edit|Write|MultiEdit` when no `mcp__supamem__dual_memory_search` call appears in the session transcript since the last user turn — override per-session with `SUPAMEM_GATE_DISABLE=1`. Preview any command with `--dry-run`.
Cursor
```bash
supamem install --client cursor               # default: --scope project (<repo>/.cursor/mcp.json)
supamem install --client cursor --scope user  # legacy global install in ~/.cursor/mcp.json
```

The default writes `<repo>/.cursor/mcp.json` (per-workspace; the project level wins on conflict per Cursor's docs). It always writes `<repo>/.cursor/rules/dual-memory.mdc` and registers a sessionStart snapshot hook + a beforeSubmitPrompt advisory in `<repo>/.cursor/hooks.json`. The advisory injects an agentMessage reminder when the user's prompt looks edit-bound; suppress it with `SUPAMEM_ADVISORY_DISABLE=1`. (Cursor's hooks API doesn't yet support a fail-closed pre-edit event — the advisory is the strongest available nudge.)
OpenCode
```bash
supamem install --client opencode
```

Updates `~/.config/opencode/opencode.json` and writes a session-start hook to `~/.config/opencode/hooks/`.
🛟 MCP launched from the wrong cwd? Hosts (Cursor, some IDE wrappers) sometimes spawn the MCP subprocess from `$HOME` instead of the workspace, causing supamem to fall back to the default collection (`dev_memory_tuned_hybrid`) and return Qdrant 404s. Set `SUPAMEM_PROJECT_ROOT=/abs/path/to/workspace` in the host's MCP config (e.g. the `env` block of `~/.cursor/mcp.json`, or `~/.claude.json` under `mcpServers.supamem.env`). If unset, supamem walks parent directories looking for `.supamem/config.toml` or a `pyproject.toml` with `[tool.supamem]` — and emits a one-line stderr warning when it can't find either. Verify with `supamem doctor` from the repo root: the resolved collection should match what your MCP client returns from `dual_memory_search`.
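A per-host entry with the env override might look like this — a sketch only: the `command`/`args` pair assumes supamem is on PATH and follows the generic MCP stdio-server config shape; the `env` key is the part this tip actually requires:

```json
{
  "mcpServers": {
    "supamem": {
      "command": "supamem",
      "args": ["mcp-server"],
      "env": { "SUPAMEM_PROJECT_ROOT": "/abs/path/to/workspace" }
    }
  }
}
```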
```
┌─────────────────┐    MCP/stdio     ┌─────────────────┐    REST    ┌─────────────┐
│ Claude / Cursor │ ───────────────► │   supamem MCP   │ ─────────► │   Qdrant    │
│   / OpenCode    │ ◄─────────────── │     server      │ ◄───────── │  (vectors)  │
└─────────────────┘                  └─────────────────┘            └─────────────┘
        │                                     ▲
        │ session-start hook                  │ tuned-hybrid retrieval
        ▼                                     │ (BM25 + MiniLM fusion)
┌─────────────────┐                           │
│  supamem hook   │ ─────────────────────────┘
│  (auto-recall)  │
└─────────────────┘
```
- Indexer chunks Markdown by header (T-1 chunker, 200-token target / 250 soft max)
- Embedders produce sparse (BM25) and dense (MiniLM-L6) vectors
- Retrieval runs both arms in parallel, fuses with reciprocal rank fusion, returns top-k
- MCP server exposes `dual_memory_search` (read) and `dual_memory_write` (write / idempotent agent-memory persistence) — plus `qdrant_find` and `qdrant_store` as drop-in aliases for users coming from upstream mcp-server-qdrant (disable with `SUPAMEM_QDRANT_ALIASES=0`)
- Hooks call `supamem hook <client>` at the right moment, so memory loads transparently
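The header-aware chunking in step 1 can be sketched in a few lines. A simplification: the real T-1 chunker budgets tokens (200 target / 250 soft max), not words, and handles header nesting — the word budget here is a stand-in:

```python
def chunk_markdown(text: str, soft_max_words: int = 250) -> list[str]:
    """Split on headers first; hard-split any oversized section on the word budget."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("#") and current:  # a new header closes the previous section
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks: list[str] = []
    for section in sections:
        words = section.split()
        if len(words) <= soft_max_words:
            chunks.append(section)
        else:  # degrade gracefully for header-less walls of text
            for i in range(0, len(words), soft_max_words):
                chunks.append(" ".join(words[i : i + soft_max_words]))
    return chunks

doc = "# Auth\nTokens rotate daily.\n# Billing\nWebhooks retry 3 times."
# chunk_markdown(doc) → two chunks, one per header section
```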
We welcome PRs! Quick start:
```bash
git clone https://github.com/dzmitrys-dev/supamem.git
cd supamem
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest
ruff check .
```

Coming from an in-tree dev_memory setup? See MIGRATION.md.
MIT — see LICENSE.
Russian-language AI chat platform · AI-first product engineering
supamem was extracted from SoftChat's production memory stack so every team can run on the same
battle-tested pipeline. If it makes your agents smarter, give us a ⭐ — and check out what we
build with it.
Made with care in Belarus 🇧🇾 · app.softchat.ru · softskillz.ai