mcp-brain

The repo-aware, team-aware, token-efficient memory layer for Claude Code.

Claude Code doesn't fail because it lacks intelligence.
It fails because it has zero awareness of your repo and your team.

🚀 TL;DR

mcp-brain is a Model Context Protocol (MCP) server that gives Claude Code persistent, structured awareness of your project — without burning tokens on context rebuilding.

🧠	Compressed awareness in ~100 tokens instead of ~2000
🎯	63.4% Hit@10 on SWE-bench Full (2294 real GitHub issues) — zero LLM cost
⚡	Sub-100ms file prediction (BM25 + code graph + optional semantic reranker)
👥	Team-aware: soft claims, conflict detection, ownership tracking
🔄	Self-healing: decision lifecycle, automatic staleness, feedback loop
🛡️	Local-first: SQLite, no cloud, no embeddings required, GDPR-friendly

📑 Table of Contents

The Problem
What mcp-brain Changes
In 60 seconds
How It Works
Memory Hierarchy
Prediction Pipeline
Decision Lifecycle
Architecture
Benchmark Results
Token Efficiency
Quick Start
MCP Tools
Use Cases
FAQ
Trade-offs
Roadmap
License

🚨 The Problem

Without persistent awareness, Claude Code operates blindly at the start of every session:

Without mcp-brain	With mcp-brain
❌ No idea which files matter	✅ Predicted files in top-K
❌ Re-explores the repo every session	✅ Compressed context in ~100 tokens
❌ No visibility into teammates' WIP	✅ Soft claims + conflict detection
❌ Acts on outdated decisions	✅ Decision lifecycle (active → stale)
❌ Burns 2000–5000 tokens just to "orient"	✅ One YAML block, ready to act

Result without mcp-brain: wrong file exploration → outdated suggestions → merge conflicts → massive token waste.

⚡ What mcp-brain Changes

┌──────────────────────────────────────────────────────┐
│                                                      │
│   Without:  Claude → explores → guesses → retries    │
│             → conflicts → high token usage           │
│                                                      │
│   With:     Claude → predicts → verifies → acts      │
│             → aligned → low token usage              │
│                                                      │
└──────────────────────────────────────────────────────┘

🧬 Core idea

Instead of giving Claude more context, we give it structured awareness of reality.

We track:

📌 what changed (signal extraction from git)
🎯 what matters (scoring + lifecycle)
👥 who's working on what (team claims)
🧭 where to act (issue → file prediction)

…and we deliver it in ~100 tokens.

⏱️ In 60 seconds

You drop a one-line ticket into Claude Code:

> work on ticket #42 — JWT login broken

Without mcp-brain, Claude starts grep-walking the repo: reading directory listings, opening the README, sampling files. It burns 2000+ tokens before producing the first useful sentence.

With mcp-brain, in <100ms Claude receives:

predictions:
  - file: src/auth.py
    confidence: high
    why: "path + symbol match: login, jwt"
  - file: src/middleware.py
    confidence: medium
    why: "imports auth (hop 1)"
  - file: src/jwt_utils.py
    confidence: medium
    why: "called_by auth.login"
team_claims:
  - { ticket: 39, author: dev-B, files: [middleware.py] }   # ⚠️ overlap
avoid:
  - "HS256 — vulnerable to key confusion. Migrated to RS256 in commit a1b2c3."
decisions:
  - "tokens stored in httpOnly cookie, never localStorage"

It's structured reality, not regenerated context. Claude can act on the first turn.

🔑 How It Works

flowchart TD
    subgraph Capture[Capture signals]
        A[Git commit] -->|filtered signals| B[mcp-brain memory]
        C[Session end] -->|structured snapshot| B
    end

    subgraph Predict[Predict where to act]
        E[Ticket opened] --> F[File predictor]
        F -->|top-K files + confidence + why| D[Claude Code]
    end

    subgraph Coordinate[Coordinate team work]
        F -->|overlap check| G[Team claims]
        G -->|conflict warnings| D
    end

    subgraph Learn[Learn from outcomes]
        H[Outcome recorded] -->|precision / recall| I[Feedback loop]
        I -->|demote noisy memories| B
        I -->|supersede stale decisions| B
    end

    B -->|~100-token YAML context| D

Capture — git hooks promote only high-signal events (decisions, patterns, things to avoid). Ignored: docs, chore, tests, CI noise.
Compress — three-level memory (L1/L2/L3) auto-assigned by a scoring function (recency 35% + frequency 30% + impact 20% + explicit 15%).
Predict — issue title/body → ranked file list via BM25 + code graph expansion + optional semantic reranker.
Coordinate — soft claims warn before two devs touch the same files.
Self-correct — every closed ticket feeds precision/recall stats; noisy memories are auto-demoted.
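The Coordinate step can be pictured as a set intersection between the files predicted for a new ticket and the files teammates have already claimed. The following is a minimal sketch, assuming claims are dicts shaped like the `team_claims` entries in the example above; the name `find_conflicts` and the basename-matching rule are illustrative assumptions, not mcp-brain's actual API.

```python
from pathlib import PurePosixPath


def find_conflicts(predicted_files, team_claims, current_ticket):
    """Return (ticket, author, overlapping files) for every overlapping claim.

    Hypothetical helper: files are compared by basename because the example
    claims list bare names (middleware.py) while predictions carry paths.
    """
    predicted = {PurePosixPath(f).name for f in predicted_files}
    conflicts = []
    for claim in team_claims:
        if claim["ticket"] == current_ticket:
            continue  # a ticket never conflicts with its own claim
        claimed = {PurePosixPath(f).name for f in claim["files"]}
        overlap = sorted(predicted & claimed)
        if overlap:
            conflicts.append((claim["ticket"], claim["author"], overlap))
    return conflicts


predictions = ["src/auth.py", "src/middleware.py", "src/jwt_utils.py"]
claims = [{"ticket": 39, "author": "dev-B", "files": ["middleware.py"]}]
print(find_conflicts(predictions, claims, current_ticket=42))
```

Because claims are soft, a non-empty result would be surfaced as a warning rather than blocking the work.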

🧠 Memory Hierarchy

Memories aren't dumped into one bag. They're scored and tiered, so the high-token slot in your prompt only carries what's signal-dense for this moment:

L1 — hot context loads automatically every session. Stack, conventions, current branch, recent commits, team claims, active high-confidence decisions. Capped at ~70 tokens.
L2 — warm context loads only on demand (brain_get_decisions). Historical reasoning, superseded patterns, the why behind a past trade-off.
L3 — cold archive is never sent to the model. Kept for audit, transparency, and the lifecycle's "undo" path.

The score is a transparent linear formula — no black-box embedding similarity. Every memory's level is reproducible and explainable.
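To make the linear formula concrete, here is a small sketch using the weights quoted in "How It Works" (recency 35%, frequency 30%, impact 20%, explicit 15%). The tier cut-offs `L1_THRESHOLD` and `L2_THRESHOLD`, and the assumption that each signal is already normalised to [0, 1], are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Weights quoted in this README; they sum to 1.0.
WEIGHTS = {"recency": 0.35, "frequency": 0.30, "impact": 0.20, "explicit": 0.15}

# Hypothetical cut-offs: the README does not state the real thresholds.
L1_THRESHOLD = 0.70
L2_THRESHOLD = 0.40


@dataclass
class MemorySignals:
    # Assumption: every signal is pre-normalised to the range [0, 1].
    recency: float
    frequency: float
    impact: float
    explicit: float


def score(signals: MemorySignals) -> float:
    """Weighted linear sum of the four signals."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def assign_level(signals: MemorySignals) -> str:
    """Map a score to a memory tier using the illustrative thresholds."""
    s = score(signals)
    if s >= L1_THRESHOLD:
        return "L1"
    if s >= L2_THRESHOLD:
        return "L2"
    return "L3"


fresh_decision = MemorySignals(recency=1.0, frequency=0.5, impact=0.5, explicit=1.0)
print(round(score(fresh_decision), 2), assign_level(fresh_decision))
```

Since every term is a plain product, the contribution of each signal to a memory's level can be read off directly, which is what makes the tiering explainable.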

🔍 Prediction Pipeline

The predictor runs in three deterministic stages:

Stage	What it does	Cost
1. BM25 + IDF	Tokenize issue, match against symbols / identifiers / paths in an inverted index	~5 ms
2. Graph expansion	Walk `imports` / `imported_by` / `called_by` from seeds. Score decays per hop (`×0.5`, `×0.25`)	~10 ms
3. Semantic rerank (optional)	MiniLM (80 MB, CPU/GPU) embeds query + candidates, blends 30% cosine sim with 70% BM25	~50 ms

Every prediction comes back with a why field and a full breakdown, so you can audit why a file was suggested — no opaque ranking.
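Stage 2 can be sketched as a breadth-first walk in which a neighbour inherits the seed's score multiplied by 0.5 per hop (×0.5 at hop 1, ×0.25 at hop 2). This is an illustration under assumptions: the graph is reduced to a plain adjacency dict (the real edges are typed as `imports` / `imported_by` / `called_by`), and the function name `expand` and the shape of the `why` string are hypothetical.

```python
from collections import deque

HOP_DECAY = 0.5  # hop 1 -> x0.5, hop 2 -> x0.25, as in the stage table


def expand(seeds, graph, max_hops=2):
    """Spread seed scores over the code graph with per-hop decay.

    seeds: {file: bm25_score}; graph: {file: [neighbouring files]}.
    Returns [(file, (score, why)), ...] sorted by descending score.
    """
    best = {f: (s, "bm25 seed") for f, s in seeds.items()}
    queue = deque((f, s, 0) for f, s in seeds.items())
    while queue:
        node, seed_score, hop = queue.popleft()
        if hop == max_hops:
            continue  # do not walk past the hop limit
        for neighbour in graph.get(node, ()):
            candidate = seed_score * HOP_DECAY ** (hop + 1)
            # Keep only the strongest path to each file.
            if candidate > best.get(neighbour, (0.0, ""))[0]:
                best[neighbour] = (candidate, f"hop {hop + 1} via {node}")
                queue.append((neighbour, seed_score, hop + 1))
    return sorted(best.items(), key=lambda item: -item[1][0])


seeds = {"src/auth.py": 8.0}
graph = {
    "src/auth.py": ["src/middleware.py"],
    "src/middleware.py": ["src/app.py"],
    "src/app.py": ["src/cli.py"],  # three hops away, so never reached
}
for path, (value, why) in expand(seeds, graph):
    print(path, value, why)
```

Carrying the `why` string alongside each score is one simple way to keep the ranking auditable.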

💡 Default ON. To run lean (CI / containers without PyTorch), set MCP_BRAIN_SEMANTIC=0 and the pipeline degrades gracefully to BM25 + graph.
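The 30/70 blend and the graceful fallback can be sketched as follows. Several details are assumptions made for illustration: lexical scores are normalised by dividing by the top score, any value other than `0` leaves the reranker enabled, and the names `rerank`, `semantic_enabled`, and `cosine` are hypothetical. Embeddings are passed in as plain lists so the sketch runs without MiniLM.

```python
import math
import os

SEMANTIC_WEIGHT = 0.30  # cosine similarity share, from the stage table
LEXICAL_WEIGHT = 0.70   # BM25 (+ graph) share


def semantic_enabled(env=os.environ):
    # Assumption: unset means "on" (the documented default); "0" turns it off.
    return env.get("MCP_BRAIN_SEMANTIC", "1") != "0"


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(lexical, query_vec=None, file_vecs=None, env=os.environ):
    """Return files ordered by blended score, or by lexical score alone."""
    top = max(lexical.values(), default=0.0) or 1.0
    normalised = {f: s / top for f, s in lexical.items()}
    if not semantic_enabled(env) or query_vec is None or not file_vecs:
        final = normalised  # lean mode: BM25 + graph only
    else:
        final = {}
        for f, s in normalised.items():
            vec = file_vecs.get(f)
            sim = cosine(query_vec, vec) if vec else 0.0
            final[f] = LEXICAL_WEIGHT * s + SEMANTIC_WEIGHT * sim
    return sorted(final, key=final.get, reverse=True)


lexical = {"a.py": 10.0, "b.py": 9.0}
vectors = {"a.py": [0.0, 1.0], "b.py": [1.0, 0.0]}
print(rerank(lexical, [1.0, 0.0], vectors, env={}))
print(rerank(lexical, [1.0, 0.0], vectors, env={"MCP_BRAIN_SEMANTIC": "0"}))
```

In this toy input the semantic signal is just strong enough to swap two files that BM25 ranks closely; with the reranker disabled the lexical order is returned unchanged.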

🔄 Decision Lifecycle

Memories aren't immortal. mcp-brain assumes you'll change your mind and bakes the lifecycle in:

Age-based decay — after SUSPECT_DAYS a memory gets flagged for re-verification. After STALE_DAYS it's hidden from prompts.
Semantic supersession — write a new memory similar (cosine ≥ 0.85) to an old one and the old one is auto-marked superseded.
Feedback loop — when a memory is shown 3+ times before a reverted ticket, it gets demoted automatically. Noisy memories die fast.

This is what makes mcp-brain safe to leave running for months without manual cleanup. The L1 stays small and trustworthy; the L3 archives the audit trail.
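The three lifecycle rules reduce to simple threshold checks. The sketch below assumes illustrative values for `SUSPECT_DAYS` and `STALE_DAYS` (the README names the settings but not their defaults) and uses hypothetical function names; only the 0.85 similarity cut-off and the "3+ times" rule come from the text above.

```python
from datetime import datetime, timedelta, timezone

SUSPECT_DAYS = 30  # hypothetical value, for illustration only
STALE_DAYS = 90    # hypothetical value, for illustration only
SUPERSEDE_THRESHOLD = 0.85   # cosine cut-off quoted above
DEMOTE_AFTER_BAD_SHOWS = 3   # "shown 3+ times before a reverted ticket"


def age_status(last_verified, now):
    """Classify a memory by age: active, suspect (re-verify), or stale (hidden)."""
    age_days = (now - last_verified).days
    if age_days >= STALE_DAYS:
        return "stale"
    if age_days >= SUSPECT_DAYS:
        return "suspect"
    return "active"


def supersedes(similarity):
    """A new memory replaces an old one when they are similar enough."""
    return similarity >= SUPERSEDE_THRESHOLD


def should_demote(shows_before_reverts):
    """Demote a memory that keeps appearing ahead of reverted tickets."""
    return shows_before_reverts >= DEMOTE_AFTER_BAD_SHOWS


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for days in (10, 45, 120):
    print(days, age_status(now - timedelta(days=days), now))
```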

🏗️ Architecture

flowchart TB
    subgraph Client
        CC[Claude Code]
    end
    subgraph Server[mcp-brain server]
        T[MCP Tools layer<br/>brain_init, brain_get_context,<br/>brain_predict_files, ...]
        R[Retriever<br/>+ Compressor]
        P[File Predictor<br/>BM25 + Graph + Semantic]
        F[Feedback Reconciler]
        O[Observability<br/>p50/p95/p99]
    end
    subgraph Storage[Local storage ~/.mcp-brain/]
        DB[(SQLite<br/>memories, sessions,<br/>projects, feedback)]
        IDX[Inverted Index<br/>BM25]
        G[Code Graph<br/>imports/calls]
        Y[YAML claims]
    end
    CC <-->|MCP/stdio| T
    T --> R
    T --> P
    T --> F
    T --> O
    R --> DB
    P --> IDX
    P --> G
    F --> DB
    O --> DB

Repo layout

mcp-brain/
├── src/
│   ├── brain/         # core logic: retriever, compressor, scorer, predictor
│   │                  # code_graph, file_indexer, semantic_reranker,
│   │                  # staleness, similarity, feedback loop, observability
│   ├── capture/       # git hook signal extraction
│   ├── storage/       # SQLite layer
│   └── tools/         # MCP tool definitions (FastMCP)
├── benchmark/         # SWE-bench Lite/Full, Bench4BL, BugLocator harness
├── tests/             # pytest suite (predictor, feedback, observability, ...)
└── assets/            # SVG diagrams used in this README

📊 Benchmark Results

We benchmark file localization — given a real GitHub issue, can mcp-brain rank the production files the accepted patch actually modified?

Dataset: SWE-bench Full

2294 real Python bug-fix tasks from major OSS projects (astropy, django, flask, matplotlib, pandas, pytest, requests, scikit-learn, sphinx, sympy, xarray)
Ground truth = files modified in the accepted reference patch (test files excluded by default — strict production-file evaluation)

Results — `mcp-brain` v1.4.0 (BM25 + graph + semantic)

Metric	@1	@3	@5	@10
Hit	24.5%	43.4%	53.7%	63.4%
Recall	20.1%	36.6%	46.1%	55.8%
MAP	24.5%	28.4%	30.4%	31.8%

Instances evaluated: 2294
Errors: 5 (0.2% failure rate)
Avg gold files per issue: 1.66
Avg predicted files: 9.98 (top-10)

Honest comparison vs. literature

System	Hit@10 (file loc.)	Cost per query	Notes
BM25 baseline (vanilla)	~45–55%	free	symbol search only
mcp-brain v1.4.0	63.4%	free	BM25 + graph + semantic, zero LLM
Agentless / SWE-agent	~70–85%	$0.10–$2	LLM-based, multi-step

Reading the numbers:

Hit@5 = 53.7% → in more than half of real issues, the right production file is in top-5 before Claude reads a single byte.
Hit@10 = 63.4% → expanded to top-10, almost 2 issues out of 3 have the right file ranked.
Hit@1 = 24.5% (MAP@1 is identical) → the very first prediction is correct for roughly 1 issue out of 4.
0.2% error rate over 2294 runs → robust pipeline.
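For readers unfamiliar with the metrics, the sketch below shows how Hit@k, Recall@k, and average precision at k can be computed for a single issue; the reported numbers are means over all issues. The normalisation of average precision by `min(len(gold), k)` is one common convention and an assumption here, as the benchmark harness may define it differently. Under this convention AP@1 equals Hit@1, which is consistent with both being 24.5% in the table.

```python
def hit_at_k(ranked, gold, k):
    """1.0 if at least one gold file appears in the top k, else 0.0."""
    return 1.0 if set(ranked[:k]) & set(gold) else 0.0


def recall_at_k(ranked, gold, k):
    """Fraction of gold files that appear in the top k."""
    return len(set(ranked[:k]) & set(gold)) / len(gold)


def average_precision_at_k(ranked, gold, k):
    """Precision summed at each hit rank, normalised by min(|gold|, k).

    Assumes `ranked` contains no duplicate files.
    """
    gold = set(gold)
    hits, total = 0, 0.0
    for rank, path in enumerate(ranked[:k], start=1):
        if path in gold:
            hits += 1
            total += hits / rank
    return total / min(len(gold), k)


ranked = ["a.py", "b.py", "c.py", "d.py"]
gold = ["b.py", "d.py"]
print(hit_at_k(ranked, gold, 1), hit_at_k(ranked, gold, 3))
print(recall_at_k(ranked, gold, 3))
print(average_precision_at_k(ranked, gold, 3))
```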

Reproduce it yourself

# One-time online setup
pip install -e .
pip install -r benchmark/requirements-benchmark.txt
python -m benchmark.adapters.swebench --dataset-name princeton-nlp/SWE-bench \
  --output benchmark/datasets/cache/swebench_full.jsonl
python -m benchmark.prepare_repos \
  --dataset benchmark/datasets/cache/swebench_full.jsonl \
  --repo-cache benchmark/repos

# Offline evaluation (full)
python -m benchmark.run_eval \
  --dataset benchmark/datasets/cache/swebench_full.jsonl \
  --repo-cache benchmark/repos \
  --out benchmark/results/swebench_full.json \
  --report-dir benchmark/reports \
  --top-k 10 --max-hops 2 --use-semantic

Reports are emitted as Markdown + HTML in benchmark/reports/.

The harness also supports SWE-bench Lite (300 instances), SWE-bench Verified, Bench4BL, and BugLocator — see benchmark/README.md.

💰 Token Efficiency

The math

A typical Claude Code session without mcp-brain spends thousands of tokens just to orient itself:

Phase (no mcp-brain)	Action	~Tokens
Session start	List directory, read README, sample files	800–2000
Issue handling	Grep symbols, follow imports, retry wrong files	1000–3000
Context restore	Re-explain project conventions	200–500
Total per session		2000–5500

A session with mcp-brain:

Phase (with mcp-brain)	Action	~Tokens
Session start	`brain_get_context` returns compressed L1 YAML	~100
Issue handling	`brain_predict_files` returns ranked top-K + why	~250
Decision recall	`brain_get_decisions` (only when needed)	~300
Total per session		~650

Estimated saving

                        Without          With mcp-brain     Saving
  Session start:    2000 ─────────►       100 tokens        ~95%
  Per session:      2000–5500 ──►       450–950 tokens      40–80%
  Per developer*:   ~1.2M/month ──►    ~400k/month          ~65%

*assuming 100 sessions/month/dev

Why this works

✅ No embeddings required for retrieval (BM25 + code graph)
✅ No vector DB to query (zero round-trip cost)
✅ No history replay — context is reconstructed, not re-scrolled
✅ YAML compression with default_flow_style=True and empty-key stripping
✅ L1/L2 split — heavy memory only loaded on demand

💡 The semantic reranker (use_semantic=True) is on by default and runs locally on CPU/GPU. It does not add LLM cost. Disable with MCP_BRAIN_SEMANTIC=0 for lean CI.
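The YAML compression point can be illustrated with a short sketch: strip empty keys first, then dump in flow style. The helper name `strip_empty` and the sample context are hypothetical; `yaml.dump(..., default_flow_style=True)` is the PyYAML call, which is a third-party dependency, so the sketch falls back to printing the dict when PyYAML is not installed.

```python
def strip_empty(value):
    """Recursively drop dict keys whose value is None, '', [] or {}."""
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", [], {})}
    if isinstance(value, list):
        return [strip_empty(v) for v in value]
    return value


# Hypothetical context with several empty fields.
context = {
    "p": {"name": "my-api", "stack": ["FastAPI"], "notes": ""},
    "git": {"recent": [], "changed": []},
    "avoid": ["HS256"],
    "decisions": None,
}

compact = strip_empty(context)

try:
    import yaml  # PyYAML (third-party)

    # Flow style keeps mappings and lists on one line, saving tokens.
    print(yaml.dump(compact, default_flow_style=True, sort_keys=False))
except ImportError:
    print(compact)
```

Dropping empty keys before serialising means the model never pays for fields that carry no information.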

🚀 Quick Start

Install — one command, batteries included

git clone https://github.com/PierfrancescoLijoi/mcp-brain.git
cd mcp-brain
pip install -e ".[all]"

The [all] extra installs:

language parsers (Python, JS, TS, Go, Rust, Java, C#) for the code graph
semantic reranker (sentence-transformers + numpy)
dev tooling (pytest, pytest-cov)

Lean install paths

If you want a smaller footprint, you can pick exactly what you need:

pip install -e .                      # core only — BM25 + graph (no semantic, no parsers)
pip install -e ".[parsers]"           # + multi-language parsers
pip install -e ".[semantic]"          # + semantic reranker (~700 MB w/ PyTorch)
pip install -e ".[dev]"               # + dev tooling

Register with Claude Code

claude mcp add mcp-brain python /absolute/path/to/run.py

On Windows PowerShell:

claude mcp add mcp-brain python "C:\path\to\mcp-brain\run.py"

Initialize your project

mcp-brain init

That's it. Open Claude Code in your repo and the L1 context is automatically available via brain_get_context.

🧠 MCP Tools

Tool	Purpose	When Claude calls it
`brain_init`	Register project, stack, conventions	Once per repo
`brain_get_context`	Load L1 context (~70 tokens)	Every session start
`brain_get_decisions`	Load L2 decisions on demand	When historical context needed
`brain_remember`	Store a memory; level auto-assigned	When user makes a decision
`brain_save_session`	Save end-of-session snapshot	At session end
`brain_predict_files`	Issue → ranked file list with `why`	When opening a ticket
`brain_start_ticket`	Start ticket workflow + conflict check	Workflow orchestration
`brain_record_outcome`	Log ticket outcome (completed/reverted/...)	After ticket closed
`brain_feedback_stats`	Precision/recall window	Health checks
`brain_memory_health`	Surface noisy memories	Debugging
`brain_observability`	Full unified dashboard (YAML)	Ops / CI

Example L1 context output (~100 tokens)

p: {name: my-api, stack: [FastAPI, PostgreSQL]}
s: {branch: feat/auth, wip: "JWT refactor", next: "add refresh token"}

git:
  recent: ["refactor: JWT moved to RS256"]
  changed: [auth.py, middleware.py]

team_claims:
  - {ticket: 42, author: dev-B, files: [middleware.py]}

avoid:
  - "avoid: HS256 — vulnerable to key confusion"

decisions:
  - "decision: tokens stored httpOnly cookie, never localStorage"

👉 Claude already knows where to act before reading a single source file.

💼 Use Cases

🎯 Solo developer

Cuts session-start exploration: −90% tokens on the first turn
Remembers your "I always do it this way" patterns
Auto-supersedes decisions when you change your mind

👥 Small team (3–10 devs)

Conflict detection before two devs touch the same files
Shared decision log with lifecycle (no more "wait, didn't we decide…?")
File ownership inference from git history

🏢 Enterprise (with caveats)

Local-first, no data leaves the machine → GDPR / SOC2-friendly
Compatible with Managed Identity / on-prem deployments (no cloud calls)
Token saving compounds: 65% × 100 devs × 100 sessions/month → measurable infra savings

❓ FAQ

Is this a RAG system or a vector DB?

No, and on purpose. mcp-brain is a structured awareness layer, not a retrieval-over-embeddings layer. The core retrieval is BM25 + code graph expansion: fully deterministic, sub-100ms, no vector DB to maintain. The semantic reranker is an optional stage (on by default, disabled with MCP_BRAIN_SEMANTIC=0) that blends 30% cosine similarity into the BM25 score to refine the ranking. This is why token cost stays predictable and infra is local-first.

Why not just use Claude's native context window? It's huge now.

A long context window doesn't fix the problem — it makes it cheaper to waste. The bottleneck isn't capacity, it's signal density. Pasting your whole repo into the context still leaves Claude searching for the right file linearly. mcp-brain pre-ranks reality so the model spends its attention on the right 3 files, not the wrong 30.

Will it leak my code or memories anywhere?

No. Storage is SQLite under ~/.mcp-brain/ (local) and <repo>/.brain/shared/ (versioned with git if you choose). No outbound network calls, no telemetry, no cloud component. The semantic model runs on your CPU/GPU. This makes mcp-brain compatible with GDPR-restricted and air-gapped environments.

What if I disagree with a decision mcp-brain remembers?

Write a new memory that contradicts it. Semantic supersession (cosine ≥ 0.85) will auto-mark the old one as superseded. You can also manually demote via brain_memory_health or wait for age-based decay (SUSPECT_DAYS / STALE_DAYS). The lifecycle assumes you'll change your mind.

Does it work with languages other than Python?

Yes for indexing/predicting (BM25 is language-agnostic). The code graph currently supports Python, JavaScript, TypeScript, Go, Rust, Java, C# via tree-sitter parsers. Adding a new language is a single registry entry — see src/brain/parsers.py.

How does it compare to SWE-agent / Aider / Cursor?

Different layer of the stack. SWE-agent and similar tools are autonomous coders — they read, plan, and patch via LLM calls. mcp-brain is the awareness layer underneath them. You could pair it with Aider or any MCP-compatible client; it makes whatever LLM you use start from a smarter zero.

What's the catch?

Honest answer: file prediction is heuristic. Hit@1 = 24.5% means 3 issues out of 4 still need Claude to validate the prediction before acting. mcp-brain orients, it doesn't replace exploration. That's also why it's free — it's a force multiplier, not an oracle.

⚠️ Trade-offs

I'm honest about what this is and isn't.

Strength	Limitation
✅ Zero LLM cost for retrieval	⚠️ Heuristic-based: edge cases with no symbol/path overlap can miss
✅ Sub-100ms predictions	⚠️ Requires good commit hygiene (semantic commit messages help)
✅ Local-first, no cloud	⚠️ No cross-machine sync out of the box (use git for `.brain/shared/`)
✅ Deterministic (replays produce same output)	⚠️ Hit@1 = 24.5% → orients, doesn't replace exploration
✅ Works on any size repo	⚠️ Best on medium/large repos (small repos don't benefit much)

This is NOT:

❌ a vector DB memory
❌ a RAG system
❌ an SWE-agent / autonomous coder
❌ a checkpoint / replay tool

This IS:

✅ a repo-aware, team-aware, token-efficient awareness layer
✅ a force multiplier for Claude Code, not a replacement

🛣️ Roadmap

🧪 Run the test suite

pip install -e ".[dev]"
pytest tests/ -v

Expected: full pass on Python 3.10, 3.11, 3.12.

🤝 Contributing

PRs welcome. Before opening one:

pytest tests/ -v must pass
New behavior needs new tests
New MCP tools must be wrapped with @observed("brain_<name>")
Avoid heavy dependencies for the default install path — anything ML-flavored goes behind an optional extra

📄 License

MIT — see LICENSE.

Built for Claude Code — but the architecture is MCP-standard, so any MCP-compatible client works.

If mcp-brain saved you tokens, ⭐ the repo. That's the only payment I ask for.
