Commit 096cd41

docs: add Comparison with Alternatives and Embedding Model Evaluation pages (#338)
- Move the competitor comparison out of design-philosophy into its own page under Home, and extend it to cover mem0, MemPalace, and Letta (MemGPT) alongside the existing claude-mem / qmd entries. Reorganized the matrix around integration surface, write semantics, and search rather than the old feature-dump table.
- Add a new Embedding Model Evaluation page under Home that mirrors evaluation/README.md: benchmark dataset, 13 models tested, Recall@5 and MRR results, ONNX vs PyTorch comparison, and the reasoning behind making ONNX bge-m3 int8 the Claude Code plugin default.
- Bump uv.lock version marker to match pyproject (0.2.4).

Signed-off-by: Cheney Zhang <chen.zhang@zilliz.com>
1 parent 2dec87d commit 096cd41

5 files changed

Lines changed: 282 additions & 26 deletions

docs/design-philosophy.md

Lines changed: 3 additions & 25 deletions
@@ -157,28 +157,6 @@ If you are already using OpenClaw's memory directory layout, memsearch works wit
157157

158158
---
159159

160-
## Comparison with Competitors
161-
162-
| Feature | **memsearch** | claude-mem | qmd |
163-
|---------|:---:|:---:|:---:|
164-
| **Cross-platform** | 4 platforms | Claude Code only | Claude Code + MCP |
165-
| **Source of truth** | Markdown files | SQLite + ChromaDB | Markdown files |
166-
| **Search** | Hybrid (dense + BM25 + RRF) | Dense only + FTS5 | Hybrid (dense + BM25 + RRF) + query expansion |
167-
| **Embedding** | Pluggable (6 providers) | Fixed (MiniLM WASM) | Local GGUF (embeddinggemma / Qwen3) |
168-
| **Reranking** | Optional cross-encoder (ONNX) | None | Local LLM (qwen3-reranker) |
169-
| **Progressive disclosure** | L1 → L2 → L3 | Single layer | Single layer |
170-
| **Context isolation** | Skill in forked subagent | MCP tools in main context | MCP tools in main context |
171-
| **Storage format** | `.md` (human-readable, git-friendly) | Binary DB | `.md` (human-readable, git-friendly) |
172-
| **Vector backend** | Milvus (Lite → Server → Cloud) | ChromaDB | SQLite + sqlite-vec |
173-
| **Memory capture** | Automatic (hooks write daily `.md`) | Automatic | External (read-only search engine) |
174-
| **API key required** | No (ONNX default) | No (WASM) | No (all local models) |
175-
| **Language** | Python | TypeScript | TypeScript |
176-
177-
**Key advantages:**
178-
179-
1. **Cross-platform portability.** memsearch works across Claude Code, OpenClaw, OpenCode, and Codex CLI with shared memory.
180-
2. **Transparent storage.** Markdown files are human-readable and git-friendly. You can inspect, edit, and version-control your agent's memories directly.
181-
3. **End-to-end memory.** memsearch captures session summaries automatically and writes them to markdown -- it is both a search engine and a memory writer. qmd is read-only and requires external tools to capture memories.
182-
4. **Search quality.** Hybrid search (dense + BM25 + RRF) catches both semantic matches and exact keyword matches that pure-dense solutions miss.
183-
5. **Scale path.** Milvus Lite for dev, Milvus Server for teams, Zilliz Cloud for production -- same API throughout.
184-
6. **Context efficiency.** Progressive disclosure and forked subagent recall minimize context window usage.
160+
## Comparison with Alternatives
161+
162+
A side-by-side comparison of memsearch with claude-mem, qmd, [mem0](https://github.com/mem0ai/mem0), [MemPalace](https://github.com/milla-jovovich/mempalace), and [Letta (MemGPT)](https://github.com/letta-ai/letta) lives on its own page: **[Comparison with Alternatives](home/comparison.md)**.

docs/home/comparison.md

Lines changed: 143 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,143 @@
1+
# Comparison with Alternatives
2+
3+
This page compares **memsearch** with other open-source memory solutions for LLMs and AI agents. Each project solves a real problem, and the right choice depends on *what kind of agent you are building* and *how you want memory to be stored*.
4+
5+
We group the projects below into two categories:
6+
7+
- **Coding-CLI memory plugins** — attach to an existing agent CLI (Claude Code, Codex, OpenCode, OpenClaw, Cursor, …) and give it persistent memory: memsearch, [claude-mem](https://github.com/thedotmack/claude-mem), [qmd](https://github.com/nomenclator-ninja/qmd), [MemPalace](https://github.com/milla-jovovich/mempalace).
8+
- **General-purpose agent memory systems** — memory libraries or agent runtimes you build applications on top of: [mem0](https://github.com/mem0ai/mem0), [Letta / MemGPT](https://github.com/letta-ai/letta).
9+
10+
---
11+
12+
## TL;DR
13+
14+
| | memsearch | claude-mem | qmd | MemPalace | mem0 | Letta (MemGPT) |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Category** | Coding-CLI plugin | Coding-CLI plugin | Coding-CLI plugin | Coding-CLI plugin | General memory library | Agent framework / runtime |
| **Integration** | Native hooks + skills (4 CLIs) | Native hooks (Claude Code) | MCP | MCP | Python / JS SDK, REST, MCP | Rewrite agent on Letta runtime |
| **Source of truth** | Plain `.md` files | SQLite + ChromaDB | `.md` files | ChromaDB | Vector DB (+ optional graph DB) | Postgres (memory blocks + archival) |
| **Write strategy** | Append-only daily logs | LLM-summarized transcripts | External (read-only search engine) | Store-everything raw | LLM extracts facts, LLM decides add/update/delete | LLM self-edits memory (`memory_replace`, `memory_insert`, …) |
| **Search** | Hybrid: dense + BM25 + RRF | Dense + FTS5 | Hybrid: dense + BM25 + RRF + query expansion | Dense (ChromaDB) | Dense (+ optional graph traversal) | Dense archival + conversation search |
| **Local-first default** | ONNX bge-m3 (no API key) | WASM MiniLM | Local GGUF | Local ChromaDB + Llama | Requires LLM API for every write | Depends on configured backend |
| **Scale path** | Milvus Lite → Server → Zilliz Cloud (same API) | Single machine | Single machine | Single machine | Depends on chosen vector DB | Postgres / pgvector |
23+
24+
> None of the benchmark numbers published by individual projects (LOCOMO, LongMemEval, etc.) are directly comparable because the evaluation setups differ. We do not claim a benchmark win here — the comparison is about *architectural shape*, not accuracy numbers.
25+
26+
---
27+
28+
## Quick orientation
29+
30+
### memsearch
31+
32+
A cross-platform semantic memory plugin for coding-CLI agents. Ships **native plugins** for Claude Code, OpenClaw, OpenCode, and Codex CLI (not MCP adapters — actual per-platform hooks and skills). Stores memory as plain markdown daily logs; Milvus is a derived hybrid-search index rebuildable from the markdown at any time.
33+
34+
### claude-mem
35+
36+
Memory for Claude Code only. Hooks compress session transcripts using an LLM and store the result in ChromaDB + SQLite. Storage is opaque (binary DB) and specific to Claude Code.
37+
38+
### qmd
39+
40+
Local-first MCP search engine for markdown notes. Read-only: it searches existing markdown; capture is left to the user or external tools. Shares the same markdown-as-source-of-truth philosophy as memsearch.
41+
42+
### MemPalace
43+
44+
A memory server organized around the *method of loci* ("wings → halls → rooms"). Stores conversations raw in ChromaDB without LLM extraction, then exposes them to chat clients (Claude Code, ChatGPT, Cursor) via MCP. Runs fully offline with local Llama + ChromaDB.
45+
46+
### mem0
47+
48+
A general-purpose memory layer for LLM applications (not tied to any specific coding CLI). Every write goes through an LLM that extracts entities and relationships, decides whether to add / update / delete existing memories, and stores the results in a configurable vector DB — optionally mirrored to a graph DB (Neo4j, Memgraph, Neptune, Kuzu, AGE). Published as a Python/JS SDK, REST API, hosted platform, and (via OpenMemory) an MCP server.
49+
50+
### Letta (formerly MemGPT)
51+
52+
A full **agent framework and server** built around the "LLM as an operating system" idea. Memory is hierarchical — a small in-context *core memory*, plus *archival memory* and *recall memory* stored in Postgres — and the agent itself edits its own memory at runtime through dedicated tools (`memory_replace`, `memory_insert`, `archival_memory_insert`, `conversation_search`, …). Letta is not a plugin you bolt onto an existing CLI; you build your agent on the Letta runtime.
53+
54+
---
55+
56+
## Detailed feature matrix
57+
58+
### Integration surface
59+
60+
| | memsearch | mem0 | MemPalace | Letta |
|---|:---:|:---:|:---:|:---:|
| Claude Code native plugin | ✅ (hooks + skills) | ❌ (MCP only) | ❌ (MCP only) | ❌ (runtime, not plugin) |
| OpenClaw native plugin | ✅ | ❌ | ❌ | ❌ |
| OpenCode native plugin | ✅ | ❌ | ❌ | ❌ |
| Codex CLI native plugin | ✅ | ❌ | ❌ | ❌ |
| Generic MCP | Not shipped | ✅ (OpenMemory) | ✅ | |
| Library / SDK | Python | Python, JS | Python | Python |
68+
69+
"Native plugin" means memsearch participates in the CLI's own lifecycle events (SessionStart, UserPromptSubmit, Stop, SessionEnd, …) with collection naming, per-project isolation, and skill registration. Generic MCP integrations only expose tools to the LLM — they cannot write daily memory notes at the end of a session, or inject cold-start context at session start.
70+
71+
### Memory write semantics
72+
73+
| | memsearch | mem0 | MemPalace | Letta |
|---|---|---|---|---|
| Who decides what to store? | Session-end hook summarizes the last turn as third-person notes | An LLM extracts "salient facts" on every write | Nobody; the raw transcript is stored as-is | The agent itself, via tool calls during the reasoning loop |
| Updates to prior memories? | Append-only (never mutates history) | LLM may update or delete prior memories during the update phase | Append-only | Agent can rewrite core memory blocks at any time |
| LLM cost per write | One small Haiku call per turn (async, non-blocking) | LLM extraction call(s) per write | None (no LLM on the write path) | Depends on the agent loop; each self-edit is an LLM tool call |
| Auditability | `git log` on `memory/YYYY-MM-DD.md` | Inspect rows in the vector/graph DB | Inspect ChromaDB | Inspect Postgres tables |
79+
80+
**Append-only vs. self-editing** is the key philosophical split. memsearch treats memory like a commit log: once written, always auditable. mem0 and Letta treat memory like a mutable KV store that the LLM maintains — which can converge on cleaner facts, but also means prior writes can be silently rewritten or deleted by a later LLM call.
81+
82+
### Search & retrieval
83+
84+
| | memsearch | mem0 | MemPalace | Letta |
|---|---|---|---|---|
| Dense vectors | ✅ | ✅ | ✅ | ✅ (archival memory) |
| BM25 / sparse | ✅ (RRF fused with dense) | ❌ by default | ❌ | ❌ |
| Reranking | Optional cross-encoder (ONNX) | | | |
| Graph traversal | | ✅ (optional graph backend) | | |
| Progressive disclosure | L1 search → L2 expand section → L3 drill into original transcript JSONL | Single top-K retrieval | Four-layer context loading (L0–L3) | Core memory always in context; archival pulled on-demand |
91+
92+
---
93+
94+
## Where memsearch is actually different
95+
96+
We try to keep this list honest — only things that are real consequences of the current architecture, not marketing claims.
97+
98+
### 1. Native plugins for four coding CLIs, not just an MCP adapter
99+
100+
memsearch ships first-class plugins for Claude Code, OpenClaw, OpenCode, and Codex CLI. Each plugin hooks into that CLI's lifecycle (session start / prompt submit / stop / session end) to capture memory automatically and inject cold-start context. None of mem0, MemPalace, or Letta ship native integrations for these coding CLIs — they expose memory tools over MCP or a REST API, which is a thinner integration.
101+
102+
### 2. Plain markdown is the canonical store; the vector DB is derived
103+
104+
Your memory lives in `memory/YYYY-MM-DD.md` and `MEMORY.md`. You can `cat`, `grep`, `git diff`, and `git blame` it. If you lose the Milvus index, you rebuild it from the markdown. mem0 and Letta both store memory inside a database (vector DB / Postgres) — their storage is opaque by design. MemPalace stores in ChromaDB only.
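To make "rebuild it from the markdown" concrete, here is a minimal sketch of the first half of such a rebuild: walking a memory directory and splitting each file into heading-scoped chunks. The function names (`chunk_markdown`, `collect_chunks`) and the chunk dictionary shape are hypothetical and are not taken from the memsearch codebase. Embedding and insertion into Milvus are left out, and the naive heading regex would also match `#` lines inside fenced code.

```python
import re
import tempfile
from pathlib import Path

# Matches ATX headings such as "## Session 1" (assumption: no Setext headings).
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(text, source):
    """Split one markdown file into heading-scoped chunks (hypothetical helper)."""
    chunks, heading, body = [], "", []

    def flush():
        joined = "\n".join(body).strip()
        if joined:  # skip headings that have no content under them
            chunks.append({"source": source, "heading": heading, "text": joined})

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks


def collect_chunks(memory_dir):
    """Gather chunks from every .md file; a real rebuild would embed and index them."""
    chunks = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        chunks.extend(chunk_markdown(path.read_text(encoding="utf-8"), path.name))
    return chunks


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "2025-01-02.md").write_text(
        "# 2025-01-02\n\n## Session 1\n- fixed flaky test\n\n## Session 2\n- pinned pytest\n",
        encoding="utf-8",
    )
    chunks = collect_chunks(tmp)
```

Because the chunks are derived purely from the files on disk, deleting the index loses nothing that this walk cannot regenerate.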
105+
106+
### 3. Writes are cheap and append-only
107+
108+
A memsearch write is: extract the last turn → one Haiku summarization call → append a bullet to today's `.md`. No LLM "decides what to forget." No self-editing. No entity extraction pipeline. This makes writes cheap, predictable, and fully auditable — at the cost of not auto-compressing redundant memories (you can run `memsearch compact` on demand if you want that).
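The write path described above can be sketched in a few lines. This is an illustration under stated assumptions, not memsearch's actual code: `summarize_turn` is a hypothetical stub standing in for the Haiku call, and `append_memory` is an invented name. The only behavior taken from the text is "summarize the last turn, then append a bullet to today's `.md`".

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def summarize_turn(turn_text: str) -> str:
    """Hypothetical stand-in for the LLM summarization call."""
    # Assumption: a real plugin would call a small model here. This stub
    # keeps the first line of the turn and truncates it to 120 characters.
    lines = turn_text.strip().splitlines()
    return lines[0][:120] if lines else ""


def append_memory(memory_dir: Path, turn_text: str, today: Optional[date] = None) -> Path:
    """Append one bullet to memory/YYYY-MM-DD.md; existing lines are never rewritten."""
    today = today or date.today()
    memory_dir.mkdir(parents=True, exist_ok=True)
    log_path = memory_dir / f"{today.isoformat()}.md"
    with log_path.open("a", encoding="utf-8") as handle:  # "a" = append-only
        handle.write(f"- {summarize_turn(turn_text)}\n")
    return log_path


with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp) / "memory"
    path = append_memory(memory_dir, "User asked to fix the flaky test", date(2025, 1, 2))
    append_memory(memory_dir, "Agent pinned pytest version", date(2025, 1, 2))
    log_name = path.name
    log_lines = path.read_text(encoding="utf-8").splitlines()
```

Opening the file in append mode is what keeps the log auditable: a later write can add a bullet but cannot alter an earlier one.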
109+
110+
mem0 and Letta are on the other end of the spectrum: they rely on LLMs to curate memory on the write path, which is more powerful but introduces cost, latency, and the possibility of silent data loss.
111+
112+
### 4. Hybrid search with BM25 fused via RRF, out of the box
113+
114+
memsearch indexes every chunk with both a dense vector and a BM25 sparse vector, and fuses them at query time with Reciprocal Rank Fusion. Exact keyword hits (function names, file paths, error strings) and semantic matches both surface. mem0, MemPalace, and Letta archival are dense-only by default.
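As a rough illustration of the fusion step, the sketch below implements plain Reciprocal Rank Fusion over two ranked lists of chunk ids. The constant `k=60` is the value commonly used in the RRF literature and is an assumption here, as are the function name and the sample ids. In memsearch the fusion runs inside the vector database, not in application code like this.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense and the BM25 retrievers.
dense_hits = ["chunk-a", "chunk-b", "chunk-c"]
bm25_hits = ["chunk-c", "chunk-a", "chunk-d"]

fused = rrf_fuse([dense_hits, bm25_hits])
```

Chunks that rank well in both lists (`chunk-a`, `chunk-c`) rise above chunks that appear in only one, which is how an exact keyword hit and a semantic match can both surface near the top.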
115+
116+
### 5. A clear scale path on one API
117+
118+
Milvus Lite (a single local file, zero deps) → Milvus Server (self-hosted Docker/K8s) → Zilliz Cloud (fully managed). Same Python API, same collection format, you just change a URI. MemPalace is ChromaDB-only; claude-mem is ChromaDB + SQLite; Letta is Postgres + pgvector; mem0 is pluggable but you wire the backend yourself.
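The "just change a URI" claim can be sketched as a small lookup that returns connection arguments per tier. The helper name, the local file name, and the placeholder endpoint and token are all hypothetical. The only assumption about the library is that `pymilvus.MilvusClient` accepts `uri` and `token` keyword arguments.

```python
def milvus_connection(tier):
    """Return keyword arguments for MilvusClient for a given deployment tier."""
    targets = {
        # Milvus Lite: a single local file (hypothetical file name).
        "lite": {"uri": "./memsearch_demo.db"},
        # Self-hosted Milvus Server on its default port.
        "server": {"uri": "http://localhost:19530"},
        # Zilliz Cloud: placeholder endpoint and API key, to be filled in.
        "cloud": {"uri": "https://<your-endpoint>", "token": "<api-key>"},
    }
    return targets[tier]


# Usage (requires pymilvus; not executed here):
#   from pymilvus import MilvusClient
#   client = MilvusClient(**milvus_connection("lite"))
lite_kwargs = milvus_connection("lite")
cloud_kwargs = milvus_connection("cloud")
```

The rest of the application code would stay the same across tiers; only the dictionary entry selected at startup differs.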
119+
120+
### 6. Context isolation via forked subagents
121+
122+
On Claude Code, memory recall runs inside a skill with `context: fork` — the subagent does search, expansion, and transcript drill-down in its own context window, and only returns a curated summary to the main conversation. Retrieval never pollutes the main context with raw search hits.
123+
124+
---
125+
126+
## When another project is the better fit
127+
128+
A few cases where memsearch is *not* what you want:
129+
130+
- **You are building a general-purpose LLM application, not wiring memory into a coding CLI.** mem0 is designed for this — its SDK, hosted service, and graph-memory features assume you control the whole application.
131+
- **You want the LLM to actively maintain memory (summarize, deduplicate, forget).** Letta's self-editing memory and mem0's LLM-driven extraction/update do this by design. memsearch deliberately does not.
132+
- **You want a pre-built stateful-agent runtime (personas, tool loops, long-running agents).** Letta is a full agent framework; memsearch is only the memory layer.
133+
- **You only use Cursor or ChatGPT Desktop via MCP and don't need per-CLI hooks.** MemPalace's MCP-first model fits cleanly there, and its "store everything raw" philosophy is close to memsearch's append-only writes.
134+
135+
---
136+
137+
## References
138+
139+
- mem0 — [github.com/mem0ai/mem0](https://github.com/mem0ai/mem0), [docs.mem0.ai/graph-memory](https://docs.mem0.ai/open-source/features/graph-memory), paper: [arxiv.org/abs/2504.19413](https://arxiv.org/html/2504.19413v1)
140+
- Letta (MemGPT) — [github.com/letta-ai/letta](https://github.com/letta-ai/letta), [docs.letta.com/concepts/memgpt](https://docs.letta.com/concepts/memgpt/)
141+
- MemPalace — [github.com/milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)
142+
- claude-mem — [github.com/thedotmack/claude-mem](https://github.com/thedotmack/claude-mem)
143+
- qmd — [github.com/nomenclator-ninja/qmd](https://github.com/nomenclator-ninja/qmd)
