Commit fdc536d

Add agentmemory LOCOMO benchmark comparison
1 parent 99db927 commit fdc536d

11 files changed

Lines changed: 464725 additions & 14357 deletions

cmd/goncho-bench/locomo_backend_comparison.go

Lines changed: 2 additions & 2 deletions
@@ -693,7 +693,7 @@ func setupNotesForBackend(name string) []string {
 case "goncho-no-rank":
 return []string{"Local deterministic no-ranking baseline in cmd/goncho-bench; uses the recency order before current Goncho ranking."}
 case "agentmemory":
-return []string{"External Python probe: scripts/bench_agentmemory_locomo.py --capability.", "Comparable when AGENTMEMORY_SOURCE_DIR points at PR #583 / commit 9b18a80c9d2839b025279978d3f4b5e1f9bc6e74 with npm dependencies installed.", "Adapter path uses standalone InMemoryKV fallback: memory_save external_id plus metadata.memory_id, then memory_smart_search. This validates stable IDs but is not the full running agentmemory server.", "If AGENTMEMORY_SOURCE_DIR is absent, agentmemory is marked not comparable."}
+return []string{"External Python probe: scripts/bench_agentmemory_locomo.py --capability.", "Comparable when AGENTMEMORY_SOURCE_DIR points at https://github.com/rohitg00/agentmemory PR #583 / commit 9b18a80c9d2839b025279978d3f4b5e1f9bc6e74 with npm dependencies installed.", "Adapter path uses standalone InMemoryKV fallback: memory_save external_id plus metadata.memory_id, then memory_smart_search. This validates stable IDs but is not the full running agentmemory server.", "If AGENTMEMORY_SOURCE_DIR is absent, agentmemory is marked not comparable."}
 case "mem0":
 return []string{"External Python probe: scripts/bench_mem0_locomo.py --capability.", "Exact package version used in this run: none; backend marked not comparable before scoring.", "Install candidate: pip install mem0ai, with local/vector dependencies configured by upstream mem0 docs.", "Current status: not comparable in this harness until search results return caller-supplied memory_id unchanged without LLM answer scoring."}
 default:
@@ -878,7 +878,7 @@ func writeLocomoBackendComparisonMarkdown(path string, report locomoBackendCompa
 }
 b.WriteString("## Setup notes\n\n")
 b.WriteString("- Goncho, Goncho no-rank, BM25, and SQLite FTS5 are local Go adapters with no hosted dependency.\n")
-b.WriteString("- agentmemory probe: `python3 scripts/bench_agentmemory_locomo.py --capability`. Comparable when `AGENTMEMORY_SOURCE_DIR` points at PR #583 / commit `9b18a80c9d2839b025279978d3f4b5e1f9bc6e74` with npm dependencies installed. This adapter uses the standalone InMemoryKV fallback, not the full running agentmemory server.\n")
+b.WriteString("- agentmemory probe: `python3 scripts/bench_agentmemory_locomo.py --capability`. Comparable when `AGENTMEMORY_SOURCE_DIR` points at `https://github.com/rohitg00/agentmemory` PR #583 / commit `9b18a80c9d2839b025279978d3f4b5e1f9bc6e74` with npm dependencies installed. This adapter uses the standalone InMemoryKV fallback, not the full running agentmemory server.\n")
 b.WriteString("- mem0 probe: `python3 scripts/bench_mem0_locomo.py --capability`. Exact package version used here: none; backend is marked not comparable before scoring. Candidate install: `pip install mem0ai` plus upstream local vector-store dependencies. Comparable only after configured local retrieval can return caller-supplied `memory_id` without answer-generation scoring.\n")
 b.WriteString("\n## Interpretation\n\nBackends marked not comparable are excluded from score claims until they implement the `MemoryBackend` contract and return the same stable `memory_id` values that were inserted. This keeps the arena fair and prevents answer-generation or LLM-judge effects from leaking into retrieval metrics.\n")
 return os.WriteFile(path, []byte(b.String()), 0o644)
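
The Interpretation text in this hunk requires a backend to return the same stable `memory_id` values that were inserted. A minimal sketch of that check follows. The helper name `unknownIDs` and the sample IDs are hypothetical and are not taken from cmd/goncho-bench, whose actual `MemoryBackend` contract may be stricter.

```go
package main

import "fmt"

// unknownIDs returns every ID in results that is not in inserted.
// A backend that rewrites or regenerates IDs yields a non-empty slice.
// Hypothetical helper for illustration; not part of cmd/goncho-bench.
func unknownIDs(inserted, results []string) []string {
	seen := make(map[string]bool, len(inserted))
	for _, id := range inserted {
		seen[id] = true
	}
	var unknown []string
	for _, id := range results {
		if !seen[id] {
			unknown = append(unknown, id)
		}
	}
	return unknown
}

func main() {
	// Made-up IDs: two inserted by the caller, one invented by the backend.
	inserted := []string{"conv1:turn3", "conv1:turn7"}
	results := []string{"conv1:turn7", "a1b2c3"}

	if bad := unknownIDs(inserted, results); len(bad) > 0 {
		fmt.Println("not comparable, unknown IDs:", bad)
		return
	}
	fmt.Println("comparable")
}
```

Checking set membership rather than order keeps the ID contract separate from ranking quality, which the retrieval metrics score on their own.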

docs-site/src/content/docs/operators/runbook.md

Lines changed: 4 additions & 3 deletions
@@ -201,8 +201,9 @@ Operator rules:
 Backend probe commands:

 ```sh
-AGENTMEMORY_SOURCE_DIR=/path/to/agentmemory-pr583 python3 scripts/bench_agentmemory_locomo.py --capability
-AGENTMEMORY_SOURCE_DIR=/path/to/agentmemory-pr583 python3 scripts/bench_agentmemory_locomo.py --smoke
+# source: https://github.com/rohitg00/agentmemory at PR #583 commit 9b18a80c9d2839b025279978d3f4b5e1f9bc6e74
+AGENTMEMORY_SOURCE_DIR=/path/to/agentmemory python3 scripts/bench_agentmemory_locomo.py --capability
+AGENTMEMORY_SOURCE_DIR=/path/to/agentmemory python3 scripts/bench_agentmemory_locomo.py --smoke
 python3 scripts/bench_mem0_locomo.py --capability
 python3 scripts/bench_mem0_locomo.py --smoke
 ```
@@ -214,7 +215,7 @@ Expected current status:
 | Goncho | comparable | Local adapter returns stable IDs. |
 | BM25 | comparable | Local lexical baseline returns stable IDs. |
 | SQLite FTS5 | comparable | Local FTS baseline returns stable IDs. |
-| agentmemory | comparable with PR source | Set `AGENTMEMORY_SOURCE_DIR` to PR #583 commit `9b18a80c9d2839b025279978d3f4b5e1f9bc6e74`. Stable IDs work; standalone fallback LOCOMO score is `0.0000`, and this is not the full running server. |
+| agentmemory | comparable with PR source | Set `AGENTMEMORY_SOURCE_DIR` to `https://github.com/rohitg00/agentmemory` at PR #583 commit `9b18a80c9d2839b025279978d3f4b5e1f9bc6e74`. Stable IDs work; standalone fallback LOCOMO score is `0.0000`, and this is not the full running server. |
 | mem0 | not comparable | Package is not installed locally; no stable-ID run exists. |
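
The agentmemory row depends on `AGENTMEMORY_SOURCE_DIR`, and the setup notes say the backend is marked not comparable when that variable is absent. A rough sketch of that rule in Go follows. The function `agentmemoryStatus` is hypothetical, and the real decision is made by `scripts/bench_agentmemory_locomo.py --capability`, which may also verify the checkout and its npm dependencies.

```go
package main

import (
	"fmt"
	"os"
)

// agentmemoryStatus applies the documented rule: without a source
// directory the backend is marked not comparable.
// lookup has the signature of os.LookupEnv so it can be faked in tests.
// Hypothetical helper for illustration; not part of the harness.
func agentmemoryStatus(lookup func(string) (string, bool)) string {
	dir, ok := lookup("AGENTMEMORY_SOURCE_DIR")
	if !ok || dir == "" {
		return "not comparable"
	}
	return "comparable with PR source"
}

func main() {
	fmt.Println("agentmemory:", agentmemoryStatus(os.LookupEnv))
}
```

Passing the lookup function as a parameter keeps the rule testable without mutating the process environment.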

 Primary outputs:

docs/benchmarks/external-backend-adapters.md

Lines changed: 2 additions & 2 deletions
@@ -70,15 +70,15 @@ LOCOMO contains duplicate and near-duplicate content, including repeated content
 | Goncho no-rank | local Go harness | yes | Native LOCOMO `memory_id` | Local deterministic no-ranking baseline that uses recency order before current Goncho ranking. |
 | BM25 | local Go harness | yes | Native LOCOMO `memory_id` | Local deterministic lexical baseline. |
 | SQLite FTS5 | local Go SQLite FTS5 | yes | Native LOCOMO `memory_id` column | Local deterministic lexical baseline. |
-| agentmemory | `@agentmemory/agentmemory 0.9.20`, PR #583 commit `9b18a80c9d2839b025279978d3f4b5e1f9bc6e74` | yes, standalone fallback | `memory_save.external_id` plus `metadata.memory_id` returned by `memory_smart_search` | Stable IDs work. LOCOMO full score is `0.0` for the standalone InMemoryKV fallback because it uses strict all-term substring matching; this is not the full running agentmemory server. |
+| agentmemory | `https://github.com/rohitg00/agentmemory`, `@agentmemory/agentmemory 0.9.20`, PR #583 commit `9b18a80c9d2839b025279978d3f4b5e1f9bc6e74` | yes, standalone fallback | `memory_save.external_id` plus `metadata.memory_id` returned by `memory_smart_search` | Stable IDs work. LOCOMO full score is `0.0` for the standalone InMemoryKV fallback because it uses strict all-term substring matching; this is not the full running agentmemory server. |
 | mem0 | Python `3.12.3`; package not installed locally | no | Not executed | `mem0`/`mem0ai` is not installed in this environment; no stable-ID run can be produced. |
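
The agentmemory row attributes the `0.0` score to strict all-term substring matching. The sketch below shows one plausible reading of that behavior, assuming whitespace tokenization and case-insensitive comparison. The real InMemoryKV fallback may tokenize differently, and the sample sentence is made up rather than taken from LOCOMO.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAllTerms reports whether every whitespace-separated query term
// occurs as a case-insensitive substring of text.
// Assumed behavior for illustration; not the agentmemory implementation.
func matchesAllTerms(query, text string) bool {
	lower := strings.ToLower(text)
	for _, term := range strings.Fields(strings.ToLower(query)) {
		if !strings.Contains(lower, term) {
			return false
		}
	}
	return true
}

func main() {
	memory := "Caroline adopted a rescue dog in May."

	// A keyword query where both terms appear in the memory.
	fmt.Println(matchesAllTerms("rescue dog", memory))

	// A natural-language question: "when", "did", and "dog?" do not
	// appear verbatim, so the whole query fails to match.
	fmt.Println(matchesAllTerms("When did Caroline adopt her dog?", memory))
}
```

Under this reading, a single question word or trailing punctuation mark that is absent from the stored text is enough to reject it, which is consistent with full-sentence questions retrieving nothing.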

 ## Setup commands

 agentmemory candidate setup:

 ```bash
-git clone --branch feature/stable-external-memory-ids https://github.com/XelHaku/agentmemory.git
+git clone https://github.com/rohitg00/agentmemory.git
 cd agentmemory
 git checkout 9b18a80c9d2839b025279978d3f4b5e1f9bc6e74
 npm install --legacy-peer-deps
