Commit 4aac7d6

chore: update
1 parent 16a76c7 commit 4aac7d6

1 file changed

Lines changed: 5 additions & 5 deletions


benchmarks/README.md

````diff
@@ -52,8 +52,8 @@ benchmarks/data/splits/ ← 5 stratified CV splits (seeds 42, 0,
 
 | Subset | Questions | Purpose |
 |---|---|---|
-| `dev` | 50 | Hyperparameter tuning only (~7 min) |
-| `held_out` | 450 | Single clean evaluation (~60 min) |
+| `dev` | 50 | Hyperparameter tuning only |
+| `held_out` | 450 | Single clean evaluation |
 
 **Integrity rule:** all alpha decisions are made on `dev` only. The held-out
 result is run once; no parameters are adjusted after observing held-out failures.
````
````diff
@@ -241,7 +241,7 @@ regenerate them, as the held-out IDs must be identical across runs.
 
 ### Fixed split — canonical held-out result
 
-**Step 1 — Find optimal alphas on dev (~7 min)**
+**Step 1 — Find optimal alphas on dev**
 
 ```bash
 .venv/bin/python -u benchmarks/longmemeval_bench.py \
````
````diff
@@ -255,7 +255,7 @@ Expected best combo: `ECR α=0.3 IDF α=0.6 CAATB α=0.2`
 The sweep evaluates 27 alpha combinations without re-embedding — raw vector rows
 are cached once per question and all combos are applied offline.
 
-**Step 2 — Evaluate on held-out (~60 min)**
+**Step 2 — Evaluate on held-out**
 
 ```bash
 .venv/bin/python -u benchmarks/longmemeval_bench.py \
````
````diff
@@ -277,7 +277,7 @@ Results are written to `benchmarks/results/results_mw_ecr{α}_idf{α}_caatb{α}_
 
 ---
 
-### 5-seed cross-validated results (~5–6 hours)
+### 5-seed cross-validated results
 
 ```bash
 .venv/bin/python -u benchmarks/multiseed_sweep.py \
````
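The hunk at line 255 says the alpha sweep scores 27 combinations without re-embedding: raw vector rows are cached once per question and every combo is applied offline. The integrity rule also restricts that choice to `dev`. The Python sketch below shows that pattern under several hypothetical assumptions. It assumes three candidate values per alpha (3 × 3 × 3 = 27), a linear blend of per-row signals, and hit@1 as the metric. `Row`, `rescore`, `hit_at_1`, `sweep`, `GRID` and all field names are illustrative placeholders. They are not the API of `benchmarks/longmemeval_bench.py`.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical candidate values per alpha; the real grid is not shown in this commit.
GRID = (0.0, 0.5, 1.0)


@dataclass(frozen=True)
class Row:
    """One cached candidate row, computed once per question (hypothetical fields)."""
    base: float    # raw vector similarity
    ecr: float     # ECR signal
    idf: float     # IDF signal
    caatb: float   # CAATB signal
    is_gold: bool  # True if this row is evidence for the question


def rescore(row: Row, a_ecr: float, a_idf: float, a_caatb: float) -> float:
    # Assumed linear blend; the benchmark's actual scoring may differ.
    return row.base + a_ecr * row.ecr + a_idf * row.idf + a_caatb * row.caatb


def hit_at_1(questions: list[list[Row]], combo: tuple[float, float, float]) -> float:
    """Fraction of questions whose top-scoring cached row is gold under `combo`."""
    hits = sum(max(rows, key=lambda r: rescore(r, *combo)).is_gold for rows in questions)
    return hits / len(questions)


def sweep(dev_questions: list[list[Row]]):
    """Score every (ECR, IDF, CAATB) combo on cached dev rows; nothing is re-embedded."""
    combos = list(product(GRID, repeat=3))  # 3 ** 3 == 27 combos
    scores = {combo: hit_at_1(dev_questions, combo) for combo in combos}
    best = max(combos, key=scores.__getitem__)  # first combo in grid order wins ties
    return best, scores
```

Because `sweep` only reads the cached rows, each extra combo costs a rescoring pass and not a new embedding run. Under the integrity rule, the returned `best` would then be applied once to `held_out`, with no further tuning.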
