v1.4.3 — back-compat-free cleanup

Release date: 2026-05-26

Theme: drop every back-compat surface that was carrying old v1.0–v1.3 names into v1.4, document the fresh-user setup in one place (the prerequisites table in README.md), and make scripts/reproduce.sh print the right brew / apt install hint when something is missing.

This is not a new dataset release — no inference was re-run. The canonical 1,644-row v1.4.1 leaderboard is unchanged.

What changed for users

1. Task-class names are now consistent end-to-end

The single-letter category codes (A, B, C, D, X) that v1.0–v1.3 carried in ResultRow.category and in bootstrap_cis.json cell keys are gone. Every surface uses the human-readable names now:

| What | Before (≤ v1.4.2) | After (v1.4.3) |
| --- | --- | --- |
| `ResultRow.category` | `"A"` / `"D"` / `"B"` | `"puzzles"` / `"refactors"` / `"real-prs"` |
| `bootstrap_cis.json` cell key | `"A::aider::heuristic"` | `"puzzles::aider::heuristic"` |
| `aggregate.json` cell key | `"D/cline"` | `"refactors/cline"` |
| `decision_matrix.md` rows | `D` | `refactors` |

If you scripted against the old keys, the change is mechanical: A → puzzles, D → refactors, B → real-prs. The pre-v1.4.3 datasets in results/runs/{01..04, 07, 11}/ keep their original keys (they're immutable); v1.4.0 / v1.4.1 GitHub-release tarballs were generated with the legacy keys and can be re-rendered with the v1.4.3 harness if you want consistent labels.
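For scripts that still read legacy keys, the rename can be sketched as a small helper. This is a minimal illustration, not part of the harness: the function name `migrate_cell_key` and the dictionary `LEGACY_TO_NAME` are hypothetical, and only the three mappings stated above (A, D, B) are included. Any other code passes through unchanged.

```python
# Mapping copied from the release notes. Codes not listed there (e.g. C, X)
# are deliberately left unmapped and pass through unchanged.
LEGACY_TO_NAME = {"A": "puzzles", "D": "refactors", "B": "real-prs"}


def migrate_cell_key(key: str) -> str:
    """Rewrite the leading category code of a cell key to its v1.4.3 name.

    Handles both separators shown in the table: "::" (bootstrap_cis.json)
    and "/" (aggregate.json). Keys without a known code are returned as-is.
    """
    for sep in ("::", "/"):
        head, found, tail = key.partition(sep)
        if found and head in LEGACY_TO_NAME:
            return LEGACY_TO_NAME[head] + sep + tail
    # Bare category value, e.g. a ResultRow.category of "A".
    return LEGACY_TO_NAME.get(key, key)


if __name__ == "__main__":
    for old in ("A::aider::heuristic", "D/cline", "B"):
        print(old, "->", migrate_cell_key(old))
```

Because already-migrated keys do not start with a legacy code, running the helper twice over the same data is harmless.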

2. Prerequisites section in `README.md`

The single most common "I cloned this and it doesn't work" failure was forgetting to install Ollama or opencode before running the sweep. The README.md now leads with a tidy prerequisites table listing every external tool the harness drives, with the exact brew install … / sudo apt install … command on both platforms.

scripts/reproduce.sh echoes the matching install command when it detects a missing prereq, so a first-time user gets:

[reproduce] missing prerequisite: 'ollama'
  → install with: curl -fsSL https://ollama.com/install.sh | sh

…instead of a generic "command not found" failure.
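The detection logic amounts to a PATH lookup plus a hint table. The sketch below expresses that idea in Python rather than the shell that scripts/reproduce.sh actually uses. The names `INSTALL_HINTS` and `missing_prereq_messages` are hypothetical, and only the ollama hint is taken from the output above.

```python
import shutil

# Hypothetical hint table. Only the ollama command appears in the release
# notes; a real table would carry one entry per tool listed in the README.
INSTALL_HINTS = {
    "ollama": "curl -fsSL https://ollama.com/install.sh | sh",
}


def missing_prereq_messages(tools, hints=INSTALL_HINTS, which=shutil.which):
    """Return one message per tool that cannot be found on PATH.

    `which` is injectable so the lookup can be faked in tests.
    """
    messages = []
    for tool in tools:
        if which(tool) is None:
            msg = f"[reproduce] missing prerequisite: '{tool}'"
            hint = hints.get(tool)
            if hint:
                msg += f"\n  → install with: {hint}"
            messages.append(msg)
    return messages


if __name__ == "__main__":
    for message in missing_prereq_messages(["ollama"]):
        print(message)
```

Collecting every missing tool before reporting, instead of stopping at the first one, lets a first-time user install everything in one pass.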

3. Removed: legacy `R3` architect pipeline

router/pipelines/architect/ and router/agentic/architect.mjs were the v3 multi-step "plan → execute → synthesise" pipeline that the v1.4 four-agent matrix replaced. No agent in v1.4 calls into it; the special model: "router/architect" pseudo-strategy in server.mjs was unreachable. Both paths, the dispatcher, and the import are gone: roughly 200 lines of dead code, plus 9 vendored example outputs.

4. `pair_already_done` is strict now

The orchestrator's resume-skip check used to wildcard-match rows with router_strategy=None against any in-progress strategy (a back-compat hack for v0 rows that predated the strategy axis). v1.4.3 requires an exact (task, route, strategy) triple match. The smoke / canonical configs all carry an explicit strategy, so this is invisible at the user surface, but it closes a foot-gun where a --router-strategy heuristic resume could silently skip a pair because a stale null-strategy row matched it.
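The difference between the two checks can be sketched as follows. This illustrates the behaviour described above and is not the orchestrator's code: the `Row` type and both function names are hypothetical, and the real `pair_already_done` signature is not shown in these notes.

```python
from typing import Iterable, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical stand-in for a stored result row."""
    task: str
    route: str
    router_strategy: Optional[str]


def is_done_wildcard(rows: Iterable[Row], task: str, route: str, strategy: str) -> bool:
    """Old behaviour as described: a row with no strategy matches any strategy."""
    return any(
        r.task == task
        and r.route == route
        and (r.router_strategy is None or r.router_strategy == strategy)
        for r in rows
    )


def is_done_strict(rows: Iterable[Row], task: str, route: str, strategy: str) -> bool:
    """v1.4.3 behaviour as described: exact (task, route, strategy) match only."""
    return any(
        (r.task, r.route, r.router_strategy) == (task, route, strategy)
        for r in rows
    )


if __name__ == "__main__":
    stale = [Row("task-1", "aider", None)]
    # The wildcard check treats the stale row as a finished heuristic run.
    print(is_done_wildcard(stale, "task-1", "aider", "heuristic"))  # True
    # The strict check does not, so the pair is re-run on resume.
    print(is_done_strict(stale, "task-1", "aider", "heuristic"))  # False
```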

5. Cleaned up code comments + docstrings

Every "R6 / R7 / R8 / R10" reference in docstrings, comments, and test names was replaced with the agent name. The agent modules read as standalone documents now — no implicit knowledge of the v1.0 route numbering required.

Two follow-up paper cuts (per-agent scratch dir names still using the r6_ / r7_ / r8_ / r10_ prefix, and matplotlib / numpy not being declared as runtime deps so arena analyze failed on a fresh pip install -e ".[dev]") were fixed in v1.4.4.

What did NOT change

- No data was re-run. The v1.4.1 leaderboard (1,644 rows) stands.
- No public API change. The arena CLI surface is identical; every flag in v1.4.2 still works in v1.4.3.
- Pricing tables unchanged. Same pricing_tables.json SHA256 as v1.4.2.
- Tests + ruff still green. pytest -m 'not slow' → 109 passed.

Fresh-user verification

The README prerequisites table was verified against a clean install flow on macOS (Darwin 25.5.0):

1. `git clone https://github.com/RunanywhereAI/hybrid-arena`
2. Follow the prereqs table → install Python 3.12, Docker Desktop, Node 24, Ollama, jq via Homebrew
3. `cp .env.example .env && $EDITOR .env` (add OPEN_AI_API_KEY)
4. `ollama serve &` (if Ollama.app not already running)
5. `ollama pull gemma4:31b`
6. `./scripts/reproduce.sh --smoke`

The reproducer correctly detected and printed install hints for docker, node, ollama, and jq when they were absent from PATH. ✓ The end-to-end smoke run on this version surfaced the matplotlib + per-agent-dir issues fixed in v1.4.4.

Upgrade notes for v1.4.2 users

git pull
.venv/bin/pip install --upgrade -e ".[dev]"
.venv/bin/pytest tests/ -q -m 'not slow'  # should pass: 109/109

That's it — there is no migration step. The harness now writes the new category names; the legacy datasets in results/runs/ stay at their original keys and are not touched.

Statistics

- Files changed: 45 (mostly comment/docstring rewrites + tests)
- Lines added: ~280
- Lines removed: ~520 (dead architect pipeline + back-compat fallbacks + R-number references)
- Net delta: −240 lines, despite adding the README prerequisites table and richer reproducer install hints.
