v1.6.0 — Hybrid Coding Arena (rebrand)

@sanchitmonga22 released this 17 Jun 02:39 · commit d2b514d · 3 commits to main since this release

Hybrid Coding Arena, from RunAnywhere, is the new name for this benchmark (formerly hybrid-coding-eval).

What changed in v1.6.0

This is a rebrand release. The benchmark, methodology, and dataset are unchanged.

  • Python package: hybrid_coding_eval is now hybrid_arena.
  • Distribution and repository: both are now hybrid-arena (the old hybrid-coding-eval URL redirects to the new repository).
  • CLI command: bench is now arena (e.g. arena sweep, arena analyze).
  • Clearer headline chart: pass-rate and cloud usage are now separate, labeled elements.

git clone https://github.com/RunanywhereAI/hybrid-arena
cd hybrid-arena && python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev,agents]"
arena setup
arena sweep --config configs/v1.4-smoke.yaml --strategies always-cloud --seeds 42
arena analyze results/runs/v1.4-smoke
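
Scripts that import the package may need to run against both the old and the new name during a transition. The following is a minimal sketch of one way to handle that, not something shipped with the release: the helper `import_first_available` is hypothetical, and only the two module names come from the notes above.

```python
import importlib


def import_first_available(*names):
    """Return the first module in `names` that imports successfully.

    Hypothetical helper for illustration; it is not part of hybrid_arena.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module not installed under this name; try the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(names)
    )


# Intended use after the rename (new name first, old name as fallback):
#   arena_pkg = import_first_available("hybrid_arena", "hybrid_coding_eval")

# Runnable demonstration with standard-library names only:
json_mod = import_first_available("no_such_module_xyz_123", "json")
print(json_mod.__name__)
```

Listing the new name first means an environment that has both packages installed picks up hybrid_arena.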

Dataset

results-v1.6.0.tar.gz is byte-identical to the v1.5.0/v1.5.1 dataset (1,704 rows). No new benchmark runs in this release.
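
The byte-identical claim can be checked locally by comparing digests of the two archives. This is a sketch under assumptions: the function names are illustrative, and the file name of the earlier archive (written here as results-v1.5.1.tar.gz) is a guess that the notes do not confirm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (the second file name is an assumption, see above):
#   byte_identical("results-v1.6.0.tar.gz", "results-v1.5.1.tar.gz")
```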

Headline

  • cline + qwen3.6 + cascade on real-developer refactors: 24/24 tasks passed (100%) at 8% cloud usage, about $0.022 per task.
  • Local-only solves 67% of the hard (D6) tasks at $0 cloud cost; cloud-only holds 100%.
  • 1,704 rows, 3 local models, 3 coding agents, 8 routing strategies, 17 tasks, one M4 Max laptop, 95% bootstrap CIs.
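
The headline cites 95% bootstrap CIs but does not say which bootstrap variant the benchmark uses. The sketch below assumes a simple percentile bootstrap over per-task pass/fail outcomes; the function name, the resample count, and the mixed example sample are illustrative and not taken from the dataset.

```python
import random


def bootstrap_ci(outcomes, n_resamples=10_000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for a pass rate.

    `outcomes` is a list of 1 (pass) / 0 (fail) values, one per task run.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the pass rate of each resample.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = rates[int((alpha / 2) * n_resamples)]
    hi = rates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# 24 passes out of 24: every resample is all passes, so the interval collapses.
print(bootstrap_ci([1] * 24))  # (1.0, 1.0)

# Made-up mixed sample (16 passes, 8 failures), for illustration only.
print(bootstrap_ci([1] * 16 + [0] * 8))
```

A percentile bootstrap on a 24/24 result has zero width, so an interval reported for a perfect score says little about uncertainty; a method such as the Wilson interval is a common alternative for proportions at the boundary.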