Commit 3d8ce96
feat: register arithmetic, swebench-live, swebench-verified, terminalbench2
Adds entry YAMLs for four cubes that pass quick_check.py with a clean
scaffold. All four live in The-AI-Alliance/cube-harness today and are
installable via the standard dev_install_url pattern. They were
discovered and validated by running the registry's quick_check.py
(--no-install) against every cube directory in cube-harness/cubes/ as a
registration-UX probe.
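The probe loop described above can be sketched roughly as follows. This is an illustration only: `probe_cubes` is a hypothetical helper, and the actual per-cube check is passed in as a callable because the commit message does not show how quick_check.py is invoked beyond the `--no-install` flag.

```python
from pathlib import Path
from typing import Callable, Dict


def probe_cubes(cubes_root: Path, check: Callable[[Path], bool]) -> Dict[str, bool]:
    """Run `check` on every sub-directory of `cubes_root`.

    `check` stands in for "run quick_check.py --no-install for this cube";
    its exact command line is an assumption left to the caller.
    """
    results: Dict[str, bool] = {}
    for cube_dir in sorted(p for p in cubes_root.iterdir() if p.is_dir()):
        try:
            results[cube_dir.name] = bool(check(cube_dir))
        except Exception:
            # A crashing check counts as a failed cube instead of aborting the loop.
            results[cube_dir.name] = False
    return results
```

Recording a failure per cube, instead of stopping at the first error, is what lets a single pass report both the cubes that register cleanly and the ones that need follow-up issues.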
Specifics:
• arithmetic (4 tasks): math benchmark, smoke-test fixture
• swebench-live (1895 tasks): continuously-refreshed GitHub-issue
  resolution, contamination-resistant
• swebench-verified (500 tasks): Princeton + OpenAI human-validated
  subset
• terminalbench2 (89 tasks): Laude Institute terminal tasks
  (16 categories, pytest-validated)
Each entry passed quick_check.py --no-install locally with the cube
package installed editable from cube-harness/cubes/<name>; the script's
write-back populated task_count, has_debug_task, has_debug_agent,
features, action_space, status, resources without further hand-editing.
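A quick way to confirm that the write-back filled everything in is to check an entry for the fields listed above. The sketch below assumes the entry YAML has already been loaded into a plain dict; `missing_writeback_fields` is a hypothetical helper, not part of quick_check.py.

```python
# Field names are taken from the commit message; everything else is illustrative.
WRITEBACK_FIELDS = (
    "task_count",
    "has_debug_task",
    "has_debug_agent",
    "features",
    "action_space",
    "status",
    "resources",
)


def missing_writeback_fields(entry: dict) -> list:
    """Return the write-back fields that are absent from a loaded entry."""
    return [name for name in WRITEBACK_FIELDS if name not in entry]


if __name__ == "__main__":
    # A partially populated entry: only task_count is present.
    print(missing_writeback_fields({"task_count": 4}))
```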
Author handles are the actual cube wrapper maintainers (per git log on
each cubes/<name>/ subtree):
• arithmetic: NicolasAG, recursix
• swebench-live: NicolasAG, recursix, josancamon19
• swebench-verified: NicolasAG, recursix, josancamon19
• terminalbench2: recursix
Other cubes investigated in the same probe but NOT included here:
• browsercomp: module-load fails (BrowseCompBenchmarkConfig requires
  scorer_model that the default doesn't pass). Filed cube-harness#479
  against younik + NicolasAG.
• windows-agent-arena: task_metadata count (152) != declared num_tasks
  (154). Filed cube-harness#480 against kushasareen + amanjaiswal73892.
• webarena-verified: already registered; module-load now fails after a
  BgymToolConfig → ToolboxConfig rename that didn't propagate to the
  cube's configs.py. Will surface on next periodic health-check.
• terminalbench-cube: empty directory in cube-harness (build artefacts
  only); dev branch doesn't track any source. Apparent leftover from a
  tb → tb2 rename.
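The windows-agent-arena rejection comes down to a consistency check between the declared task count and the task metadata actually present. A minimal sketch of that kind of check, with hypothetical names (`check_task_count`, and a plain list standing in for task_metadata); the real check in quick_check.py may be structured differently:

```python
from typing import Optional, Sequence


def check_task_count(task_metadata: Sequence, declared_num_tasks: int) -> Optional[str]:
    """Return an error message when the two counts disagree, else None."""
    actual = len(task_metadata)
    if actual != declared_num_tasks:
        return (
            f"task_metadata count ({actual}) != "
            f"declared num_tasks ({declared_num_tasks})"
        )
    return None


if __name__ == "__main__":
    # Same shape as the mismatch reported above: 152 metadata rows, 154 declared.
    print(check_task_count(list(range(152)), 154))
```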
Drive-by fix in scripts/quick_check.py: the script calls
importlib.metadata.entry_points(...) at lines 122 + 167 but only does
import importlib at line 21. In Python ≥ 3.10 importlib.metadata is a
submodule that needs an explicit import; without it the try/except at
find_benchmark_class:121 catches an AttributeError ("module 'importlib'
has no attribute 'metadata'") and silently degrades to the by-name lookup
path. Added one line: import importlib.metadata. Pure correctness fix;
the by-name fallback path remains as defense in depth.
Companion issues for the cubes that didn't make this batch:
• The-AI-Alliance/cube-harness#479 (browsercomp)
• The-AI-Alliance/cube-harness#480 (windows-agent-arena)
Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com>
1 parent 5a18b6d commit 3d8ce96
5 files changed
Lines changed: 167 additions & 0 deletions
File tree
- entries
- scripts