chore(webarena-verified): bump registry entry to 1.1.0 #64

Merged: recursix merged 2 commits into main from chore/webarena-version-1.1.0 on Jun 19, 2026
@recursix

Collaborator

Syncs the registry entry version to the latest published webarena-verified-cube release (1.1.0 on PyPI; the entry was at 1.0.0). The schema requires version to match the published PyPI version, and quick-check installs package==version. Validated against registry-schema.json locally.
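The version rule described above can be sketched as a small check. This is a minimal illustration, not the registry's actual validator: the `package` and `version` keys and the `check_entry_version` helper are assumptions made for the example, and registry-schema.json may name its fields differently. The only external piece used is PyPI's public JSON endpoint.

```python
import json
import urllib.request
from typing import Callable


def fetch_pypi_version(package: str) -> str:
    """Return the latest release version reported by PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["info"]["version"]


def check_entry_version(
    entry: dict,
    fetch: Callable[[str], str] = fetch_pypi_version,
) -> tuple[bool, str]:
    """Compare a registry entry's version with the published one.

    `entry` is assumed (hypothetically) to carry "package" and "version"
    keys. Returns (matches, pin), where `pin` is the `package==version`
    requirement an installer would use.
    """
    published = fetch(entry["package"])
    pin = f'{entry["package"]}=={entry["version"]}'
    return entry["version"] == published, pin
```

Passing a custom `fetch` keeps the comparison testable offline; with the default, the call goes to PyPI.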

Sync the registry entry to the latest published webarena-verified-cube release
(1.1.0 on PyPI). The version field must match the published PyPI version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Cube Harness <cube-harness@example.com>
@recursix

Collaborator Author

⚠️ Not merging — quick-compliance fails, but not because of the version bump. It surfaced a real problem:

webarena-verified-cube==1.1.0 installs but fails to import the benchmark:
```
tool_config
Input should be a valid dictionary or instance of ToolboxConfig
[input_value=BgymToolConfig(...), input_type=BgymToolConfig]
```
The cube's tool_config default is a ToolboxConfig, which is correct. However, the Playwright tool it wraps resolves, via cube-browser-tool 0.3.0 (PyPI), to a BgymToolConfig, which the ToolboxConfig validation in cube-standard rc10 rejects. This is a cross-package version skew (both dependencies were published 2026-06-16) that affects any fresh pip install. The same cube code ships in 1.0.0, so the entry was likely already silently non-compliant.
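To make the failure mode concrete, here is a rough stand-in for that kind of type check. It uses plain dataclasses, not the real cube-standard or cube-browser-tool classes: the two classes only borrow the names from the error message, and their fields and `validate_tool_config` are hypothetical. The point is that a config object of an unrelated class is rejected even if it carries the same data.

```python
from dataclasses import dataclass, field


@dataclass
class ToolboxConfig:
    """Stand-in for the config type the benchmark field expects."""
    tools: list = field(default_factory=list)


@dataclass
class BgymToolConfig:
    """Stand-in for a config class resolved from a different package."""
    tools: list = field(default_factory=list)


def validate_tool_config(value):
    """Accept a dict or a ToolboxConfig instance; reject everything else."""
    if isinstance(value, ToolboxConfig):
        return value
    if isinstance(value, dict):
        return ToolboxConfig(**value)
    raise TypeError(
        "Input should be a valid dictionary or instance of ToolboxConfig "
        f"[input_type={type(value).__name__}]"
    )
```

Calling `validate_tool_config(BgymToolConfig())` raises, while a dict or a `ToolboxConfig` passes, which mirrors the shape of the reported error.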

The fix is a packaging/compatibility change between cube-browser-tool and cube-standard plus a cube republish, not a registry change. Holding this PR until a fresh pip install webarena-verified-cube imports cleanly; then the bump goes green.
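One way to check the "fresh install imports cleanly" condition is a throwaway virtual environment. The sketch below is an assumption about how such a smoke test could look, not the quick-compliance job itself. The importable module name is not given in this thread, so it is left as a parameter, and a bare import is only a proxy for loading the benchmark with its default config.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def smoke_commands(python: str, requirement: str, module: str) -> list[list[str]]:
    """Build the install and import commands to run in a clean environment."""
    return [
        [python, "-m", "pip", "install", "--no-cache-dir", requirement],
        [python, "-c", f"import {module}"],
    ]


def fresh_install_imports(requirement: str, module: str) -> bool:
    """Create a temporary venv, install `requirement`, then import `module`.

    Returns True only if both steps exit with status 0. `module` is the
    caller's guess at the importable name (hypothetical here).
    """
    with tempfile.TemporaryDirectory() as tmp:
        venv.create(tmp, with_pip=True)
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        python = str(Path(tmp) / bin_dir / "python")
        return all(
            subprocess.run(cmd, capture_output=True).returncode == 0
            for cmd in smoke_commands(python, requirement, module)
        )
```

Because the venv starts empty, the result reflects whatever dependency versions pip resolves at that moment, which is the situation a new user would hit.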

…he default-config import fix)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Cube Harness <cube-harness@example.com>
recursix pushed a commit that referenced this pull request Jun 19, 2026
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Cube Harness <cube-harness@example.com>
recursix pushed a commit that referenced this pull request Jun 19, 2026
…a 1.1.0)

Sync displayed version to the published cubes. (webarena handled separately in #64.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Cube Harness <cube-harness@example.com>
recursix merged commit b64e9a4 into main on Jun 19, 2026
8 of 9 checks passed
recursix deleted the chore/webarena-version-1.1.0 branch on June 19, 2026 12:13
recursix added a commit that referenced this pull request Jun 19, 2026
…a 1.1.0) (#66)

Sync displayed version to the published cubes. (webarena handled separately in #64.)

Signed-off-by: Cube Harness <cube-harness@example.com>
Co-authored-by: Cube Harness <cube-harness@example.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions


Entry review — webarena-verified

Verdict: PASS

| Check | Result |
| --- | --- |
| description_matches_package | ✅ pass |
| authors_consistent_with_git | ⚠️ unverified |
| no_id_squat_vs_existing | ✅ pass |
| no_brand_impersonation | ✅ pass |
| wrapper_license_plausible | ✅ pass |

Notes:

The entry is an update to the existing id webarena-verified (already in the registry IDs list), so it is not a new-id squat. The README confirms that cube-harness wraps WebArena among its supported benchmarks ("MiniWob, WebArena, OSWorld"), and the description of 812 tasks across 6 web platforms is consistent with the published WebArena benchmark (arXiv 2406.11955 is the linked paper). No brand impersonation: WebArena Verified is a faithful-port style name, and the benchmark_license source_url points to ServiceNow/webarena-verified, a plausible upstream.

Could not verify: PyPI page empty (package may be unpublished — no PyPI long description or license field to cross-check). Author git history for the cubes/webarena-verified subdirectory not directly readable here; handles (NicolasAG, younik, recursix, manuel-delverme) are plausible ServiceNow/cube-harness contributors but unverified against commit history. known-authors.yaml has no entry for this id.

wrapper_license MIT is a valid SPDX id; benchmark_license reported Apache-2.0 with verified_by_original_authors:false — could not fetch the linked LICENSE file but the claim is internally consistent. No fail-level evidence; unverified items do not block judging the core claims (this is an existing-id update with matching scope).
