feat: add swegym entry (full SWE-Gym 2438 + official lite subset) #63
Same-repo PR (not a fork) so quick-compliance can write CI-derived fields back. The fork-PR variant (#62) passes the compliance check but fails the write-back step (GITHUB_TOKEN can't push to a fork branch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Cube Harness <cube-harness@example.com>
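For context, a minimal sketch of how a workflow step could decide up front whether write-back is possible, rather than failing at push time. It assumes the step has the `pull_request` event payload loaded as a dict; `head.repo.full_name` and `base.repo.full_name` are fields of that payload, while the function name `can_write_back` is hypothetical and not part of quick-compliance.

```python
from typing import Any, Dict


def can_write_back(event: Dict[str, Any]) -> bool:
    """Return True when the PR branch lives in the base repository.

    Hypothetical helper: a fork PR has a head repo that differs from the
    base repo, and the workflow token cannot push to that fork branch.
    """
    pr = event["pull_request"]
    head_repo = pr["head"]["repo"]
    base_repo = pr["base"]["repo"]
    # The head repo is null when the fork has been deleted; treat as not writable.
    if head_repo is None:
        return False
    return head_repo["full_name"] == base_repo["full_name"]
```

With a check like this, the write-back step could be skipped with a clear message on fork PRs instead of surfacing as a push failure.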
Entry review

| Check | Result |
|---|---|
| description_matches_package | |
| authors_consistent_with_git | |
| no_id_squat_vs_existing | ✅ pass |
| no_brand_impersonation | ✅ pass |
| wrapper_license_plausible | |

Notes:
Core claims could not be verified. PyPI package swegym-cube appears unpublished: empty summary, empty license field, no long description. The dev_install_url points at a feature branch (cube-harness@feat/swegym-lite-cube#subdirectory=cubes/swegym-cube) and the entry's own NOTE says it is blocked until both a cube-harness PR merges and cube-standard rc9 lands on PyPI — so the package state is explicitly pre-merge/incomplete. No public README reachable for the cube wrapper repo, so description-vs-package consistency cannot be confirmed.
authors: only author is recursix (Alexandre Lacoste). The cube subdirectory (cubes/swegym-cube) is on a private/unreachable feature branch; could not confirm this handle appears in commit history for the wrapper. recursix is not in known-authors.yaml for this id. Unverified.
wrapper_license: Apache-2.0 is a valid SPDX id, but the cube-harness repo LICENSE was not reachable to confirm. Upstream SWE-Gym benchmark_license reported Apache-2.0 with source_url to SWE-Gym/SWE-Gym/blob/main/LICENSE — plausible but not fetched/confirmed here.
id swegym already exists in the registry; this PR modifies an existing entry (own-entry ownership presumed verified by CI), so not an id-squat. name/description faithfully describe the real SWE-Gym (arXiv 2412.21139), no brand impersonation.
Verdict: CONCERN, driven by a pattern of unverifiable core claims. The unpublished package and unreachable repo block confirmation of description, authorship, and license. Recommend re-review once swegym-cube is published to PyPI and the cube-harness wrapper lands on a public default branch.
✅ ownership-check + quick-compliance passed (slow-compliance is informational). Auto-merge skipped because:
A maintainer will review and merge.
…ged)

The cube now lives on cube-harness dev (default branch); drop the dead @feat/swegym-lite-cube ref. cube-standard rc9/rc10 are on PyPI so the pip-based compliance check resolves the dependency cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Cube Harness <cube-harness@example.com>
Entry review

| Check | Result |
|---|---|
| description_matches_package | ✅ pass |
| authors_consistent_with_git | |
| no_id_squat_vs_existing | ✅ pass |
| no_brand_impersonation | ✅ pass |
| wrapper_license_plausible | ✅ pass |

Notes:
Description: consistent with SWE-Gym (arXiv 2412.21139), a SWE-bench-methodology dataset of 2438 real-world Python SWE tasks across 11 repos with a 230-instance lite split. task_count 2438 matches. Docker-based fail_to_pass/pass_to_pass framing is correct for this benchmark family. The description is plausible: the swegym-cube subdir README could not be cross-checked directly, but the description is consistent with the cube-harness repo structure.
ID squat: existing ids include swebench-live and swebench-verified; 'swegym' is a distinct benchmark (different dataset/authors than SWE-bench), not a confusable near-duplicate. Pass.
Brand impersonation: 'SWE-Gym' is the actual upstream name (SWE-Gym/SWE-Gym), faithful port — pass.
License: wrapper Apache-2.0 is a valid SPDX id; benchmark_license.reported Apache-2.0 with source_url to SWE-Gym/SWE-Gym/blob/main/LICENSE — plausible, verified_by_original_authors:false noted. Could not fetch the actual upstream LICENSE to confirm Apache-2.0, but claim is internally consistent. Pass on available evidence.
CONCERN drivers (not fails, but blocking):
- PyPI metadata for swegym-cube is EMPTY / package appears unpublished (dev_install_url is a git+ subdirectory URL, which may be acceptable, but no PyPI long description to cross-check description_matches_package against).
- authors_consistent_with_git UNVERIFIED: sole author 'recursix' (Alexandre Lacoste) is an AI Alliance / cube-harness maintainer, not a known SWE-Gym original author (known-authors.yaml lists none for this id). The linked repo README was readable but I could not inspect commit history for cubes/swegym-cube to confirm this handle contributed to that specific wrapper subdir. recursix is plausible as a cube-harness contributor (matches the harness authorship), so no impersonation evidence — but the original-author attribution for the SWE-Gym dataset itself is not represented.
Net: core claims look legitimate (faithful SWE-Gym port by a cube-harness maintainer), but empty PyPI page + inability to verify the wrapper author against the cube subdir commit history prevent a clean PASS. Recommend maintainer confirm (a) swegym-cube is actually published/installable and (b) recursix committed to cubes/swegym-cube.
✅ ownership-check + quick-compliance passed (slow-compliance is informational). Auto-merge skipped because:
A maintainer will review and merge.
Adds the `swegym` registry entry: full SWE-Gym (2438 tasks) with the 230-task `lite` split as an official named subset (cube-harness#504, package `swegym-cube`).

Submitted as a same-repo branch (I have admin) rather than a fork, so `quick-compliance` can push the CI-derived fields back. The fork variant (#62) passes compliance but fails the write-back step (`GITHUB_TOKEN` can't push to a fork branch). #62 will be closed as superseded.

Validated locally + in CI on #62: ✅ Schema valid, ✅ swegym-cube==0.1.0 installed, ✅ SWEGymBenchmarkConfig, task_count 2438, has_debug_task True (2), ✅ Quick check PASSED.

Note: `dev_install_url` pins `@feat/swegym-lite-cube` until cube-harness#504 merges; drop the ref (→ default branch) after merge. The cube depends on `cube-standard>=0.1.0rc9` (now on PyPI), which is what unblocked this.

🤖 Generated with Claude Code
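Since the pinned ref has to be dropped after merge, a small check along these lines could flag a `dev_install_url` that still pins a branch. This is only a sketch: it assumes the URL follows pip's `git+https://...@ref#subdirectory=...` form, the `example-org` owner in the usage below is a placeholder (the real URL is not shown in this PR), and `pinned_ref` is a hypothetical helper, not part of the registry tooling.

```python
from typing import Optional
from urllib.parse import urlsplit


def pinned_ref(dev_install_url: str) -> Optional[str]:
    """Return the git ref pinned in a pip VCS URL, or None if no ref is pinned."""
    # pip VCS URLs carry a "git+" prefix in front of the transport scheme.
    url = dev_install_url
    if url.startswith("git+"):
        url = url[len("git+"):]
    # urlsplit separates the "#subdirectory=..." fragment from the path,
    # so only the repository path (with an optional "@ref") remains.
    path = urlsplit(url).path
    if "@" not in path:
        return None
    ref = path.rsplit("@", 1)[1]
    return ref or None
```

For example, `pinned_ref("git+https://github.com/example-org/cube-harness.git@feat/swegym-lite-cube#subdirectory=cubes/swegym-cube")` returns `"feat/swegym-lite-cube"`, while the same URL without `@feat/swegym-lite-cube` returns `None`.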