fix: make SWEbench live progress script restart-safe#872

Merged
piojanu merged 2 commits into main from
fix/swebench-progress-dedup
Apr 16, 2026

Conversation

@piojanu
Contributor

@piojanu piojanu commented Mar 19, 2026

Summary

  • The tasks.jsonl progress tracking script counted raw lines, giving inflated totals (>500 for a 500-task benchmark) after SLURM wall-time restarts
  • Updated the script to deduplicate by task_id (last entry wins) so progress is accurate across multiple restarts
  • Added documentation explaining the append-only behavior of tasks.jsonl and when to use each progress file
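The deduplication described above can be sketched as follows. This is a minimal illustration and not the script changed in this PR: only `task_id` and the "last entry wins" rule come from the description, while the `status` field, its values, and the function name `summarize_progress` are assumptions made for the example.

```python
import json
from collections import Counter


def summarize_progress(lines):
    """Return (unique task count, per-status counts), keeping the last entry per task_id.

    Assumes each line is a JSON object with a "task_id" key and a
    hypothetical "status" key; neither the key "status" nor its values
    are confirmed by the PR.
    """
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A job killed mid-write may leave a truncated line; skip it.
            continue
        if not isinstance(record, dict) or record.get("task_id") is None:
            continue
        # Later lines overwrite earlier ones, so a retry replaces the old result.
        latest[record["task_id"]] = record
    counts = Counter(r.get("status", "unknown") for r in latest.values())
    return len(latest), dict(counts)


# Three raw lines, but only two unique tasks; task "a" errored, then succeeded on retry.
sample = [
    '{"task_id": "a", "status": "error"}',
    '{"task_id": "b", "status": "success"}',
    '{"task_id": "a", "status": "success"}',
]
print(summarize_progress(sample))  # (2, {'success': 2})
```

Counting raw lines on the same input would report 3 entries, which is how the total can exceed the benchmark size after restarts. To run this against a real file, pass an open file handle, since iterating over it yields lines.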

Test plan

  • Run SWEbench eval, check progress mid-run with the new script
  • Kill and restart the eval, verify progress script still shows correct unique count (≤500)
  • Confirm that previously errored instances that succeed on retry show as success

🤖 Generated with Claude Code

@copy-pr-bot

copy-pr-bot bot commented Mar 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

The tasks.jsonl progress tracking script counted raw lines, giving
inflated totals (>500) after SLURM restarts. Deduplicate by task_id
(last entry wins) so progress is accurate across multiple restarts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
@piojanu piojanu force-pushed the fix/swebench-progress-dedup branch from 5a63f96 to 1e6619e Compare April 16, 2026 09:42
@copy-pr-bot

copy-pr-bot bot commented Apr 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@piojanu piojanu marked this pull request as ready for review April 16, 2026 09:42
@piojanu piojanu requested review from a team as code owners April 16, 2026 09:42
@piojanu piojanu enabled auto-merge (squash) April 16, 2026 09:43
@marta-sd
Contributor

/ok to test fa0f5bc

@chtruong814 chtruong814 added the docs-only With great power comes great responsibility. label Apr 16, 2026
@piojanu piojanu merged commit 4ecd187 into main Apr 16, 2026
51 of 58 checks passed
@piojanu piojanu deleted the fix/swebench-progress-dedup branch April 16, 2026 12:58