Hi @brutalsavage,
Thanks for the amazing work and for open-sourcing the repo. It’s been incredibly helpful for reproducing results on SWE-Bench.
I recently ran Agentless experiments on SWE-Bench-Lite following your instructions, using all four localization files.
Here are my results:
All instances run.
Cleaning cached images...
Removed 0 images.
Total instances: 300
Instances submitted: 300
Instances completed: 297
Instances incomplete: 0
Instances resolved: 82
Instances unresolved: 215
Instances with empty patches: 3
Instances with errors: 0
Unstopped containers: 0
Unremoved images: 300
I got 82/300 resolved, which is lower than the 96/300 reported in the paper. I'm wondering:
- What's the key difference between Agentless 1.5 and Agentless 1.0?
- Does Agentless 1.5 use gpt-4o-2024-05-13, or a more recent version of GPT-4o?
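For reference, here is a quick sanity check of the numbers above, as a minimal sketch. The counts are copied from the evaluation summary and the 96/300 figure from the paper; the variable names are my own and not part of the SWE-Bench harness.

```python
# Counts copied from the evaluation summary in this issue.
total = 300
completed = 297
resolved = 82
unresolved = 215
empty_patches = 3
paper_resolved = 96  # figure reported in the paper

# The summary should be internally consistent.
assert resolved + unresolved == completed
assert completed + empty_patches == total

# Resolve rates are computed over all 300 instances, not only the completed ones.
my_rate = resolved / total
paper_rate = paper_resolved / total
gap = paper_resolved - resolved

print(f"my run: {my_rate:.1%}")
print(f"paper:  {paper_rate:.1%}")
print(f"gap:    {gap} instances ({(paper_rate - my_rate) * 100:.1f} percentage points)")
```

Even if the 3 instances with empty patches had all been resolved, the run would still be 11 instances short of the paper, so the empty patches alone do not explain the gap.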
Thanks again for the great work!