Checklist / 检查清单
Question Description / 问题描述
Is there an example/recipe for GRPO training on multi-turn SWE-style tasks (SWE-bench / SWE-Gym)?
Looking for:
- Multi-turn agentic rollouts with tool calls (shell, file edits, tests)
- How the environment and reward are wired together (e.g. tests passing as the reward)
- Setup for long trajectories / large context
- How Docker/Podman containers are managed efficiently during rollouts
If nothing SWE-specific exists, a pointer to the closest multi-turn example to adapt would help. Happy to contribute one back.
Thanks!