Example for training on SWE (agentic software-engineering) tasks?

### Checklist

- [x] I have searched existing issues, and this is a new question or discussion topic.

### Question Description

Is there an example or recipe for GRPO training on multi-turn SWE-style tasks (SWE-bench / SWE-Gym)?

Looking for:
- Multi-turn agentic rollouts with tool calls (shell, file edits, tests)
- How environment/reward is wired (e.g. tests-passing as reward)
- Setup for long trajectories / large context
- How Docker/Podman containers are handled efficiently during rollouts

If nothing SWE-specific exists, a pointer to the closest multi-turn example to adapt would help. Happy to contribute one back.

Thanks!

Example for training on SWE (agentic software-engineering) tasks? #9508
