Example for training on SWE (agentic software-engineering) tasks? #9508

@dipta007

Checklist

  • I have searched existing issues, and this is a new question or discussion topic.

Question Description

Is there an example or recipe for GRPO training on multi-turn SWE-style tasks (SWE-bench / SWE-Gym)?

Looking for:

  • Multi-turn agentic rollouts with tool calls (shell, file edits, tests)
  • How the environment and reward are wired (e.g. tests passing as the reward signal)
  • Setup for long trajectories / large context
  • How Docker/Podman containers are managed efficiently during rollouts

If nothing SWE-specific exists, a pointer to the closest multi-turn example to adapt would help. Happy to contribute one back.

Thanks!

Labels: question
