This directory contains an integration to train a coding agent on the SWE-Bench task using Mini-SWE-Agent and SkyRL.
To start training, follow three simple steps:
- Prepare the SWE-Gym dataset.
- Configure your environment backend (Podman).
- Launch training!
Start by following the SkyRL installation instructions:

```shell
cd SkyRL/
```

The Mini-SWE-Agent integration implements a custom `MiniSweAgentGenerator` that uses Mini-SWE-Agent to generate trajectories for SWE-Bench instances. The workflow consists of:
- Generation: Initialize a sandbox environment and generate a trajectory using Mini-SWE-Agent configured with SkyRL's HTTP endpoint, producing a git patch.
- Evaluation: Apply the generated patch to a fresh environment and run the evaluation script to determine if the instance was resolved.
We launch a Ray task per trajectory to scale this across all nodes in the cluster.
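The per-trajectory fan-out can be sketched as follows. This is not the repo's actual code: the integration uses Ray (an `@ray.remote` function per instance, gathered with `ray.get`), but the same shape is shown here with the standard library so the sketch is self-contained. The `generate_and_evaluate` name and the placeholder reward are illustrative assumptions.

```python
# Sketch of the fan-out pattern (illustrative, not the integration's code).
# The real integration launches one Ray task per trajectory; here the same
# shape is mimicked with a thread pool from the standard library.
from concurrent.futures import ThreadPoolExecutor

def generate_and_evaluate(instance_id: str) -> float:
    # Placeholder for: create a sandbox, run Mini-SWE-Agent against the
    # SkyRL HTTP endpoint to produce a git patch, then apply the patch in
    # a fresh environment and run the evaluation script for a reward.
    return 0.0  # hypothetical reward for an unresolved instance

instance_ids = [f"instance-{i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    rewards = list(pool.map(generate_and_evaluate, instance_ids))
print(rewards)
```

With Ray, the function body stays the same; only the dispatch mechanism changes, which is what lets generation scale across all nodes in the cluster.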
We use SWE-Gym, specifically the subset from SumanthRH/SWE-Gym-Subset.
Execute the following command:
```shell
uv run --isolated examples/mini_swe_agent/preprocess_swegym.py --output_dir ~/data/swe_gym_subset # or modify to your desired path
```

Prerequisites: Install the required environment backend. By default, we use Podman. This can be modified in `examples/mini_swe_agent/swebench.yaml`.
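The backend selection lives in the YAML config. The fragment below is a hypothetical illustration of the kind of setting involved; the exact keys may differ, so treat the file in the repo as the authoritative schema.

```yaml
# Hypothetical fragment of examples/mini_swe_agent/swebench.yaml.
# The key names here are illustrative -- check the actual file.
environment:
  environment_class: podman   # container backend used for agent sandboxes
```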
We provide example scripts for different model sizes:
Qwen3-8B (requires 1x 8xH100 node):

```shell
bash examples/mini_swe_agent/run_mini_swe_8B.sh
```

Qwen3-Coder-30B (requires 2x 8xH100 nodes):

```shell
bash examples/mini_swe_agent/run_mini_swe_30B.sh
```

Make sure to update the `DATA_DIR` variable in the bash script if you saved the data to a custom path.
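For example, the `DATA_DIR` assignment inside the run script should match wherever `preprocess_swegym.py` wrote the dataset; the path below is illustrative.

```shell
# Illustrative: point DATA_DIR (inside the run script) at your dataset path.
DATA_DIR="$HOME/data/swe_gym_subset"
echo "DATA_DIR=$DATA_DIR"
```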
All training parameters can be modified in the run scripts, such as model choice, GRPO group size, or training batch size.
For issues with SkyRL or the Mini-SWE-Agent integration, please open an Issue.
- Context length errors: If you see `ValueError: The decoder prompt (length xxxx) is longer than the maximum model length`, increase the vLLM `engine_init_kwargs.max_model_len`, reduce `max_input_length`, or reduce steps in `swebench.yaml`. `max_generate_length` is the assistant-token budget for a trajectory and does not increase the model context window.
- All zero rewards: If rewards are consistently zero, the task may be too difficult. Consider:
  - Filtering data for a better mix of easy/hard samples
  - Using a stronger base model
  - Increasing `step_limit` in `swebench.yaml`
- Argument list too long: For very large git patches, you might notice evaluation errors such as `Argument list too long: 'podman'`. This is because we apply the model's git patch by passing it as a CLI argument, and for large patches, you can hit the system's `ARG_MAX` limit. On modern systems, this limit is about 1 MB. We make the simplifying assumption that such oversized patches are incorrect.
- Podman UID errors: If running Podman within a container, you might hit errors due to insufficient UIDs. To resolve this, you have two options on Linux-based machines:
  - Edit the `/etc/subuid` and `/etc/subgid` files to use a larger range of UIDs, like `100000-1100000`
  - Set `ignore_chown_errors=true` in Podman's `containers.conf`
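For reference, entries in `/etc/subuid` and `/etc/subgid` use the format `<user>:<start>:<count>`; the username below is a placeholder, and a count of one million covers the 100000-1100000 range mentioned above.

```
# /etc/subuid (the same format applies to /etc/subgid)
youruser:100000:1000000
```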
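The `ARG_MAX` limit mentioned above (the kernel's cap on the total size of `argv` plus the environment passed to `exec()`) can be inspected directly; a patch passed as a single CLI argument must fit under it.

```python
# Query this system's ARG_MAX via the standard library.
import os

arg_max = os.sysconf("SC_ARG_MAX")
print(f"ARG_MAX: {arg_max} bytes")
```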
Beyond the configuration for SkyRL in the training script, the task-specific configuration file is `examples/mini_swe_agent/swebench.yaml`, which controls:
- Environment backend settings
- Step limits for agent execution
- Tool configurations for Mini-SWE-Agent
For more details, refer to the documentation.