This directory contains an integration to train a coding agent on the SWE-Bench task using Mini-SWE-Agent and SkyRL.
To start training, follow three simple steps:
- Prepare the SWE-Gym dataset.
- Configure your environment backend (Podman).
- Launch training!
Start by following the SkyRL installation instructions:

```shell
cd SkyRL/
```

The Mini-SWE-Agent integration implements a custom `MiniSweAgentGenerator` that uses Mini-SWE-Agent to generate trajectories for SWE-Bench instances. The workflow consists of:
- Generation: Initialize a sandbox environment and generate a trajectory using Mini-SWE-Agent configured with SkyRL's HTTP endpoint, producing a git patch.
- Evaluation: Apply the generated patch to a fresh environment and run the evaluation script to determine if the instance was resolved.
We launch a Ray task per trajectory to scale this across all nodes in the cluster.
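The per-trajectory fan-out can be sketched as follows. This is not the repo's actual code: the integration uses Ray (an `@ray.remote` function per instance, gathered with `ray.get`), but the same shape is shown here with the standard library so the sketch is self-contained. The `generate_and_evaluate` name and the placeholder reward are illustrative assumptions.

```python
# Sketch of the fan-out pattern (illustrative, not the integration's code).
# The real integration launches one Ray task per trajectory; here the same
# shape is mimicked with a thread pool from the standard library.
from concurrent.futures import ThreadPoolExecutor

def generate_and_evaluate(instance_id: str) -> float:
    # Placeholder for: create a sandbox, run Mini-SWE-Agent against the
    # SkyRL HTTP endpoint to produce a git patch, then apply the patch in
    # a fresh environment and run the evaluation script for a reward.
    return 0.0  # hypothetical reward for an unresolved instance

instance_ids = [f"instance-{i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    rewards = list(pool.map(generate_and_evaluate, instance_ids))
print(rewards)
```

With Ray, the function body stays the same; only the dispatch mechanism changes, which is what lets generation scale across all nodes in the cluster.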
We use SWE-Gym, specifically the subset from SumanthRH/SWE-Gym-Subset.
Execute the following command:
```shell
uv run --isolated examples/mini_swe_agent/preprocess_swegym.py --output_dir ~/data/swe_gym_subset # or modify to your desired path
```

Prerequisites: Install the required environment backend. By default, we use Podman. This can be modified in `examples/mini_swe_agent/swebench.yaml`.
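The backend selection lives in the YAML config. The fragment below is a hypothetical illustration of the kind of setting involved; the exact keys may differ, so treat the file in the repo as the authoritative schema.

```yaml
# Hypothetical fragment of examples/mini_swe_agent/swebench.yaml.
# The key names here are illustrative -- check the actual file.
environment:
  environment_class: podman   # container backend used for agent sandboxes
```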
We provide example scripts for different model sizes:
Qwen3-8B (requires 1x 8xH100 node):

```shell
bash examples/mini_swe_agent/run_mini_swe_8B.sh
```

Qwen3-Coder-30B (requires 2x 8xH100 nodes):

```shell
bash examples/mini_swe_agent/run_mini_swe_30B.sh
```

Make sure to update the `DATA_DIR` variable in the bash script if you saved the data to a custom path.
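For example, the `DATA_DIR` assignment inside the run script should match wherever `preprocess_swegym.py` wrote the dataset; the path below is illustrative.

```shell
# Illustrative: point DATA_DIR (inside the run script) at your dataset path.
DATA_DIR="$HOME/data/swe_gym_subset"
echo "DATA_DIR=$DATA_DIR"
```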
All training parameters can be modified in the run scripts, such as model choice, GRPO group size, or training batch size.
For issues with SkyRL or the Mini-SWE-Agent integration, please open an Issue.
- Context length errors: If you see `ValueError: The decoder prompt (length xxxx) is longer than the maximum model length`, increase the vLLM `engine_init_kwargs.max_model_len`, reduce `max_input_length`, or reduce steps in `swebench.yaml`. `max_generate_length` is the assistant-token budget for a trajectory and does not increase the model context window.
- All zero rewards: If rewards are consistently zero, the task may be too difficult. Consider:
  - Filtering data for a better mix of easy/hard samples
  - Using a stronger base model
  - Increasing `step_limit` in `swebench.yaml`
- Argument list too long: For very large git patches, you might notice evaluation errors such as `Argument list too long: 'podman'`. This is because we apply the model's git patch by passing it as a CLI argument, and for large patches, you can hit the system's `ARG_MAX` limit. On modern systems, this limit is about 1 MB. We make the simplifying assumption that such oversized patches are incorrect.
- Podman UID errors: If running Podman within a container, you might hit errors due to insufficient UIDs. To resolve this, you have two options on Linux-based machines:
  - Edit the `/etc/subuid` and `/etc/subgid` files to use a larger range of UIDs, like `100000-1100000`
  - Set `ignore_chown_errors=true` in Podman's `containers.conf`
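For reference, entries in `/etc/subuid` and `/etc/subgid` use the format `<user>:<start>:<count>`; the username below is a placeholder, and a count of one million covers the 100000-1100000 range mentioned above.

```
# /etc/subuid (the same format applies to /etc/subgid)
youruser:100000:1000000
```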
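The `ARG_MAX` limit mentioned above (the kernel's cap on the total size of `argv` plus the environment passed to `exec()`) can be inspected directly; a patch passed as a single CLI argument must fit under it.

```python
# Query this system's ARG_MAX via the standard library.
import os

arg_max = os.sysconf("SC_ARG_MAX")
print(f"ARG_MAX: {arg_max} bytes")
```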
Beyond the configuration for SkyRL in the training script, the task-specific configuration file is `examples/mini_swe_agent/swebench.yaml`, which controls:
- Environment backend settings
- Step limits for agent execution
- Tool configurations for Mini-SWE-Agent
For more details, refer to the documentation.