Official code release for Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation, accepted for publication at CAIN 2026.
This repo provides a CLI that rewrites SWE-Bench prompts using an LLM and saves a dataset which can be used downstream for agent inference.
git clone https://github.com/microsoft/swebench-mutate.git
cd swebench-mutate
make setup # Creates venv, installs deps, prompts for Azure OpenAI credentials
make run # Runs the example scriptBy default, make will setup the project, then run the example command.
Run make help to see all available commands.
# Clone the repository
git clone https://github.com/microsoft/swebench-mutate.git
cd swebench-mutate
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install the package
pip install -e .
# For development (includes linting and testing tools)
pip install -e ".[dev,test]"Before running the example script, configure your Azure OpenAI credentials:
# Copy the template and edit with your values
cp .env.template .env
# Edit .env with your Azure OpenAI endpoint and authenticationWe use Azure OpenAI through LiteLLM by default. To use alternative LLM providers, see the LiteLLM docs.
Required environment variables:
AZURE_API_BASE: Your Azure OpenAI resource endpoint (e.g.,https://your-resource.openai.azure.com/)AZURE_API_VERSION: API versionAZURE_OPENAI_API_KEY: Your API key
script/run_example.sh runs the CLI on 5 prompts.
Run swebench-mutate --help for instructions on running the standalone Python CLI.
The default configuration is example.yaml. Additional LiteLLM arguments can be passed by configuring additional_args with litellm.completion arguments or through environment variables.
See prompt_customization.py for the anonymized mutation prompts used to mutate SWE-Bench prompts.
If you use SWE-Bench-Mutated, please cite:
@misc{garg2025savingswebenchbenchmarkmutation,
title={Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation},
author={Spandan Garg and Benjamin Steenhoek and Yufan Huang},
year={2025},
eprint={2510.08996},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2510.08996},
}
See CONTRIBUTING.md for guidelines.
This repository is released under the MIT License (see LICENSE).
Security reporting information is in SECURITY.md.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.