SWE-Bench-Mutated

Official code release for Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation, accepted for publication at CAIN 2026.

This repo provides a CLI that rewrites SWE-Bench prompts using an LLM and saves a dataset which can be used downstream for agent inference.

Quick Start

git clone https://github.com/microsoft/swebench-mutate.git
cd swebench-mutate
make setup    # Creates venv, installs deps, prompts for Azure OpenAI credentials
make run      # Runs the example script

By default, make will setup the project, then run the example command. Run make help to see all available commands.

Installation

# Clone the repository
git clone https://github.com/microsoft/swebench-mutate.git
cd swebench-mutate

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install the package
pip install -e .

# For development (includes linting and testing tools)
pip install -e ".[dev,test]"

Environment Setup

Before running the example script, configure your Azure OpenAI credentials:

# Copy the template and edit with your values
cp .env.template .env

# Edit .env with your Azure OpenAI endpoint and authentication

We use Azure OpenAI through LiteLLM by default. To use alternative LLM providers, see the LiteLLM docs.

Required environment variables:

AZURE_API_BASE: Your Azure OpenAI resource endpoint (e.g., https://your-resource.openai.azure.com/)
AZURE_API_VERSION: API version
AZURE_OPENAI_API_KEY: Your API key

Usage

script/run_example.sh runs the CLI on 5 prompts.

Run swebench-mutate --help for instructions on running the standalone Python CLI.

The default configuration is example.yaml. Additional LiteLLM arguments can be passed by configuring additional_args with litellm.completion arguments or through environment variables.

Prompts

See prompt_customization.py for the anonymized mutation prompts used to mutate SWE-Bench prompts.

Citation

If you use SWE-Bench-Mutated, please cite:

@misc{garg2025savingswebenchbenchmarkmutation,
      title={Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation}, 
      author={Spandan Garg and Benjamin Steenhoek and Yufan Huang},
      year={2025},
      eprint={2510.08996},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2510.08996}, 
}

Contributing

See CONTRIBUTING.md for guidelines.

License

This repository is released under the MIT License (see LICENSE).

Security

Security reporting information is in SECURITY.md.

Trademark Notice

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github		.github
.vscode		.vscode
configs		configs
script		script
src/swebench_mutate		src/swebench_mutate
tests		tests
.env.template		.env.template
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SWE-Bench-Mutated

Quick Start

Installation

Environment Setup

Usage

Prompts

Citation

Contributing

License

Security

Trademark Notice

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

SWE-Bench-Mutated

Quick Start

Installation

Environment Setup

Usage

Prompts

Citation

Contributing

License

Security

Trademark Notice

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages