ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

📜 Overview

We introduce ReasoningBank, a memory mechanism for agents that learns from both successful and failed trajectories, with reasoning stored as memory content.

Building upon this memory formulation, we propose memory-aware test-time scaling, which leverages the bidirectional synergy between memory and test-time scaling, establishing experience-driven memory as another scaling dimension for agent systems.

📂 Code Setup

We release code for SWE-Bench (software engineering) and WebArena (web browsing) in the correspondingly named directories.

Before starting, install the required packages by running `pip install -r requirements.txt`.

0. LLM Configuration

Currently we support three model families:

GPT: To use GPT models (gpt-3.5-turbo, gpt-4, gpt-4o), you need to set your OpenAI API key as an environment variable:
```
export OPENAI_API_KEY="your-openai-api-key"
```
Gemini & Claude: To use Gemini models (gemini-2.5-flash, gemini-2.5-pro) or Claude (claude-3-7-sonnet@20250219) on Vertex AI, you need to configure Google Cloud authentication.
1. Install the Google Cloud CLI and log in to set up Application Default Credentials (ADC):
```
gcloud auth application-default login
```
2. Set your project and location as environment variables, as they are required by clients like the one for Claude on Vertex AI:
```
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="your-region"
export GOOGLE_GENAI_USE_VERTEXAI="True"
```
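Before launching a run, it can help to confirm that the variables listed above are actually set in the current shell. The following is a minimal sketch, not part of the released code: the `REQUIRED_ENV` table and the `missing_env` helper are hypothetical names introduced here, and the grouping into `gpt` and `vertex` simply mirrors the two setup paths described in this section.

```python
import os
from typing import Mapping

# Variables named in the setup steps above, grouped by setup path.
# The group names "gpt" and "vertex" are illustrative, not repo identifiers.
REQUIRED_ENV = {
    "gpt": ["OPENAI_API_KEY"],
    "vertex": [
        "GOOGLE_CLOUD_PROJECT",
        "GOOGLE_CLOUD_LOCATION",
        "GOOGLE_GENAI_USE_VERTEXAI",
    ],
}


def missing_env(family: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `family` that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV[family] if not env.get(name)]


if __name__ == "__main__":
    for family in REQUIRED_ENV:
        print(family, "missing:", missing_env(family) or "none")
```

Running the script prints, for each setup path, which variables still need to be exported. It does not verify that the credentials themselves are valid.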

1. WebArena

Docker Configuration

Make sure to correctly install browsergym following the official documentation.

The next step is to download and configure the Docker environment for WebArena. Please refer to this tutorial and execute the scripts in the numerical order of their file names. Before executing them, set the address of each website in the corresponding scripts as instructed.

Directory Structure

WebArena/agents/: implementation for web agents integrating with browsergym
WebArena/autoeval/: llm-as-a-judge for obtaining correctness signal for trajectories
WebArena/config_files/: data processing for webarena tasks
WebArena/prompt/: instructions used across the implementation

Data preprocessing

Download the raw test file from here and place it in config_files. The repo also vendors a patched copy at third_party/webarena/test.raw.json with shopping-split annotation corrections; use either one.

Run `generate_config_files.py` to convert the raw test data into the config files used as input.
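To give a rough idea of what this kind of preprocessing involves, here is a minimal sketch that splits a raw task file into one config file per task. It is not the repo's `generate_config_files.py`. It assumes the raw file is a JSON array of task objects, each with a `task_id` field, and `split_raw_tasks` is a hypothetical helper name. The actual script may perform further steps that are not shown here.

```python
import json
from pathlib import Path


def split_raw_tasks(raw_path: str, out_dir: str) -> list[Path]:
    """Write each task in a raw JSON array to its own <task_id>.json file."""
    tasks = json.loads(Path(raw_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, task in enumerate(tasks):
        # Assumed field name: fall back to the list position if task_id is absent.
        task_id = task.get("task_id", index)
        target = out / f"{task_id}.json"
        target.write_text(json.dumps(task, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

For example, `split_raw_tasks("config_files/test.raw.json", "config_files")` would write the per-task files next to the raw file, under the assumptions stated above.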

Use the vendored `webarena` tree

The repo ships a patched webarena/ harness at third_party/webarena/ (corrected shopping annotations, wishlist eval fix, fill('','') guard, retry_with_force=True clicks) to make the environment and its evaluation more robust and stable. Prepend it to PYTHONPATH so that it shadows the pip-installed browsergym.webarena:

export PYTHONPATH="$(pwd)/third_party:$PYTHONPATH"

webarena is a namespace package, so no code edits are required: every webarena.* submodule resolves to the vendored copy.
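To confirm that the vendored tree is the one Python actually picks up, the import location can be inspected with the standard library. The sketch below is illustrative only: `resolves_under` is a hypothetical helper, not part of the repo, and it checks only the first location Python would search for the given module.

```python
import importlib.util
from pathlib import Path


def resolves_under(module_name: str, root: str) -> bool:
    """Return True if `module_name` would be imported from a location under `root`."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return False  # a parent package of module_name could not be found
    if spec is None:
        return False
    if spec.has_location and spec.origin:
        # Regular module or package: backed by a concrete file.
        locations = [spec.origin]
    else:
        # Namespace packages have no file, only a list of search directories.
        locations = list(spec.submodule_search_locations or [])
    if not locations:
        return False  # e.g. built-in modules
    root_path = Path(root).resolve()
    # The first location takes priority, so it decides which copy wins.
    return Path(locations[0]).resolve().is_relative_to(root_path)
```

After exporting PYTHONPATH as shown above, calling `resolves_under("webarena", "third_party")` from the repository root should indicate whether the vendored copy takes precedence. `Path.is_relative_to` requires Python 3.9 or later.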

Run the code

To run with ReasoningBank directly, set model, output_dir, website, and memory_mode in run.sh as needed, then execute `bash run.sh`.

To run with the scaling setting, refer to pipeline_scaling.py and induce_scaling.py.

2. SWE-Bench

We build upon mini-swe-agent. First, install it from source by running `pip install -e .` under the ./third_party directory. This installs the dependencies specified in pyproject.toml.

The script SWE-Bench/run.sh provides a ready-to-run command that writes result files to the output directory. Before running it, make sure Vertex AI is configured as instructed in run.sh.

For evaluation, refer to the sb-cli command in the official documentation.

Acknowledgement

We adopt code from the following repositories and sincerely appreciate these great works and codebases:

📚 Citation

If you find this work useful, please kindly cite our paper:

@inproceedings{
  ouyang2026reasoningbank,
  title={ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory},
  author={Siru Ouyang and Jun Yan and I-Hung Hsu and Yanfei Chen and Ke Jiang and Zifeng Wang and Rujun Han and Long Le and Samira Daruki and Xiangru Tang and Vishy Tirumalashetty and George Lee and Mahsan Rofouei and Hangfei Lin and Jiawei Han and Chen-Yu Lee and Tomas Pfister},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=jL7fwchScm}
}

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

This project is intended for demonstration purposes only. It is not intended for use in a production environment.
