We introduce ReasoningBank, a memory mechanism for agents that learns from both successful and failed trajectories, with reasoning stored as memory content.
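The core data flow can be illustrated with a small sketch. This is not the repository's implementation. MemoryItem and ReasoningMemory are hypothetical names, the title/description/content schema is an assumption, and bag-of-words cosine similarity stands in for the embedding-based retrieval a real system would more likely use. It shows two things: distilled reasoning (not raw trajectories) is what gets stored, and items from failed trajectories are kept alongside successful ones.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryItem:
    # Hypothetical schema: title and description are used for retrieval;
    # content holds the distilled reasoning (the memory itself).
    title: str
    description: str
    content: str
    from_success: bool  # failed trajectories also yield lessons on what to avoid


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ReasoningMemory:
    items: List[MemoryItem] = field(default_factory=list)

    def add(self, item: MemoryItem) -> None:
        self.items.append(item)

    def retrieve(self, query: str, k: int = 1) -> List[MemoryItem]:
        # Rank stored items by similarity between the query and each item's
        # title + description (a stand-in for embedding similarity).
        q = _bag_of_words(query)
        ranked = sorted(
            self.items,
            key=lambda it: _cosine(q, _bag_of_words(f"{it.title} {it.description}")),
            reverse=True,
        )
        return ranked[:k]


bank = ReasoningMemory()
bank.add(MemoryItem("Use site search first", "Finding a product on a shopping site",
                    "Search by product name before browsing categories.", True))
bank.add(MemoryItem("Verify the filter", "Sorting orders in the admin panel",
                    "Re-check the active filter before reading results.", False))
top = bank.retrieve("find a product on the shopping site", k=1)
print(top[0].title)  # Use site search first
```

The retrieved items would then be injected into the agent's prompt for the new task; that step is omitted here.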
Building on this memory formulation, we propose memory-aware test-time scaling, which exploits the bidirectional synergy between memory and test-time scaling and establishes experience-driven memory as a new scaling dimension for agent systems.
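One way to picture the parallel variant of memory-aware test-time scaling is sketched below, under stated assumptions: the agent makes k attempts at the same query (each attempt is assumed to retrieve from memory before acting), a judge labels each attempt, and successes are contrasted with failures to distill new memory items. parallel_scaling, run_agent, judge, and distill are hypothetical names with stub implementations; the actual pipeline lives in pipeline_scaling.py and induce_scaling.py and may differ.

```python
from typing import Callable, List, Tuple


def parallel_scaling(
    query: str,
    k: int,
    run_agent: Callable[[str, int], str],  # hypothetical: (query, seed) -> trajectory
    judge: Callable[[str, str], bool],     # hypothetical: success signal, e.g. LLM-as-a-judge
    distill: Callable[[List[str], List[str]], List[str]],  # hypothetical: contrast -> items
) -> Tuple[str, List[str]]:
    if k < 1:
        raise ValueError("k must be at least 1")
    # 1) Scale at test time: k independent attempts at the same query.
    trajectories = [run_agent(query, seed) for seed in range(k)]
    # 2) Label each attempt.
    labels = [judge(query, t) for t in trajectories]
    successes = [t for t, ok in zip(trajectories, labels) if ok]
    failures = [t for t, ok in zip(trajectories, labels) if not ok]
    # 3) Contrast successes with failures to distill new memory items,
    #    which would be written back to the memory bank.
    new_items = distill(successes, failures)
    # Return a successful attempt if one exists, otherwise the first attempt.
    best = successes[0] if successes else trajectories[0]
    return best, new_items


# Stub components for illustration only.
def fake_agent(query: str, seed: int) -> str:
    return f"{query}: attempt {seed}"


def fake_judge(query: str, trajectory: str) -> bool:
    return trajectory.endswith("2")


def fake_distill(successes: List[str], failures: List[str]) -> List[str]:
    return [f"{len(successes)} success(es) vs {len(failures)} failure(s)"]


best, new_items = parallel_scaling("buy a mug", 4, fake_agent, fake_judge, fake_distill)
print(best)       # buy a mug: attempt 2
print(new_items)  # ['1 success(es) vs 3 failure(s)']
```

The synergy runs both ways: better memory should make each attempt stronger, and more (contrasting) attempts give richer material for distilling memory.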
We release code for SWE-Bench (software engineering) and WebArena (web browsing) in the corresponding directories.
Before you start, install the required packages by running pip install -r requirements.txt.
Currently we support three model families:
- GPT: To use GPT models (gpt-3.5-turbo, gpt-4, gpt-4o), you need to set your OpenAI API key as an environment variable:
  export OPENAI_API_KEY="your-openai-api-key"
- Gemini & Claude: To use Gemini models (gemini-2.5-flash, gemini-2.5-pro) or Claude (claude-3-7-sonnet@20250219) on Vertex AI, you need to configure Google Cloud authentication.
  - Install the Google Cloud CLI and log in to set up Application Default Credentials (ADC):
    gcloud auth application-default login
  - Set your project and location as environment variables, as they are required by clients such as the one for Claude on Vertex AI:
    export GOOGLE_CLOUD_PROJECT="your-project-id"
    export GOOGLE_CLOUD_LOCATION="your-region"
    export GOOGLE_GENAI_USE_VERTEXAI="True"
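Before launching a long run, it can help to confirm that the environment variables listed above are set. The sketch below is a hypothetical pre-flight helper (missing_env_vars and REQUIRED_ENV are not part of the repo); it only checks the variables named in this README and cannot verify that ADC login succeeded. It conservatively requires all three Google Cloud variables for both Gemini and Claude, since they are set together above.

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variables named in this README, keyed by model-name prefix.
_VERTEX = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "GOOGLE_GENAI_USE_VERTEXAI"]
REQUIRED_ENV: Dict[str, List[str]] = {
    "gpt-": ["OPENAI_API_KEY"],
    "gemini-": _VERTEX,
    "claude-": _VERTEX,
}


def missing_env_vars(model: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for `model`."""
    env = os.environ if env is None else env
    for prefix, names in REQUIRED_ENV.items():
        if model.startswith(prefix):
            return [name for name in names if not env.get(name)]
    raise ValueError(f"unsupported model family: {model}")


print(missing_env_vars("gpt-4o", {}))  # ['OPENAI_API_KEY']
```

Passing an explicit mapping makes the helper easy to test; calling it with only a model name checks the real os.environ.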
Make sure browsergym is installed correctly by following its official documentation.
Next, download and configure the Docker environment for WebArena. Refer to this tutorial and execute the scripts in the numerical order of their file names. Before executing them, configure the address of each website in the corresponding scripts as instructed.
- WebArena/agents/: implementation of web agents integrating with browsergym
- WebArena/autoeval/: LLM-as-a-judge for obtaining correctness signals for trajectories
- WebArena/config_files/: data processing for WebArena tasks
- WebArena/prompt/: instructions used across the implementation
Download the raw test file from here and put it into config_files/. The repo also vendors a patched copy at third_party/webarena/test.raw.json with corrected shopping-split annotations; either one can be used.
Run generate_config_files.py to convert the raw test data into the config files used as input.
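In the original WebArena repository, the raw test file refers to websites through placeholders such as __SHOPPING__ and __SHOPPING_ADMIN__, and a generation script substitutes the real addresses and writes one JSON config per task. Assuming generate_config_files.py follows the same pattern (not verified here), the core step looks roughly like the sketch below. generate_configs is a hypothetical name, and the localhost addresses are example values only.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def generate_configs(raw_text: str, sites: Dict[str, str], out_dir: Path) -> int:
    """Substitute site placeholders and write one config file per task."""
    # Replace longer placeholders first so a placeholder that is a prefix
    # of another can never be substituted partially.
    for placeholder in sorted(sites, key=len, reverse=True):
        raw_text = raw_text.replace(placeholder, sites[placeholder])
    tasks = json.loads(raw_text)
    out_dir.mkdir(parents=True, exist_ok=True)
    for idx, task in enumerate(tasks):
        # Files are named by list position; WebArena task ids follow the same order.
        (out_dir / f"{idx}.json").write_text(json.dumps(task, indent=2))
    return len(tasks)


raw = json.dumps([
    {"task_id": 0, "start_url": "__SHOPPING__"},
    {"task_id": 1, "start_url": "__SHOPPING_ADMIN__/admin"},
])
sites = {"__SHOPPING__": "http://localhost:7770", "__SHOPPING_ADMIN__": "http://localhost:7780"}
with tempfile.TemporaryDirectory() as tmp:
    n_tasks = generate_configs(raw, sites, Path(tmp))
    second = json.loads((Path(tmp) / "1.json").read_text())
print(n_tasks, second["start_url"])  # 2 http://localhost:7780/admin
```

The addresses passed in must match the ones configured for the Docker websites in the previous step.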
The repo ships a patched webarena/ harness at third_party/webarena/ (corrected shopping annotations, wishlist evaluation fix, fill('','') guard, and retry_with_force=True clicks) to make the environment and its evaluation more robust and stable. Prepend third_party/ to PYTHONPATH so that the vendored copy shadows the pip-installed browsergym.webarena:
export PYTHONPATH="$(pwd)/third_party:$PYTHONPATH"
Because webarena is a namespace package, no code edits are required; every webarena.* submodule resolves to the vendored copy.
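The shadowing rule can be demonstrated in isolation. The sketch below builds two copies of a hypothetical namespace package, demo_webarena, in a temporary directory (one under third_party/, one standing in for site-packages), puts them on sys.path in a chosen order, and reports which copy of a submodule Python imports. It does not touch the real harness; for the actual setup, python -c "import webarena; print(list(webarena.__path__))" lists the portions in the order they are searched.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_namespace_shadowing(pkg: str = "demo_webarena", vendored_first: bool = True) -> str:
    """Return which copy ('vendored' or 'installed') of `<pkg>.harness` gets imported."""
    with tempfile.TemporaryDirectory() as tmp:
        vendored = Path(tmp) / "third_party"
        installed = Path(tmp) / "site-packages"
        for base, tag in ((vendored, "vendored"), (installed, "installed")):
            pkg_dir = base / pkg
            pkg_dir.mkdir(parents=True)  # no __init__.py: a namespace-package portion
            (pkg_dir / "harness.py").write_text(f"SOURCE = {tag!r}\n")
        order = [vendored, installed] if vendored_first else [installed, vendored]
        # Mimic PYTHONPATH="$(pwd)/third_party:$PYTHONPATH": earlier entries are searched first.
        sys.path[:0] = [str(p) for p in order]
        importlib.invalidate_caches()
        try:
            return importlib.import_module(f"{pkg}.harness").SOURCE
        finally:
            for p in order:
                sys.path.remove(str(p))
            for name in (f"{pkg}.harness", pkg):
                sys.modules.pop(name, None)  # allow a clean re-run


print(demo_namespace_shadowing())                      # vendored
print(demo_namespace_shadowing(vendored_first=False))  # installed
```

Submodules of a namespace package are searched across its portions in sys.path order, and PYTHONPATH entries come before site-packages, which is why prepending third_party/ is enough.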
To run ReasoningBank directly, use bash run.sh, setting model, output_dir, website, and memory_mode accordingly.
To run the scaling setting, refer to pipeline_scaling.py and induce_scaling.py.
We build upon mini-swe-agent. First, install it from source by running pip install -e . in the ./third_party directory. This installs the dependencies specified in pyproject.toml.
The script SWE-Bench/run.sh provides the command for a direct run and writes result files to the output directory. Before running it, make sure Vertex AI is configured as instructed in run.sh.
For evaluation, use the sb-cli command as described in its official documentation.
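sb-cli evaluates predictions in the standard SWE-bench format, where each prediction carries instance_id, model_name_or_path, and model_patch. A quick local check before submitting can catch malformed entries. The sketch assumes the run produces a JSON predictions file that is either a dict keyed by instance ID (as mini-swe-agent's batch mode writes, to our understanding) or a plain list; load_predictions and check_predictions are hypothetical helpers, and the instance IDs below are made up.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

# Fields of a standard SWE-bench prediction.
REQUIRED_KEYS = ("instance_id", "model_name_or_path", "model_patch")


def load_predictions(path: Union[str, Path]) -> Union[Dict[str, Any], List[Any]]:
    return json.loads(Path(path).read_text())


def check_predictions(preds: Union[Dict[str, Any], List[Any]]) -> List[str]:
    """Return a list of problems; an empty list means every entry looks well-formed."""
    entries = list(preds.values()) if isinstance(preds, dict) else list(preds)
    problems = []
    for i, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
        elif not entry["model_patch"]:
            # Not malformed, but an empty patch can only fail evaluation.
            problems.append(f"entry {i} ({entry['instance_id']}): empty patch")
    return problems


preds = {
    "example__repo-1": {"instance_id": "example__repo-1",
                        "model_name_or_path": "reasoningbank",
                        "model_patch": "diff --git a/f.py b/f.py\n"},
    "example__repo-2": {"instance_id": "example__repo-2",
                        "model_name_or_path": "reasoningbank",
                        "model_patch": ""},
}
print(check_predictions(preds))  # ['entry 1 (example__repo-2): empty patch']
```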
We adopt code from the following repositories and sincerely thank their authors for these great codebases:
If you find this work useful, please cite our paper:
@inproceedings{
ouyang2026reasoningbank,
title={ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory},
author={Siru Ouyang and Jun Yan and I-Hung Hsu and Yanfei Chen and Ke Jiang and Zifeng Wang and Rujun Han and Long Le and Samira Daruki and Xiangru Tang and Vishy Tirumalashetty and George Lee and Mahsan Rofouei and Hangfei Lin and Jiawei Han and Chen-Yu Lee and Tomas Pfister},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=jL7fwchScm}
}
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
This project is intended for demonstration purposes only. It is not intended for use in a production environment.

