Skip to content

google-research/reasoning-bank

Repository files navigation

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

📜 Overview

intro_case

We introduce ReasoningBank, a memory mechanism for agents that learns from both successful and failed trajectories, with reasoning stored as memory content.

intro_case

Building upon this memory formulation, we propose memory-aware test-time scaling, which leverages the bidirectional synergy between memory and test-time scaling, establishing experience-driven memory as another scaling dimension for agent systems.

📂 Code Setup

We release code for SWE-Bench (software engineering) and WebArena (web-browsing), as in corresponding directories.

Before we start, please install required packages by running pip install -r requirements.txt.

0. LLM Configuration

Currently we support three model families:

  • GPT: To use GPT models (gpt-3.5-turbo, gpt-4, gpt-4o), you need to set your OpenAI API key as an environment variable:

    export OPENAI_API_KEY="your-openai-api-key"
  • Gemini & Claude: To use Gemini models (gemini-2.5-flash, gemini-2.5-pro) or Claude (claude-3-7-sonnet@20250219) on Vertex AI, you need to configure Google Cloud authentication.

    1. Install the Google Cloud CLI and log in to set up Application Default Credentials (ADC):
      gcloud auth application-default login
    2. Set your project and location as environment variables, as they are required by clients like the one for Claude on Vertex AI:
      export GOOGLE_CLOUD_PROJECT="your-project-id"
      export GOOGLE_CLOUD_LOCATION="your-region"
      export GOOGLE_GENAI_USE_VERTEXAI="True"

1. WebArena

Docker Configuration

Make sure to correctly install browsergym following the official documentation.

The next step is to download and config docker environment for WebArena. Please refer to this tutorial, executing the scripts follow the numerical order of file names. Before executing, make sure to config the address of each website in corresponding scripts as instructed correspondingly.

Directory Structure

  • WebArena/agents/: implementation for web agents integrating with browsergym
  • WebArena/autoeval/: llm-as-a-judge for obtaining correctness signal for trajectories
  • WebArena/config_files/: data processing for webarena tasks
  • WebArena/prompt/: instructions used across the implementation

Data preprocessing

Download raw test files from here and put it to config_files. The repo also vendors a patched copy at third_party/webarena/test.raw.json with shopping-split annotation corrections; use either one.

Run generate_config_files.py to process raw test data to config files as input.

Use the vendored webarena tree

The repo ships a patched webarena/ harness at third_party/webarena/ (corrected shopping annotations, wishlist eval fix, fill('','') guard, retry_with_force=True clicks) to make environment and corresponding evaluation more robust and stable. Prepend it to PYTHONPATH so it shadows the pip-installed browsergym.webarena:

export PYTHONPATH="$(pwd)/third_party:$PYTHONPATH"

webarena is a namespace package, so no code edits are required — every webarena.* submodule resolves to the vendored copy.

Run the code

Run directly with ReasoningBank: bash run.sh, config model, output_dir, and website, and memory_mode accordingly.

To run with scaling setting, please refer to pipeline_scaling.py and induce_scaling.py.

2. SWE-Bench

We built upon mini-swe-agent. First, install it from source by pip install -e . under the directory of ./third_party This will install the dependencies as specified in pyproject.toml.

The script SWE-Bench/run.sh provides direct running command, which will generate result files in the output directory. Before running, make sure the configuration for VertexAI is properly configured as instructed in run.sh.

For evaluation, please refer to sb-cli command in the official documentation.

Acknowledgement

We adopt code from the following code repositories. We sincerely appreciate these great work/codebases:

📚 Citation

If you find this work useful, please kindly cite our paper:

@inproceedings{
  ouyang2026reasoningbank,
  title={ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory},
  author={Siru Ouyang and Jun Yan and I-Hung Hsu and Yanfei Chen and Ke Jiang and Zifeng Wang and Rujun Han and Long Le and Samira Daruki and Xiangru Tang and Vishy Tirumalashetty and George Lee and Mahsan Rofouei and Hangfei Lin and Jiawei Han and Chen-Yu Lee and Tomas Pfister},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=jL7fwchScm}
}

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

This project is intended for demonstration purposes only. It is not intended for use in a production environment.

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors