Skip to content

The official implementation of "EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis".

License

Notifications You must be signed in to change notification settings

RUC-NLPIR/EnvScaler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Arxiv Β  Hugging Face Models Β  Hugging Face Datasets Β  License Β  Python 3.10+
If you like our project, please give us a star ⭐ on GitHub. We greatly appreciate your support.

🎬 Demo

Env-Agent-User Interaction

conversation.mp4

Env-Agent Interaction

non_conversation.mp4

Building Environment From Scratch

env_generation.mp4

To locally run the demo that interacting with Envs:

cd interact_with_env
python app.py

To locally run the demo that builing Envs from scratch:

cd skel_builder
python env_build_demo.py

πŸ“¦ Dataset & Models

We provide EnvScaler’s data and models (after SFT+RL) as follows:

Data Link
191 Env Metadata πŸ€— HuggingFace
4.7K SFT Scenario πŸ€— HuggingFace
2.5K RL Scenario πŸ€— HuggingFace
9K SFT Trajectory πŸ€— HuggingFace
Model Link
EnvScaler-Qwen3-1.7B πŸ€— HuggingFace
EnvScaler-Qwen3-4B πŸ€— HuggingFace
EnvScaler-Qwen3-8B πŸ€— HuggingFace

πŸ“‘ Contents

πŸ‘€ Overview

EnvScaler is an automated, scalable framework that realizes executable, stateful, tool-interactive environments via programmatic synthesis, for training LLM agents.


Overview of EnvScaler.

SkelBuilder is the first stage of EnvScaler. It (1) mines potential Env descriptions from existing open-source textual tasks; (2) plans the corresponding state schema and business rules, and generates a fully-functional Python class whose methods expose tool interfaces; (3) performs a dual-agent loop for Env quality inspection (one agent invokes tools, the other checks code, return values, and state changes), guaranteeing quality and consistency.


Framework of SkelBuilder.

ScenGenerator is the second stage for synthesizing multiple Env scenarios. Given an Env skeleton, it first prompts LLMs to generate an initial state/database, then creates a challenging task that can be solved from that state. Finally, it decomposes the task into checklists, and converts each checkpoint into a Python Boolean function over the final state of the Env, providing rule-based, verifiable reward signals.


Framework of ScenGenerator.

πŸ“Š Results

With EnvScaler, we synthesized 191 environments and about 7K scenarios, and applied them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multiturn, multi-tool interactions.


Statistics of 191 synthesized environments.


Performance comparison.

πŸ“ Project Structure

EnvScaler/
β”œβ”€β”€ skel_builder/              # Stage 1: Env Skeleton Construction
β”œβ”€β”€ scen_generator/            # Stage 2: Scenario Generation
β”œβ”€β”€ interact_with_env/         # Agent-Env Interaction
β”œβ”€β”€ sft/                       # Supervised Fine-Tuning (SFT)
β”œβ”€β”€ rl/                        # Reinforcement Learning (RL)
└── evaluation/                # Evaluation Guide

Module Description

πŸ’‘ Tip: We provide detailed documentation under each module.

  1. skel_builder/ – Env skeleton construction framework that automatically generates executable environment classes from existing tasks.
  2. scen_generator/ – Scenario generation framework that produces state data, task scenarios, and checkpoint functions for an Env skeleton.
  3. interact_with_env/ – Agent-Env interaction module supporting (1) collecting training data by interacting with synthesized Envs and (2) benchmark evaluation.
  4. sft/ – Supervised fine-tuning implementation based on LlamaFactory.
  5. rl/ – Reinforcement learning implementation based on the ROLL framework.
  6. evaluation/ – Evaluation guide including BFCL, TauBench, and ACEBench.

πŸš€ Quick Start

1. Clone the repository

git clone https://github.com/RUC-NLPIR/EnvScaler 
cd EnvScaler

2. Install dependencies

pip install -r requirements.txt

πŸ’‘ Note: Basic dependencies are included in requirements.txt. If you need SFT or RL training, please install extra dependencies following the corresponding sub-project documentation:

  • SFT training: refer to sft/README.md to install LlamaFactory
  • RL training: refer to rl/README.md to install the ROLL framework

3. Configure LLM service

Option 1: Use OpenAI API

Create a .env file in the project root and configure your OpenAI API key:

# .env
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1 

Option 2: Use self-hosted model

You can deploy a local model with an OpenAI-compatible inference framework such as vLLM.

Deploy a model with vLLM:

vllm serve your-model-path \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code

⚠️ Important: Ensure the deployed model service supports Function Calling (FC) interface, see vLLM OpenAI-Compatible Server docs for details.

4. Verify configuration

Run the demo to verify your setup:

# Environment interaction demo
cd interact_with_env
python app.py

# Environment interaction Debug
cd interact_with_env
python run_main_debug.py

# Environment building demo
cd skel_builder
python env_build_demo.py

5. Start using

Now you can use each module of EnvScaler independently:

πŸ“š Citation

If you find our work helpful, please consider citing it. We greatly appreciate your support.

@misc{song2026envscalerscalingtoolinteractiveenvironments,
      title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis}, 
      author={Xiaoshuai Song and Haofei Chang and Guanting Dong and Yutao Zhu and Zhicheng Dou and Ji-Rong Wen},
      year={2026},
      eprint={2601.05808},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.05808}, 
}

πŸ“ž Contact

For any questions or feedback, please reach out to us at [email protected].

About

The official implementation of "EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis".

Topics

Resources

License

Stars

Watchers

Forks

Contributors 2

  •  
  •  

Languages