Demo videos: `conversation.mp4`, `non_conversation.mp4`, `env_generation.mp4`
To locally run the demo for interacting with Envs:

```bash
cd interact_with_env
python app.py
```

To locally run the demo for building Envs from scratch:

```bash
cd skel_builder
python env_build_demo.py
```

We provide EnvScaler's data and models (after SFT+RL) as follows:
| Data | Link |
|---|---|
| 191 Env Metadata | 🤗 HuggingFace |
| 4.7K SFT Scenarios | 🤗 HuggingFace |
| 2.5K RL Scenarios | 🤗 HuggingFace |
| 9K SFT Trajectories | 🤗 HuggingFace |

| Model | Link |
|---|---|
| EnvScaler-Qwen3-1.7B | 🤗 HuggingFace |
| EnvScaler-Qwen3-4B | 🤗 HuggingFace |
| EnvScaler-Qwen3-8B | 🤗 HuggingFace |
- 💬 Demo
- 📦 Dataset & Models
- 📖 Overview
- 📊 Results
- 📁 Project Structure
- 🚀 Quick Start
- 📄 Citation
- 📧 Contact
EnvScaler is an automated, scalable framework that realizes executable, stateful, tool-interactive environments via programmatic synthesis for training LLM agents.
SkelBuilder is the first stage of EnvScaler. It (1) mines potential Env descriptions from existing open-source textual tasks; (2) plans the corresponding state schema and business rules, and generates a fully functional Python class whose methods expose tool interfaces; and (3) runs a dual-agent loop for Env quality inspection (one agent invokes tools, the other checks code, return values, and state changes), guaranteeing quality and consistency.
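For intuition, here is a minimal, hypothetical sketch of what such a generated Env class might look like; the domain, class, and method names are illustrative only and are not taken from the released environments:

```python
# Hypothetical Env skeleton (illustrative names only): state lives in
# instance attributes, and each public method is exposed as a tool.
class LibraryEnv:
    def __init__(self, initial_state: dict):
        # State schema planned by SkelBuilder, e.g. a book inventory.
        self.books = dict(initial_state.get("books", {}))  # title -> copies
        self.loans = list(initial_state.get("loans", []))  # (user, title) pairs

    def search_book(self, title: str) -> dict:
        """Tool: report how many copies of a title are available."""
        return {"title": title, "copies": self.books.get(title, 0)}

    def borrow_book(self, user: str, title: str) -> dict:
        """Tool: borrow one copy, enforcing the rule 'no copies, no loan'."""
        if self.books.get(title, 0) <= 0:
            return {"success": False, "error": "no copies available"}
        self.books[title] -= 1
        self.loans.append((user, title))
        return {"success": True}
```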
ScenGenerator is the second stage, which synthesizes multiple scenarios per Env. Given an Env skeleton, it first prompts LLMs to generate an initial state/database, then creates a challenging task that can be solved from that state. Finally, it decomposes the task into a checklist and converts each checkpoint into a Python Boolean function over the final state of the Env, providing rule-based, verifiable reward signals.
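Continuing the hypothetical `LibraryEnv` sketch above, a checklist item might be compiled into a Boolean predicate over the final Env state, with the reward computed by running all predicates after the episode ends; again, the names below are illustrative, not the released code:

```python
# Hypothetical checkpoint function: one checklist item becomes a Boolean
# predicate over the final Env state, giving a rule-based, verifiable signal.
def check_alice_borrowed_hobbit(env) -> bool:
    """True iff the final state records Alice's loan of 'The Hobbit'."""
    return ("alice", "The Hobbit") in env.loans

def compute_reward(env, checkpoints) -> float:
    """Fraction of checklist items satisfied in the final state."""
    return sum(cp(env) for cp in checkpoints) / len(checkpoints)
```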
With EnvScaler, we synthesized 191 environments and about 7K scenarios, and applied them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3-series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions.
*Statistics of the 191 synthesized environments.*
```
EnvScaler/
├── skel_builder/        # Stage 1: Env Skeleton Construction
├── scen_generator/      # Stage 2: Scenario Generation
├── interact_with_env/   # Agent-Env Interaction
├── sft/                 # Supervised Fine-Tuning (SFT)
├── rl/                  # Reinforcement Learning (RL)
└── evaluation/          # Evaluation Guide
```
💡 Tip: We provide detailed documentation under each module.
- `skel_builder/` – Env skeleton construction framework that automatically generates executable environment classes from existing tasks.
- `scen_generator/` – Scenario generation framework that produces state data, task scenarios, and checkpoint functions for an Env skeleton.
- `interact_with_env/` – Agent-Env interaction module supporting (1) collecting training data by interacting with synthesized Envs and (2) benchmark evaluation (see the sketch after this list).
- `sft/` – Supervised fine-tuning implementation based on LlamaFactory.
- `rl/` – Reinforcement learning implementation based on the ROLL framework.
- `evaluation/` – Evaluation guide covering BFCL, TauBench, and ACEBench.
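To make the Agent-Env interaction concrete, below is a minimal sketch of the kind of tool-calling loop `interact_with_env/` implements. It assumes an OpenAI-compatible client and an Env object whose public methods are the tools; all names here are assumptions rather than the module's actual API:

```python
# Sketch of an agent-Env interaction loop (assumed structure, not the
# actual interact_with_env implementation).
import json
from openai import OpenAI

def run_episode(client: OpenAI, model: str, env, tools: list,
                task: str, max_turns: int = 10) -> list:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        msg = reply.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:  # no tool call -> final answer, episode ends
            break
        for call in msg.tool_calls:  # dispatch each tool call to the Env
            args = json.loads(call.function.arguments)
            result = getattr(env, call.function.name)(**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return messages  # the trajectory; the final Env state feeds the checkpoints
```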
```bash
git clone https://github.com/RUC-NLPIR/EnvScaler
cd EnvScaler
pip install -r requirements.txt
```

💡 Note: Basic dependencies are included in `requirements.txt`. If you need SFT or RL training, please install extra dependencies following the corresponding sub-project documentation:
- SFT training: refer to `sft/README.md` to install LlamaFactory
- RL training: refer to `rl/README.md` to install the ROLL framework
Create a `.env` file in the project root and configure your OpenAI API key:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1
```
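If you want to sanity-check this configuration before running the demos, a minimal connectivity test might look like the sketch below; it assumes the `openai` and `python-dotenv` packages, and the model name is a placeholder:

```python
# Minimal connectivity check (assumes openai>=1.0 and python-dotenv).
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY / OPENAI_BASE_URL from .env
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"],
                base_url=os.environ.get("OPENAI_BASE_URL"))
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your key serves
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```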
Alternatively, you can deploy a local model with an OpenAI-compatible inference framework such as vLLM:
```bash
vllm serve your-model-path \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code
```
⚠️ Important: Ensure the deployed model service supports the Function Calling (FC) interface; see the vLLM OpenAI-Compatible Server docs for details.
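For example, recent vLLM versions expose tool calling through dedicated flags; the exact parser depends on the model family (e.g., `hermes` for Qwen-style models), so treat the command below as a sketch and confirm the flags against the vLLM docs:

```bash
vllm serve your-model-path \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```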
Run the demos to verify your setup:

```bash
# Environment interaction demo
cd interact_with_env
python app.py
```

```bash
# Environment interaction debug run
cd interact_with_env
python run_main_debug.py
```

```bash
# Environment building demo
cd skel_builder
python env_build_demo.py
```

Now you can use each module of EnvScaler independently:
- Build environments: refer to `skel_builder/README.md`
- Generate scenarios: refer to `scen_generator/README.md`
- Collect training data: refer to `interact_with_env/README.md`
- Model training: refer to `sft/README.md` and `rl/README.md`
- Evaluation: refer to `evaluation/README.md`
If you find our work helpful, please consider citing it. We greatly appreciate your support.
```bibtex
@misc{song2026envscalerscalingtoolinteractiveenvironments,
      title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis},
      author={Xiaoshuai Song and Haofei Chang and Guanting Dong and Yutao Zhu and Zhicheng Dou and Ji-Rong Wen},
      year={2026},
      eprint={2601.05808},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.05808},
}
```

For any questions or feedback, please reach out to us at [email protected].