Demo videos: `conversation.mp4`, `non_conversation.mp4`, `env_generation.mp4`
To locally run the demo for interacting with Envs:

```bash
cd interact_with_env
python app.py
```

To locally run the demo for building Envs from scratch:

```bash
cd skel_builder
python env_build_demo.py
```

We provide EnvScaler's data and models (after SFT+RL) as follows:
| Data | Link |
|---|---|
| 191 Env Metadata | 🤗 HuggingFace |
| 4.7K SFT Scenarios | 🤗 HuggingFace |
| 2.5K RL Scenarios | 🤗 HuggingFace |
| 9K SFT Trajectories | 🤗 HuggingFace |

| Model | Link |
|---|---|
| EnvScaler-Qwen3-1.7B | 🤗 HuggingFace |
| EnvScaler-Qwen3-4B | 🤗 HuggingFace |
| EnvScaler-Qwen3-8B | 🤗 HuggingFace |
- 💬 Demo
- 📦 Dataset & Models
- 📖 Overview
- 📊 Results
- 📁 Project Structure
- 🚀 Quick Start
- 📄 Citation
- 📧 Contact
EnvScaler is an automated, scalable framework that realizes executable, stateful, tool-interactive environments via programmatic synthesis for training LLM agents.
SkelBuilder is the first stage of EnvScaler. It (1) mines potential Env descriptions from existing open-source textual tasks; (2) plans the corresponding state schema and business rules, and generates a fully functional Python class whose methods expose tool interfaces; and (3) runs a dual-agent loop for Env quality inspection (one agent invokes tools, the other checks code, return values, and state changes), guaranteeing quality and consistency.
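For intuition, here is a minimal, hypothetical sketch of what such a generated Env class might look like; the domain, class, and method names are illustrative only and are not taken from the released environments:

```python
# Hypothetical Env skeleton (illustrative names only): state lives in
# instance attributes, and each public method is exposed as a tool.
class LibraryEnv:
    def __init__(self, initial_state: dict):
        # State schema planned by SkelBuilder, e.g. a book inventory.
        self.books = dict(initial_state.get("books", {}))  # title -> copies
        self.loans = list(initial_state.get("loans", []))  # (user, title) pairs

    def search_book(self, title: str) -> dict:
        """Tool: report how many copies of a title are available."""
        return {"title": title, "copies": self.books.get(title, 0)}

    def borrow_book(self, user: str, title: str) -> dict:
        """Tool: borrow one copy, enforcing the rule 'no copies, no loan'."""
        if self.books.get(title, 0) <= 0:
            return {"success": False, "error": "no copies available"}
        self.books[title] -= 1
        self.loans.append((user, title))
        return {"success": True}
```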
ScenGenerator is the second stage, which synthesizes multiple scenarios per Env. Given an Env skeleton, it first prompts LLMs to generate an initial state/database, then creates a challenging task that can be solved from that state. Finally, it decomposes the task into a checklist and converts each checkpoint into a Python Boolean function over the final state of the Env, providing rule-based, verifiable reward signals.
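Continuing the hypothetical `LibraryEnv` sketch above, a checklist item might be compiled into a Boolean predicate over the final Env state, with the reward computed by running all predicates after the episode ends; again, the names below are illustrative, not the released code:

```python
# Hypothetical checkpoint function: one checklist item becomes a Boolean
# predicate over the final Env state, giving a rule-based, verifiable signal.
def check_alice_borrowed_hobbit(env) -> bool:
    """True iff the final state records Alice's loan of 'The Hobbit'."""
    return ("alice", "The Hobbit") in env.loans

def compute_reward(env, checkpoints) -> float:
    """Fraction of checklist items satisfied in the final state."""
    return sum(cp(env) for cp in checkpoints) / len(checkpoints)
```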
With EnvScaler, we synthesized 191 environments and about 7K scenarios, and applied them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3-series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions.
*Statistics of the 191 synthesized environments.*
```
EnvScaler/
├── skel_builder/        # Stage 1: Env Skeleton Construction
├── scen_generator/      # Stage 2: Scenario Generation
├── interact_with_env/   # Agent-Env Interaction
├── sft/                 # Supervised Fine-Tuning (SFT)
├── rl/                  # Reinforcement Learning (RL)
└── evaluation/          # Evaluation Guide
```
💡 Tip: We provide detailed documentation under each module.
- `skel_builder/` – Env skeleton construction framework that automatically generates executable environment classes from existing tasks.
- `scen_generator/` – Scenario generation framework that produces state data, task scenarios, and checkpoint functions for an Env skeleton.
- `interact_with_env/` – Agent-Env interaction module supporting (1) collecting training data by interacting with synthesized Envs and (2) benchmark evaluation (see the sketch after this list).
- `sft/` – Supervised fine-tuning implementation based on LlamaFactory.
- `rl/` – Reinforcement learning implementation based on the ROLL framework.
- `evaluation/` – Evaluation guide covering BFCL, TauBench, and ACEBench.
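To make the Agent-Env interaction concrete, below is a minimal sketch of the kind of tool-calling loop `interact_with_env/` implements. It assumes an OpenAI-compatible client and an Env object whose public methods are the tools; all names here are assumptions rather than the module's actual API:

```python
# Sketch of an agent-Env interaction loop (assumed structure, not the
# actual interact_with_env implementation).
import json
from openai import OpenAI

def run_episode(client: OpenAI, model: str, env, tools: list,
                task: str, max_turns: int = 10) -> list:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        msg = reply.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:  # no tool call -> final answer, episode ends
            break
        for call in msg.tool_calls:  # dispatch each tool call to the Env
            args = json.loads(call.function.arguments)
            result = getattr(env, call.function.name)(**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return messages  # the trajectory; the final Env state feeds the checkpoints
```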
```bash
git clone https://github.com/RUC-NLPIR/EnvScaler
cd EnvScaler
pip install -r requirements.txt
```

💡 Note: Basic dependencies are included in `requirements.txt`. If you need SFT or RL training, please install extra dependencies following the corresponding sub-project documentation:
- SFT training: refer to `sft/README.md` to install LlamaFactory
- RL training: refer to `rl/README.md` to install the ROLL framework
Create a `.env` file in the project root and configure your OpenAI API key:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1
```
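If you want to sanity-check this configuration before running the demos, a minimal connectivity test might look like the sketch below; it assumes the `openai` and `python-dotenv` packages, and the model name is a placeholder:

```python
# Minimal connectivity check (assumes openai>=1.0 and python-dotenv).
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY / OPENAI_BASE_URL from .env
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"],
                base_url=os.environ.get("OPENAI_BASE_URL"))
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your key serves
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```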
Alternatively, you can deploy a local model with an OpenAI-compatible inference framework such as vLLM:
```bash
vllm serve your-model-path \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code
```
⚠️ Important: Ensure the deployed model service supports the Function Calling (FC) interface; see the vLLM OpenAI-Compatible Server docs for details.
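For example, recent vLLM versions expose tool calling through dedicated flags; the exact parser depends on the model family (e.g., `hermes` for Qwen-style models), so treat the command below as a sketch and confirm the flags against the vLLM docs:

```bash
vllm serve your-model-path \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```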
Run the demos to verify your setup:

```bash
# Environment interaction demo
cd interact_with_env
python app.py
```

```bash
# Environment interaction debug run
cd interact_with_env
python run_main_debug.py
```

```bash
# Environment building demo
cd skel_builder
python env_build_demo.py
```

Now you can use each module of EnvScaler independently:
- Build environments: refer to `skel_builder/README.md`
- Generate scenarios: refer to `scen_generator/README.md`
- Collect training data: refer to `interact_with_env/README.md`
- Model training: refer to `sft/README.md` and `rl/README.md`
- Evaluation: refer to `evaluation/README.md`
If you find our work helpful, please consider citing it. We greatly appreciate your support.
```bibtex
@misc{song2026envscalerscalingtoolinteractiveenvironments,
      title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis},
      author={Xiaoshuai Song and Haofei Chang and Guanting Dong and Yutao Zhu and Zhicheng Dou and Ji-Rong Wen},
      year={2026},
      eprint={2601.05808},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.05808},
}
```

For any questions or feedback, please reach out to us at [email protected].