AgentEvolver/research/CuES/README.md at main · modelscope/AgentEvolver

CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL

CuES is a curiosity-driven, environment-grounded framework for synthesizing high-quality agentic data without predefined seed queries.

🪄 Overview

Conceptually, CuES unfolds in five coordinated stages (as in the paper):

Requirement Confirmation – derive guiding principles and a concept pool from environment description, optional user requirement, and (optional) seed queries.
Curious Exploration – explore the environment bottom-up using an Explorer Agent guided by a memory tree and concept pool.
Task Abstraction – lift consecutive low-level interaction triplets into executable natural-language tasks with guidelines.
Quality Control – re-execute and judge each candidate task to ensure executability and path faithfulness.
Query Rewrite – progressively expose guideline hints in the query text to control difficulty and diversify task formulations.

This repository implements a lightweight three-stage pipeline that closely follows the above design:

Stage 1: Triplet Generation (Curious Exploration)
Curiosity-guided rollout in the environment, storing triplets (state, action, observation) and environment memory.
Stage 2: Task Abstraction
Derive natural-language tasks (queries) and guidelines from exploration trajectories.
Stage 3: Trajectory Generation (Quality-Controlled Execution)
Execute synthesized tasks with an agent, record complete trajectories, and attach success/failure metadata.
Optional: Query Rewrite，Requirement Confirm

While performance on benchmarks(AppWorld, WebShop, and BFCL v3) continues to improve with larger LLMs, Qwen2.5 14B under the proposed CuES achieves a substantially higher accuracy across all benchmarks.

CuES is evaluated on AppWorld, WebShop, and BFCL v3 Multi-Turn Base, and the synthesized data is shown in the paper to match or surpass original datasets in terms of diversity, executability, and downstream RL performance.

✨ Features

Environment-grounded synthesis
Tasks originate from executed trajectories, ensuring feasibility by construction.
Curiosity + top-down guidance
Concept pool and principles guide exploration toward salient regions without prescribing exact tasks.
Environment memory tree
Encourages novel actions and reduces redundant loops during exploration.
Query-free or limited-query setting
Operates without manual seed tasks; optional seeds only refine the concept pool.
Multi-environment support
Designed for AppWorld, WebShop, and BFCL v3 Multi-Turn Base via EnvService.
Custom JSON schemas per stage
Stable data interfaces across Stage 1/2/3 and Query Rewrite.
Config-driven pipeline
All behavior controlled through config/config.yaml.

📥 Quickstart

Requirements

Python 3.10+
Access to Aliyun DashScope API
(Optional) EnvService if using external interactive environments (AppWorld / BFCL / WebShop)

Installation

# Create and activate a virtual environment
# Install dependencies (adapt to your environment)
# Refer to the agentevolver repository for environment installation.
# bash ./env_service/environments/appworld/setup.sh
# Export your DashScope key:
# export DASHSCOPE_API_KEY=... (or set in config.yaml)
# conda activate appworld
# bash ./env_service/launch_script/appworld.sh
# python ./env_service/test_script/test_appworld.py

Configuration See config/config.yaml for full options. Key sections:

api: DashScope model, temperature, max tokens
environment: type and EnvService endpoint
stage1/stage2/stage3: knobs per stage
threading: worker pool settings
logging: level and file path
rewrite: Query Rewrite settings

# All stages:
python main.py --stage all --config config/config.yaml
# Stage 1:
python main.py --stage stage1
# Stage 2:
python main.py --stage stage2 --input-file ./data/triplets/*.jsonl (latest auto-selected if omitted)
# Stage 3:
python main.py --stage stage3 --input-file ./data/tasks/*.jsonl (latest auto-selected if omitted)
# Enable rewrite after stage 3: add --rewrite (alias: --query-rewrite)
# Enable requirement confirm before stage 1: add --extract

Outputs are written to ./data/:

data/triplets/*.jsonl
data/tasks/*.jsonl
data/trajectories/trajectory_*.json
data/trajectories/failed_tasks/*.json
data/rewrites/trajectories/trajectory__rw.json

💻 Development information

CLI

--config: path to config file (default: config/config.yaml)
--stage: all | stage1 | stage2 | stage3
--session-name: label for the run
--input-file: input for stage2/stage3
--verbose: verbose logging
--requirement: exploration requirement for stage1
--extract: extract concept set from env (optional)
--rewrite | --query-rewrite: run Query Rewrite after Stage 3

main() flow

Load config and validate API key
If AppWorld is selected, probe EnvService
Build pipeline and run selected stage(s)
Save outputs and stats
If rewrite is enabled, run Query Rewrite over Stage 3 outputs

notes

Code layout:
- agents/: LLM action planner and evaluator
- core/: API client, memory manager, pipeline
- data/: pydantic-like models and storage helpers
- envs/: EnvService managers (AppWorld, BFCL, WebShop)
- prompts/: all prompt builders and parsers
- stages/: Stage 1/2/3 implementations
- utils/: logger and config utilities
Trajectories store both metadata and messages for downstream use.
Query Rewrite preserves all keys and only changes query.

📚 Citation

If you find this work useful, please consider citing:

@misc{mai2025cues,
  title         = {CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL},
  author        = {Mai, Shinji and Zhai, Yunpeng and Chen, Ziqian and Chen, Cheng and Zou, Anni and Tao, Shuchang and Liu, Zhaoyang and Ding, Bolin},
  year          = {2025},
  month         = dec,
  eprint        = {2512.01311},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2512.01311},
  url           = {https://arxiv.org/abs/2512.01311}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL

🪄 Overview

✨ Features

📥 Quickstart

💻 Development information

📚 Citation

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL

🪄 Overview

✨ Features

📥 Quickstart

💻 Development information

📚 Citation