
Commit 94b9fd4

lingzhqhiyuchang authored and committed
Add example for data augmentation in tuner (agentscope-ai#98)
1 parent 654c351 commit 94b9fd4

File tree

17 files changed: +1568 −73 lines

tuner/data_augment/README.md

Lines changed: 153 additions & 0 deletions
# Training Math Agent with Data-Augment Strategies

This example demonstrates how to use **AgentScope-Tuner** to enhance a math problem-solving agent. We will focus on leveraging **Data-Centric** features, such as the `difficulty_based` task selector, to improve data utility and training efficiency.

## Task Setting

We use the foundational [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as our baseline. The agent is a **`ReActAgent`** that solves mathematical problems through step-by-step reasoning.

Training can be inefficient if tasks are too easy or too hard. This example demonstrates how to use **task selectors** to dynamically select tasks based on **data feedback**, focusing on "productively challenging" samples to maximize training efficiency. These data-centric techniques are generic and adaptable to other agent workflows.

## Dataset Preparation

To enable difficulty-based sampling, the training data must include difficulty features (e.g., pass rates from LLMs).

1. **Base Dataset**: You can use any standard math problem dataset. A good example is the math data in [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k), which comes pre-annotated with pass rates from different LLMs, serving as direct difficulty features.
2. **Build Your Own Features**: If you use your own dataset, you can generate these features by pre-running several models of varying capabilities and recording their pass rates. This can be done within the [**Trinity-RFT**](https://github.com/agentscope-ai/Trinity-RFT/pull/440) framework.
3. **Data Format**: The final dataset should be in HuggingFace format. In this example, the data is converted to *GSM8K format* according to the [workflow](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py). Besides the task content, it must include the difficulty feature columns you've defined (e.g., `qwen2.5_7b_pass_rate`).
4. **Example Data Preparation**: We provide a script for this example. Simply execute `python prepare_data.py` to generate the required dataset.
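To make the "Build Your Own Features" step concrete, here is a minimal sketch of how per-model pass-rate columns could be computed from recorded rollouts. The `records`-style input and field names are illustrative assumptions, not the actual logic of this example's `prepare_data.py`:

```python
from collections import defaultdict

def add_pass_rate_features(tasks, rollouts, model_name):
    """Attach a '<model>_pass_rate' column to each task dict.

    tasks:    list of dicts, each with a unique "question" field
    rollouts: list of (question, is_correct) tuples, collected by
              pre-running `model_name` several times per question
    """
    stats = defaultdict(lambda: [0, 0])  # question -> [num_correct, num_total]
    for question, is_correct in rollouts:
        stats[question][0] += int(is_correct)
        stats[question][1] += 1

    key = f"{model_name}_pass_rate"
    for task in tasks:
        correct, total = stats.get(task["question"], (0, 0))
        # Questions with no recorded rollouts default to 0.0 here
        task[key] = correct / total if total else 0.0
    return tasks

tasks = [{"question": "1+1=?"}, {"question": "2*3=?"}]
rollouts = [("1+1=?", True), ("1+1=?", True), ("1+1=?", False), ("2*3=?", False)]
tasks = add_pass_rate_features(tasks, rollouts, "qwen2.5_7b")
# tasks[0]["qwen2.5_7b_pass_rate"] == 2/3, tasks[1]["qwen2.5_7b_pass_rate"] == 0.0
```

The resulting dicts can then be saved as a HuggingFace dataset alongside the task content.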
## Code Implementation

This example adopts `run_react_agent` and `gsm8k_judge` from the [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as `workflow_func` and `judge_func`, demonstrating that training strategies can be applied without altering core agent logic.

### Design of Data-Centric Features

Leveraging the powerful data processing capabilities of **Trinity-RFT**, **AgentScope-Tuner** provides interfaces for advanced operations like task selection and experience processing.

#### Task Selector

The `Task Selector` determines how samples are selected from a dataset. It can be configured directly in YAML configuration files.

- **Built-in Selectors**:
  - `sequential`: Samples are selected in a fixed order.
  - `shuffle`: The dataset is shuffled at the beginning of each epoch.
  - `random`: Samples are randomly chosen with replacement for each batch.
  - `offline_easy2hard`: Samples are sorted by a predefined feature for curriculum learning.
  - `difficulty_based` (customized): An adaptive sampler based on task difficulty.

> For more details on `Task Selector`, including how to implement a custom selector based on feedback signals, please refer to **Trinity-RFT**'s **[Selector Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)**.
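The intuition behind difficulty-aware selection can be illustrated with a standalone sketch. This is not the Trinity-RFT selector API; the weighting below simply favors tasks whose observed pass rate is near 0.5, i.e. "productively challenging" ones:

```python
import random

def difficulty_weighted_sample(tasks, batch_size, feature_key, seed=None):
    """Sample a batch, preferring tasks that are neither trivial nor hopeless.

    The weight p * (1 - p) peaks at pass_rate = 0.5 and vanishes at 0.0
    and 1.0; a small floor keeps every task reachable.
    """
    rng = random.Random(seed)
    weights = [max(p * (1.0 - p), 1e-3) for p in (t[feature_key] for t in tasks)]
    return rng.choices(tasks, weights=weights, k=batch_size)

tasks = [
    {"question": "trivial",  "qwen2.5_7b_pass_rate": 1.0},
    {"question": "medium",   "qwen2.5_7b_pass_rate": 0.5},
    {"question": "hopeless", "qwen2.5_7b_pass_rate": 0.0},
]
batch = difficulty_weighted_sample(tasks, 4, "qwen2.5_7b_pass_rate", seed=0)
```

Real selectors such as `difficulty_based` update these weights online from training feedback rather than using a fixed heuristic.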
#### Data Processor

The `Data Processor` allows for real-time processing of **Task** and **Experience** data during training, enabling operations like calculating feedback metrics, data augmentation, or filtering.

For example, the `difficulty_based` selector requires a `pass_rate_calculator` operator to compute the agent's success rate for each task. This feedback is then used to adjust the sampling strategy.

> For more details on `Data Processor`, please refer to **Trinity-RFT**'s **[Operator Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)**.
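Conceptually, such an operator aggregates rollout rewards into a per-task pass rate. A minimal stand-in (not the actual `pass_rate_calculator` implementation; the experience fields are assumed for illustration) might look like:

```python
def calc_pass_rates(experiences):
    """Aggregate binary rewards into per-task pass rates.

    experiences: list of dicts with a "task_id" and a 0/1 "reward",
    e.g. produced by repeated rollouts of the same prompt.
    """
    totals = {}  # task_id -> (num_correct, num_total)
    for exp in experiences:
        correct, count = totals.get(exp["task_id"], (0, 0))
        totals[exp["task_id"]] = (correct + exp["reward"], count + 1)
    return {tid: c / n for tid, (c, n) in totals.items()}

experiences = [
    {"task_id": "t1", "reward": 1},
    {"task_id": "t1", "reward": 0},
    {"task_id": "t2", "reward": 1},
]
rates = calc_pass_rates(experiences)
# rates == {"t1": 0.5, "t2": 1.0}
```

These per-task rates are the feedback signal the selector consumes to reweight future sampling.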
### Configuring the Experiments

To maintain clarity and simplicity, we recommend defining all data-specific parameters, including dataset paths and task selectors, within YAML configuration files.

We provide two configuration files to compare the baseline `random` selector against the `difficulty_based` selector.

**Experiment 1: Baseline with Random Selector (`config_random.yaml`)**

In `config_random.yaml`, we configure the `task_selector` for random sampling under `buffer.explorer_input.taskset`.

```yaml
# In config_random.yaml
buffer:
  # ...
  explorer_input:
    taskset: # Training data
      path: "path/to/your/augmented/math_data"
      split: "train"
      task_selector:
        selector_type: random # Strategy of task selection
```
**Experiment 2: Advanced Training with Difficulty-Based Selector (`config_difficulty.yaml`)**

In `config_difficulty.yaml`, we switch the `task_selector` to `difficulty_based` and provide its specific parameters. Note that this config also enables the `pass_rate_calculator` needed for feedback.

```yaml
# In config_difficulty.yaml

# Enable the calculator to provide feedback for the selector
data_processor:
  experience_pipeline:
    operators:
      - name: pass_rate_calculator

buffer:
  # ...
  explorer_input:
    taskset: # Training data
      path: "path/to/your/augmented/math_data"
      split: "train"
      task_selector:
        selector_type: difficulty_based # Strategy of task selection
        feature_keys: [ "qwen2.5_7b_pass_rate", "qwen3_30b_pass_rate" ]
        kwargs: # Hyper-parameters for the selection algorithm
          m: 8
          # ...
```

> The `difficulty_based` selector in this example is an implementation of the ***BOTS*** algorithm. For details on its inner workings, please refer to the [***BOTS paper***](https://arxiv.org/abs/2510.26374) and its [***tutorials***](https://github.com/agentscope-ai/Trinity-RFT/blob/main/examples/bots/README.md).
## How to Run

### Step 1: Prerequisites

Ensure you have installed **AgentScope** and **Trinity-RFT** by following [the guide](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/README.md#how-to-run).

### Step 2: Prepare the Dataset

Run the data preparation script. Make sure to update the dataset paths in `config_random.yaml` and `config_difficulty.yaml` afterward.

```bash
python prepare_data.py
```
### Step 3: Start Ray Cluster

For distributed training, start a Ray cluster.

```bash
# For a single node
ray start --head
```

### Step 4: Run Training

You can now run either the baseline or the difficulty-based training experiment.

- **To run the baseline experiment with the random selector:**

  ```bash
  python main.py --config config_random.yaml
  ```

- **To run the experiment with the difficulty-based selector:**

  ```bash
  python main.py --config config_difficulty.yaml
  ```
## Experimental Results

The following results compare the performance of the `difficulty_based` selection strategy (red line, bots) against the standard `random` selection strategy (black line, random).

<div align="center">
<img src="./training_result.jpg" alt="Training Result Image" width="90%"/>
</div>

### Training Reward Curve

The chart on the left shows the rollout accuracy during training. The tasks sampled by the random strategy appear too difficult for the model, with accuracy remaining below 0.2. In contrast, the difficulty selector yields a higher mean accuracy, indicating that the agent engages with more tasks it can successfully solve.

### Evaluation on AIME-24

For comparison, we evaluated both selection strategies on the AIME-24 benchmark. The chart on the right shows that the difficulty-based method demonstrates a better upward trend in performance over time.

tuner/data_augment/README_zh.md

Lines changed: 153 additions & 0 deletions

(Chinese translation of the README.md above; content is identical.)
Lines changed: 74 additions & 0 deletions
project: "Data-Augmentation" # Project name
name: "Difficulty-Based-Selector" # Experiment name
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints} # Directory to save model checkpoints

data_processor:
  experience_pipeline:
    operators:
      - name: pass_rate_calculator # Calculate average reward and pass it back to the selector

buffer:
  total_epochs: 1 # Total training epochs
  explorer_input:
    taskset:
      path: "path/to/your/augmented/math_data" # Training data path
      split: "train" # Training data split
      task_selector:
        selector_type: difficulty_based # Strategy of task selection
        feature_keys: [ "qwen2.5_7b_pass_rate", "qwen3_30b_pass_rate" ] # Pass-rate feature columns used by the selector
        kwargs: # Hyperparameters from BOTS (https://github.com/modelscope/Trinity-RFT/blob/main/examples/bots/README.md)
          m: 8
          lamb: 0.1
          rho: 0.1
          target_reward: 0.8
          tau: 0
          do_sample: true
    eval_tasksets:
      - name: "eval-aime24" # Evaluation data name
        path: "path/to/aime24_data" # Evaluation data path
        split: "test" # Evaluation data split

synchronizer:
  sync_style: dynamic_by_explorer # Sync triggered dynamically by the explorer
  sync_method: 'nccl'
  sync_interval: 4 # Sync every N steps
  sync_timeout: 7200 # Timeout for synchronization (seconds)

monitor:
  monitor_type: tensorboard # Can also use wandb, mlflow or swanlab

# The config below is set in the Python entry file

algorithm:
  algorithm_type: multi_step_grpo # GRPO variant for multi-step scenarios
  repeat_times: 8 # Number of rollouts per prompt for advantage estimation
  optimizer:
    lr: 1e-6 # Learning rate

model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-0.6B} # Base model path
  max_model_len: 24576 # Max context length
  max_response_tokens: 16384 # Max tokens per response
  temperature: 1.0 # Sampling temperature

cluster:
  node_num: 1 # Number of nodes
  gpu_per_node: 8 # Number of GPUs per node

explorer:
  eval_interval: 20 # Evaluate every N steps
  runner_per_model: 16 # Runners per inference engine
  max_timeout: 1200 # Max timeout per rollout (seconds)
  rollout_model:
    engine_num: 4 # Number of vLLM engines for the rollout model
    tensor_parallel_size: 1 # TP size per engine
    enable_openai_api: true # Enable OpenAI-compatible API
    enable_history: true # Enable conversation history
    enable_auto_tool_choice: true # Enable automatic tool selection
    tool_call_parser: hermes # Parser for tool calls
    reasoning_parser: deepseek_r1 # Parser for reasoning content

trainer:
  save_interval: 100 # Save checkpoint every N steps
  use_dynamic_bsz: true # Use dynamic batch size
  ulysses_sequence_parallel_size: 1 # Sequence-parallel size for Ulysses
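Before launching a run, it can help to sanity-check that a config parses and names a known selector. A small sketch using PyYAML (the check itself is an illustration, not part of this example's tooling):

```python
import yaml  # PyYAML (third-party): pip install pyyaml

def check_config(text):
    """Parse a tuner YAML config and return its task_selector settings."""
    cfg = yaml.safe_load(text)
    selector = cfg["buffer"]["explorer_input"]["taskset"]["task_selector"]
    known = {"sequential", "shuffle", "random", "offline_easy2hard", "difficulty_based"}
    assert selector["selector_type"] in known, \
        f"unknown selector: {selector['selector_type']}"
    return selector

snippet = """
buffer:
  explorer_input:
    taskset:
      path: "path/to/your/augmented/math_data"
      task_selector:
        selector_type: difficulty_based
        feature_keys: ["qwen2.5_7b_pass_rate"]
"""
selector = check_config(snippet)
# selector["selector_type"] == "difficulty_based"
```

In a real run, the full file is loaded the same way and passed to `main.py` via `--config`.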
