| 1 | +# Training a Math Agent with Data-Augmentation Strategies |
| 2 | + |
| 3 | +This example demonstrates how to use **AgentScope-Tuner** to enhance a math problem-solving agent. We will focus on leveraging **Data-Centric** features, such as the `difficulty_based` task selector, to improve data utility and training efficiency. |
| 4 | + |
| 5 | +## Task Setting |
| 6 | + |
| 7 | +We use the foundational [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as our baseline. The agent is a **`ReActAgent`** that solves mathematical reasoning problems through step-by-step reasoning. |
| 8 | + |
| 9 | +Training can be inefficient if tasks are too easy or too hard. This example demonstrates how to use **task selectors** to dynamically select tasks based on **data feedback**, focusing on "productively challenging" samples to maximize training efficiency. These data-centric techniques are generic and adaptable to other agent workflows. |
| 10 | + |
| 11 | +## Dataset Preparation |
| 12 | + |
| 13 | +To enable difficulty-based sampling, the training data must include difficulty features (e.g., pass rates from LLMs). |
| 14 | + |
| 15 | +1. **Base Dataset**: You can use any standard math problem dataset. A good example is the math data in [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k), which comes pre-annotated with pass rates from different LLMs, serving as direct difficulty features. |
| 16 | +2. **Build Your Own Features**: If you use your own dataset, you can generate these features by pre-running several models of varying capabilities and recording their pass rates. This can be done within the [**Trinity-RFT**](https://github.com/agentscope-ai/Trinity-RFT/pull/440) framework. |
| 17 | +3. **Data Format**: The final dataset should be in HuggingFace format. In this example, the data is converted to *GSM8K format* to match the [workflow](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py). Besides the task content, it must include the difficulty feature columns you've defined (e.g., `qwen2.5_7b_pass_rate`). |
| 18 | +4. **Example Data Preparation**: We provide a script for this example. Simply execute `python prepare_data.py` to generate the required dataset. |
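As a rough illustration of the "Build Your Own Features" step above, the sketch below turns per-attempt correctness flags into pass-rate columns. It is not the logic of `prepare_data.py`: the helper names `pass_rate` and `add_difficulty_features`, and the `question`/`answer` record fields, are assumptions made for this example.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of attempts judged correct; 0.0 when there are no attempts."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def add_difficulty_features(
    record: Dict[str, object],
    outcomes_by_model: Dict[str, List[bool]],
) -> Dict[str, object]:
    """Return a copy of `record` with one `<model>_pass_rate` column per model.

    `outcomes_by_model` maps a (hypothetical) model name to the list of
    correct/incorrect flags collected by pre-running that model on the task.
    """
    augmented = dict(record)
    for model_name, outcomes in outcomes_by_model.items():
        augmented[f"{model_name}_pass_rate"] = pass_rate(outcomes)
    return augmented


# Hypothetical record: one model solved the task in 3 of 4 attempts.
example = add_difficulty_features(
    {"question": "What is 2 + 3?", "answer": "5"},
    {"qwen2.5_7b": [True, False, True, True]},
)
```

The resulting column name (`qwen2.5_7b_pass_rate` here) is what a selector would later reference as a difficulty feature.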
| 19 | + |
| 20 | +## Code Implementation |
| 21 | + |
| 22 | +This example adopts `run_react_agent` and `gsm8k_judge` from the [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as `workflow_func` and `judge_func`, demonstrating that training strategies can be applied without altering core agent logic. |
| 23 | + |
| 24 | +### Design of Data-Centric Features |
| 25 | + |
| 26 | +Leveraging the powerful data processing capabilities of **Trinity-RFT**, **AgentScope-Tuner** provides interfaces for advanced operations like task selection and experience processing. |
| 27 | + |
| 28 | +#### Task Selector |
| 29 | + |
| 30 | +The `Task Selector` determines how samples are selected from a dataset. It can be configured directly in configuration YAML files. |
| 31 | + |
| 32 | +- **Built-in Selectors**: |
| 33 | + - `sequential`: Samples are selected in a fixed order. |
| 34 | + - `shuffle`: The dataset is shuffled at the beginning of each epoch. |
| 35 | + - `random`: Samples are randomly chosen with replacement for each batch. |
| 36 | + - `offline_easy2hard`: Samples are sorted by a predefined feature for curriculum learning. |
| 37 | + - `difficulty_based` (Customized): An adaptive sampler based on task difficulty. |
| 38 | + |
| 39 | +> For more details on `Task Selector`, including how to implement a custom selector based on feedback signals, please refer to **Trinity-RFT**'s **[Selector Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)**. |
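The behavior of the simpler selectors listed above can be sketched in a few lines of plain Python. This is only an illustration of the sampling semantics described in the list, not Trinity-RFT's implementation; all function names are hypothetical, and the curriculum ordering assumes the predefined feature is a pass rate (higher means easier).

```python
import random
from typing import Iterator, List


def sequential_batches(n: int, batch_size: int) -> Iterator[List[int]]:
    """Yield sample indices in a fixed order."""
    for start in range(0, n, batch_size):
        yield list(range(start, min(start + batch_size, n)))


def shuffle_batches(n: int, batch_size: int, rng: random.Random) -> Iterator[List[int]]:
    """Shuffle once (call again at the start of each epoch), then batch."""
    order = list(range(n))
    rng.shuffle(order)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]


def random_batch(n: int, batch_size: int, rng: random.Random) -> List[int]:
    """Draw one batch with replacement, so indices may repeat."""
    return [rng.randrange(n) for _ in range(batch_size)]


def easy2hard_order(pass_rates: List[float]) -> List[int]:
    """Order indices from easiest (highest pass rate) to hardest."""
    return sorted(range(len(pass_rates)), key=lambda i: -pass_rates[i])
```

The key difference is that `shuffle` visits every sample exactly once per epoch, whereas `random` may repeat or skip samples within the same stretch of training.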
| 40 | + |
| 41 | +#### Data Processor |
| 42 | + |
| 43 | +The `Data Processor` allows for real-time processing of **Task** and **Experience** during training, enabling operations like calculating feedback metrics, data augmentation, or filtering. |
| 44 | + |
| 45 | +For example, the `difficulty_based` selector requires a `pass_rate_calculator` operator to compute the agent's success rate for each task. This feedback is then used to adjust the sampling strategy. |
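To make the feedback signal concrete, the sketch below computes a per-task pass rate from a batch of rollouts. It is not the `pass_rate_calculator` operator itself: the function name, the `(task_id, reward)` tuple layout, and the success threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates_by_task(
    experiences: Iterable[Tuple[str, float]],
    success_threshold: float = 1.0,
) -> Dict[str, float]:
    """Group rollouts by task and return the fraction that succeeded.

    Each experience is a hypothetical (task_id, reward) pair; a rollout
    counts as a success when its reward reaches `success_threshold`.
    """
    counts = defaultdict(lambda: [0, 0])  # task_id -> [successes, attempts]
    for task_id, reward in experiences:
        counts[task_id][1] += 1
        if reward >= success_threshold:
            counts[task_id][0] += 1
    return {task_id: s / t for task_id, (s, t) in counts.items()}


# Two tasks, two rollouts each.
rollouts = [("t1", 1.0), ("t1", 0.0), ("t2", 1.0), ("t2", 1.0)]
rates = pass_rates_by_task(rollouts)
```

A selector can then treat tasks with a rate near 0 or 1 as too hard or too easy for the current policy.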
| 46 | + |
| 47 | +> For more details on `Data Processor`, please refer to **Trinity-RFT**'s **[Operator Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)**. |
| 48 | + |
| 49 | + |
| 50 | +### Configuring the Experiments |
| 51 | + |
| 52 | +To maintain clarity and simplicity, we recommend defining all data-specific parameters, including dataset paths and task selectors, within YAML configuration files. |
| 53 | + |
| 54 | +We provide two configuration files to compare the baseline `random` selector against the `difficulty_based` selector. |
| 55 | + |
| 56 | +**Experiment 1: Baseline with Random Selector (`config_random.yaml`)** |
| 57 | + |
| 58 | +In `config_random.yaml`, we configure the `task_selector` for random sampling under `buffer.explorer_input.taskset`. |
| 59 | + |
| 60 | +```yaml |
| 61 | +# In config_random.yaml |
| 62 | +buffer: |
| 63 | + # ... |
| 64 | + explorer_input: |
| 65 | + taskset: # Training data |
| 66 | + path: "path/to/your/augmented/math_data" |
| 67 | + split: "train" |
| 68 | + task_selector: |
| 69 | + selector_type: random # Strategy of task selection |
| 70 | +``` |
| 71 | + |
| 72 | +**Experiment 2: Advanced Training with Difficulty-Based Selector (`config_difficulty.yaml`)** |
| 73 | + |
| 74 | +In `config_difficulty.yaml`, we switch the `task_selector` to `difficulty_based` and provide its selector-specific parameters. Note that this config also enables the `pass_rate_calculator` operator needed for feedback. |
| 75 | + |
| 76 | +```yaml |
| 77 | +# In config_difficulty.yaml |
| 78 | + |
| 79 | +# Enable the calculator to provide feedback for the selector |
| 80 | +data_processor: |
| 81 | + experience_pipeline: |
| 82 | + operators: |
| 83 | + - name: pass_rate_calculator |
| 84 | + |
| 85 | +buffer: |
| 86 | + # ... |
| 87 | + explorer_input: |
| 88 | + taskset: # Training data |
| 89 | + path: "path/to/your/augmented/math_data" |
| 90 | + split: "train" |
| 91 | + task_selector: |
| 92 | + selector_type: difficulty_based # Strategy of task selection |
| 93 | + feature_keys: [ "qwen2.5_7b_pass_rate", "qwen3_30b_pass_rate" ] |
| 94 | + kwargs: # Hyper-parameters for the selection algorithm |
| 95 | + m: 8 |
| 96 | + # ... |
| 97 | +``` |
| 98 | + |
| 99 | +> The `difficulty_based` selector in this example is an implementation of the ***BOTS*** algorithm. For details on its inner workings, please refer to the [***BOTS paper***](https://arxiv.org/abs/2510.26374) and its [***tutorials***](https://github.com/agentscope-ai/Trinity-RFT/blob/main/examples/bots/README.md). |
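For intuition only, the sketch below shows one simple way an adaptive sampler could combine offline difficulty features with online pass-rate feedback. It is a deliberately simplified stand-in and should not be read as the BOTS algorithm or as the `difficulty_based` selector's code: the class name, the Beta-posterior update, the `prior_strength` and `target` parameters, and the "closest to 0.5" ranking rule are all assumptions made for this example.

```python
import random
from typing import List


class AdaptiveDifficultySampler:
    """Toy sampler that prefers tasks whose estimated pass rate is near `target`."""

    def __init__(
        self,
        prior_pass_rates: List[float],
        prior_strength: float = 2.0,
        target: float = 0.5,
        seed: int = 0,
    ) -> None:
        # Beta(alpha, beta) belief per task, seeded from offline features
        # such as a reference model's pass rate.
        self.alpha = [1.0 + prior_strength * p for p in prior_pass_rates]
        self.beta = [1.0 + prior_strength * (1.0 - p) for p in prior_pass_rates]
        self.target = target
        self.rng = random.Random(seed)

    def select(self, batch_size: int) -> List[int]:
        """Sample a plausible pass rate per task and pick those closest to target."""
        sampled = [
            self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)
        ]
        ranked = sorted(
            range(len(sampled)), key=lambda i: abs(sampled[i] - self.target)
        )
        return ranked[:batch_size]

    def update(self, task_index: int, successes: int, attempts: int) -> None:
        """Fold observed rollout outcomes back into the task's belief."""
        self.alpha[task_index] += successes
        self.beta[task_index] += attempts - successes


sampler = AdaptiveDifficultySampler([0.0, 0.5, 1.0])
batch = sampler.select(2)
sampler.update(0, successes=3, attempts=4)
```

Sampling from the belief, rather than ranking by its mean, keeps some exploration of tasks whose difficulty is still uncertain.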
| 100 | + |
| 101 | +## How to Run |
| 102 | + |
| 103 | +### Step 1: Prerequisites |
| 104 | + |
| 105 | +Ensure you have installed **AgentScope** and **Trinity-RFT** by following [the installation guide](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/README.md#how-to-run). |
| 106 | + |
| 107 | +### Step 2: Prepare the Dataset |
| 108 | + |
| 109 | +Run the data preparation script. Make sure to update the dataset paths in `config_random.yaml` and `config_difficulty.yaml` afterward. |
| 110 | + |
| 111 | +```bash |
| 112 | +python prepare_data.py |
| 113 | +``` |
| 114 | + |
| 115 | +### Step 3: Start Ray Cluster |
| 116 | + |
| 117 | +For distributed training, start a Ray cluster. |
| 118 | + |
| 119 | +```bash |
| 120 | +# For single node |
| 121 | +ray start --head |
| 122 | +``` |
| 123 | + |
| 124 | +### Step 4: Run Training |
| 125 | + |
| 126 | +You can now run either the baseline or the difficulty-based training experiment. |
| 127 | + |
| 128 | +- **To run the baseline experiment with a random selector:** |
| 129 | + |
| 130 | +```bash |
| 131 | +python main.py --config config_random.yaml |
| 132 | +``` |
| 133 | + |
| 134 | +- **To run the experiment with the difficulty-based selector:** |
| 135 | +```bash |
| 136 | +python main.py --config config_difficulty.yaml |
| 137 | +``` |
| 138 | + |
| 139 | +## Experimental Results |
| 140 | + |
| 141 | +The following results compare the `difficulty_based` selection strategy (red line, labeled `bots`) against the baseline `random` selection strategy (black line, labeled `random`). |
| 142 | + |
| 143 | +<div align="center"> |
| 144 | + <img src="./training_result.jpg" alt="Training Result Image" width="90%"/> |
| 145 | +</div> |
| 146 | + |
| 147 | +### Training Reward Curve |
| 148 | + |
| 149 | +The chart on the left shows the rollout accuracy during training. The tasks sampled by the random strategy appear to be difficult for the model: accuracy remains below 0.2. In contrast, the difficulty-based selector yields a higher mean accuracy, indicating that the agent encounters more tasks that it can solve successfully. |
| 150 | + |
| 151 | +### Evaluation on AIME-24 |
| 152 | + |
| 153 | +For comparison, we evaluated both selection strategies on the AIME-24 benchmark. The chart on the right shows a stronger upward trend over the course of training for the difficulty-based method. |