Commit c7a1c95

Update docs/source/grpo_trainer.md

Co-authored-by: Quentin Gallouédec <[email protected]>

Parent: 3003058

1 file changed: docs/source/grpo_trainer.md (+1, -1)
````diff
@@ -268,7 +268,7 @@ trainer = GRPOTrainer(
 trainer.train()
 ```
 
-In this example, the `math_reward_func` and `coding_reward_func` are designed to work with a mixed dataset that contains both math and coding problems. The `task_type` column in the dataset is used to determine which reward function to apply to each problem. If there is no relevant reward function for a sample in the dataset, the reward function will return `None` and the GRPOTrainer will continue with the valid functions and tasks. This allows the GRPOTrainer to handle multiple reward functions with different applicability.
+In this example, the `math_reward_func` and `coding_reward_func` are designed to work with a mixed dataset that contains both math and coding problems. The `task_type` column in the dataset is used to determine which reward function to apply to each problem. If there is no relevant reward function for a sample in the dataset, the reward function will return `None` and the [`GRPOTrainer`] will continue with the valid functions and tasks. This allows the GRPOTrainer to handle multiple reward functions with different applicability.
 
 Note that the GRPOTrainer will ignore the `None` rewards returned by the reward functions and only consider the rewards returned by the relevant functions. This ensures that the model is trained on the relevant tasks and ignores the tasks for which there is no relevant reward function.
 
````

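For context, a minimal sketch of what the `math_reward_func` and `coding_reward_func` referenced in this hunk might look like, assuming TRL's reward-function interface (extra dataset columns such as `task_type` are passed as keyword arguments, and each function returns one reward per completion, or `None` where it does not apply). The actual implementations appear earlier in `grpo_trainer.md` and are not part of this diff; `check_math_solution` and `run_test_cases` below are hypothetical placeholders.

```python
def check_math_solution(prompt, completion):
    # Placeholder verifier: a real check would parse and compare the final answer.
    return "42" in completion

def run_test_cases(completion):
    # Placeholder verifier: a real check would execute unit tests on the code.
    return "def " in completion

def math_reward_func(prompts, completions, task_type, **kwargs):
    # Score math samples; return None for everything else so the
    # GRPOTrainer skips this function on non-math samples.
    return [
        (1.0 if check_math_solution(p, c) else 0.0) if t == "math" else None
        for p, c, t in zip(prompts, completions, task_type)
    ]

def coding_reward_func(prompts, completions, task_type, **kwargs):
    # Score coding samples; return None for everything else.
    return [
        (1.0 if run_test_cases(c) else 0.0) if t == "coding" else None
        for p, c, t in zip(prompts, completions, task_type)
    ]
```

Both functions would then be passed together via `reward_funcs=[math_reward_func, coding_reward_func]` when constructing the `GRPOTrainer`, as in the snippet the hunk header points at.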