Commit 1f9e821

doc
1 parent 49276c0 commit 1f9e821

2 files changed: +4 -0 lines changed

docs/source/Instruction/GRPO/DeveloperGuide/reward_function.md

Lines changed: 2 additions & 0 deletions
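@@ -81,6 +81,8 @@ swift supports using both synchronous and asynchronous reward functions at the same time. The trainer automatically detects
 - Synchronous reward functions are executed sequentially
 - Asynchronous reward functions are executed in parallel using `asyncio.gather`

+The [plugin](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/plugin/plugin.py) file provides an example of a generative reward model (async_genrm) that calls the `swift deploy` service.
+
 ## Built-in Reward Functions
 swift includes five built-in rule-based reward functions (see swift/plugin/orm.py for the code)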

docs/source_en/Instruction/GRPO/DeveloperGuide/reward_function.md

Lines changed: 2 additions & 0 deletions
@@ -80,6 +80,8 @@ Swift supports using both synchronous and asynchronous reward functions simultan
 - Synchronous reward functions are executed sequentially
 - Asynchronous reward functions are executed in parallel using `asyncio.gather`

+The [plugin](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/plugin/plugin.py) file provides an example of a generative reward model (async_genrm) that calls the `swift deploy` service.
+
 ## Built-in Reward Functions
 Swift includes five rule-based reward functions (code can be found in swift/plugin/orm.py).
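The dispatch behaviour described in the context lines above (synchronous reward functions run one after another, asynchronous ones run together through `asyncio.gather`) can be illustrated with a small standalone sketch. It does not use the ms-swift API and is not the trainer's actual code: `length_reward`, `fake_genrm_reward`, and `compute_rewards` are hypothetical names, and the simulated latency stands in for a request to a deployed reward-model service.

```python
import asyncio
import inspect
from typing import Callable, Dict, List


def length_reward(completions: List[str], **kwargs) -> List[float]:
    """Hypothetical synchronous, rule-based reward: 1.0 for short completions."""
    return [1.0 if len(c) <= 20 else 0.0 for c in completions]


async def fake_genrm_reward(completions: List[str], **kwargs) -> List[float]:
    """Hypothetical async reward standing in for a generative reward model.

    A real version would send each completion to a deployed service;
    here the network call is simulated with a short sleep.
    """
    async def score_one(text: str) -> float:
        await asyncio.sleep(0.01)  # simulated request latency (assumption)
        return 1.0 if "answer" in text else 0.0

    # Score all completions concurrently.
    return list(await asyncio.gather(*(score_one(c) for c in completions)))


async def _run_async(funcs: List[Callable], completions: List[str]) -> List[List[float]]:
    # All async reward functions are awaited together.
    return await asyncio.gather(*(f(completions) for f in funcs))


def compute_rewards(funcs: List[Callable], completions: List[str]) -> Dict[str, List[float]]:
    """Split reward functions by kind, then run each group as described above."""
    sync_funcs = [f for f in funcs if not inspect.iscoroutinefunction(f)]
    async_funcs = [f for f in funcs if inspect.iscoroutinefunction(f)]

    results: Dict[str, List[float]] = {}
    for f in sync_funcs:  # sequential execution
        results[f.__name__] = f(completions)
    if async_funcs:  # parallel execution via asyncio.gather
        outputs = asyncio.run(_run_async(async_funcs, completions))
        for f, out in zip(async_funcs, outputs):
            results[f.__name__] = out
    return results


if __name__ == "__main__":
    print(compute_rewards([length_reward, fake_genrm_reward],
                          ["the answer is 4", "x" * 30]))
```

Detecting coroutine functions with `inspect.iscoroutinefunction` is one plausible way to separate the two kinds; the linked plugin file remains the reference for how `async_genrm` is actually written.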
