
Commit 6451b49

delete async
1 parent e63ddf4 commit 6451b49

File tree

10 files changed: +0 −1331 lines changed


docs/en/models/qwen3-4B.md

Lines changed: 0 additions & 6 deletions
@@ -303,9 +303,3 @@ In this case, 2 GPUs will be allocated for training, and 6 GPUs will be allocate
 ```bash
 --sglang-cuda-graph-bs 1 2 4 8 $(seq 16 8 256)
 ```
-
-### Asynchronous Training
-
-When you separate training and inference, you may notice that the training and inference GPUs are always waiting for each other. To prevent these resources from being idle, we can enable asynchronous training. This can be done by changing `train.py` to `train_async.py` in the startup script. By doing this, slime will generate data for the next rollout while training on the current one.
-
-The only difference between `train.py` and `train_async.py` lies in the synchronization logic of the training loop. We achieve this by using Ray's asynchronous features (`.remote`, `ray.get`).
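The overlap pattern described in the deleted paragraph (generate the next rollout while training on the current one) can be sketched with Python's standard library in place of Ray. This is a minimal illustration of the pipelining idea, not slime's actual code; `generate_rollout` and `train_on` are hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_rollout(step):
    # hypothetical stand-in for slime's rollout data generation
    return [f"sample-{step}-{i}" for i in range(4)]

def train_on(data):
    # hypothetical stand-in for one training pass; returns the batch size
    return len(data)

trained_batches = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(generate_rollout, 0)             # prefetch the first rollout
    for step in range(3):
        data = future.result()                            # wait for the current rollout
        future = pool.submit(generate_rollout, step + 1)  # overlap: start the next one
        trained_batches.append(train_on(data))            # train while generation runs

print(trained_batches)  # [4, 4, 4]
```

With Ray the shape is the same: `pool.submit(...)` becomes a `.remote(...)` call that returns a future-like object reference, and `future.result()` becomes `ray.get(...)`; the key point is that the next rollout's generation is launched before the current training step begins.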

docs/zh/models/qwen3-4B.md

Lines changed: 0 additions & 8 deletions
@@ -303,11 +303,3 @@ ray job submit ... \
 ```bash
 --sglang-cuda-graph-bs 1 2 4 8 $(seq 16 8 256)
 ```
-
-### Asynchronous Training
-
-When training and inference are separated, you will notice that the training and inference GPUs are constantly waiting for each other. To avoid leaving these resources idle, we can enable asynchronous training by changing `train.py` to `train_async.py` in the startup script. slime will then generate the data for the next rollout while training on the current one.
-
-The only difference between `train.py` and `train_async.py` lies in the synchronization logic of the training loop, which we implement with Ray's asynchronous features (`.remote`, `ray.get`).
-
-⚠️ During asynchronous training, sglang's performance-monitoring logs may get mixed in with the training logs and become hard to tell apart; you can reduce sglang's log output with `--sglang-log-level`.

slime_plugins/rollout_buffer/README.md

Lines changed: 0 additions & 50 deletions
This file was deleted.

slime_plugins/rollout_buffer/README_zh.md

Lines changed: 0 additions & 51 deletions
This file was deleted.

0 commit comments
