
Commit 06903ae

update README

Authored and committed by yexin. Parent commit: 3b5085e.

1 file changed: 2 additions, 0 deletions.


verl/checkpoint_engine/README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -19,6 +19,7 @@ Checkpoint Engine is a unified abstract layer to synchronize weights between va
 |hccl|HCCL|all_gather+broadcast|Ascend NPU & HCCL| High|Low: rebuild hccl group|Off-policy training<br>- Trainer/rollout disaggregated<br>- Fixed clusters
 |nixl|NIXL|all_gather+ring p2p|Various transport backends (D2D, H2H, H2D, etc.)<br>- UCX<br>- UCCL<br>- Mooncake|Medium/High|High: dynamic adjust ring topology|Off-policy training<br>- Trainer/rollout disaggregated<br>- Elastic rollout<br>- Rollout fault tolerance<br>- Heterogeneous hardware rollout
 |kimi_ckpt_engine|MOONCAKE+NCCL/HCCL|p2p+broadcast|NVIDIA/Ascend|High|Low: rebuild communication group|Off-policy training<br>- Trainer/rollout disaggregated<br>- Save checkpoint each time
+|mooncake|Mooncake Transfer Engine|all_gather+ring p2p|NVIDIA/Ascend|High|High: dynamic adjust ring topology|Off-policy training<br>- Trainer/rollout disaggregated<br>- Fixed clusters
 
 ##### kimi_ckpt_engine detail:
 
```
```diff
@@ -49,3 +50,4 @@ pytest tests/checkpoint_engine/test_special_server_adapter.py
 |4*8 H100, ConnectX-7 400 Gbps (InfiniBand)| NIXL | ~7 | 8.25|
 |2*16 Ascend 910C, inner supernode| HCCL | ~11 | 5.3|
 |2*16 Ascend 910C, inner supernode| kimi_ckpt_engine | offload: 7 update: 3.5 | 16.5|
+|2*8 H100, ConnectX-7 400 Gbps (InfiniBand)| mooncake | 5.93 | 9.44|
```
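The nixl and mooncake rows list `all_gather+ring p2p` as their communication pattern, with a ring topology that can be adjusted dynamically. To build some intuition for that pattern, here is a minimal single-process Python sketch. It assumes the source rank already holds the full gathered weights and that each receiver forwards fixed-size chunks to a single successor. All names (`build_ring`, `ring_broadcast`, `chunk_size`) are hypothetical and are not verl, NIXL, or Mooncake APIs. The real backends move device buffers over RDMA and overlap transfers, which this toy does not model.

```python
# Hypothetical toy model of a chunked ring (chain) p2p broadcast.
# Not verl's implementation; it only models the ordering of chunk transfers.
from typing import Dict, List, Tuple


def build_ring(ranks: List[int]) -> Dict[int, int]:
    """Map each rank to its successor. ranks[0] is the source; the last rank
    only receives, so for a one-way broadcast the ring is an open chain."""
    return {r: ranks[i + 1] for i, r in enumerate(ranks[:-1])}


def ring_broadcast(
    weights: bytes, ranks: List[int], chunk_size: int
) -> Tuple[Dict[int, bytes], Dict[int, int]]:
    """Pipeline `weights` from ranks[0] down the chain, one chunk per hop per step.

    Returns (bytes reconstructed on each rank, bytes sent by each rank).
    Assumes `ranks` is non-empty and contains no duplicates.
    """
    chunks = [weights[i:i + chunk_size] for i in range(0, len(weights), chunk_size)]
    received: Dict[int, List[bytes]] = {r: [] for r in ranks}
    # The source already holds everything, e.g. after an all_gather on the trainer side.
    received[ranks[0]] = list(chunks)
    sent: Dict[int, int] = {r: 0 for r in ranks}
    successor = build_ring(ranks)
    hop = {r: i for i, r in enumerate(ranks)}

    # The rank at hop h forwards chunk (step - h) during `step`, so the last
    # chunk reaches the tail after len(chunks) + len(ranks) - 2 steps.
    for step in range(len(chunks) + len(ranks) - 2):
        for r in ranks[:-1]:
            idx = step - hop[r]
            if 0 <= idx < len(chunks):
                chunk = received[r][idx]  # arrived from the predecessor in an earlier step
                received[successor[r]].append(chunk)
                sent[r] += len(chunk)

    return {r: b"".join(c) for r, c in received.items()}, sent


if __name__ == "__main__":
    weights = bytes(range(10))
    out, sent = ring_broadcast(weights, [0, 1, 2, 3], chunk_size=3)
    print(sent)  # {0: 10, 1: 10, 2: 10, 3: 0}
    # Elastic case: rank 2 leaves, so rebuild the chain from the surviving ranks and resend.
    out, sent = ring_broadcast(weights, [0, 1, 3], chunk_size=3)
```

In this model the source sends exactly one copy of the weights no matter how many receivers join the ring, and each extra hop adds only one pipeline step. Handling a departed or newly added rank means rebuilding the successor map from the current membership, which is one plausible reading of "dynamic adjust ring topology" in the table.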
