Release v0.2.2 · THUDM/slime

v0.2.2 is here! Thanks to everyone who contributed to this release.

Major Updates

In addition to multiple memory and performance improvements, v0.2.2 adds support for:

Int4-QAT training
Full R3 (Rollout Routing Replay) support with DeepEP and MTP
Dependency upgrades: SGLang v0.5.7 and the Megatron dev branch

What's Changed

add ckpt load save ci by @lilei199908 in #1104
Add --rollout-all-samples-process-path for RLVE by @zhuzilin in #1107
feat: support Qwen3 Moe BackEnd Kernel by @attack204 in #1071
fix max response/context/prompt len by @lilei199908 in #1110
fix max len by @lilei199908 in #1112
[docker] remove amem and support deepep + r3 by @zhuzilin in #1115
[Fix] Fix early return in init rollout engine by @yitianlian in #1118
[Fix] Add sglang patch for weight version update by @yitianlian in #1119
fix: improve tokenization by @nanjiangwill in #1113
[Feature] Add CI test for weight version update by @yitianlian in #1120
[docker] optimize r3 with base64 encode by @zhuzilin in #1124
[docker] fix r3 gather buffer by @zhuzilin in #1129
[docker] support mtp for r3 by @zhuzilin in #1131
[Fix] Fix some bugs in retool example by @yitianlian in #1130
Add finalize_model_grads_with_empty_cache by @zhuzilin in #1133
Feat: add usage docs for fsdp by @lin0303-siyuan in #1092
Reserve more ports for new sglang dp attn impl by @zhuzilin in #1142
Blog: fix the path of the Blog's architecture image by @ShanningZhuang in #1125
Support async save and add extra save at the end of the training by @zhuzilin in #1143
fix: fix GemmeRMSNorm.forward() bug by @nanjiangwill in #1121
[WIP][FSDP] Support FSDP for Qwen3Next by @rucnyz in #1116
Megatron VLM Support (1/N) by @Zhuohao-Li in #1123
Update deprecated huggingface-cli and fix broken links by @Lyken17 in #1147
Added FSDP checkpoint handling to convert_torch_dist_to_hf.py by @cklxx in #1101
minor fix for megatron compatibility by @zhuzilin in #1149
Remove config_mapping to use megatron-bridge by @zhuzilin in #1166
Avoids repeated work. by @qqwqqw689 in #1163
Make tools/convert_torch_dist_to_hf.py not rely on megatron by @zhuzilin in #1167
support converting dpsk mtp layer by @zhuzilin in #1169
[FSDP] Add Masked importance sampling by @zijiexia in #1122
[TIS/MIS] fix and add better metric by @ChangyiYang in #1174
Fix optimizer schedule resume by @lr-tsinghua11 in #1152
[docker] upgrade to megatron dev branch by @zhuzilin in #1153
Minor fix by @lancerts in #1165
Fix forward of Qwen3VLTextRotaryEmbedding in Megatron-Bridge by @zhuzilin in #1179
Reuse the text llm config for qwen3 vl models by @zhuzilin in #1180
Don't save AutoBridge in args by @zhuzilin in #1181
[Fix] Fix port error in PD disaggregation setting by @yitianlian in #1175
Fix prompt type bug in generate_with_search within examples/search-r1 by @jiahe7ay in #1182
feat: support Qwen3 VL MoE by @nanjiangwill in #1171
[Fix] Minor fix by @yitianlian in #1183
Set parallel config for megatron bridge by @zhuzilin in #1184
Fix tools/convert_hf_to_torch_dist.py by @zhuzilin in #1186
Don't calculate entropy grad when coef is 0 by @zhuzilin in #1185
Disable routing replay for critic by @zhuzilin in #1187
Revert "Don't calculate entropy grad when coef is 0" by @zhuzilin in #1189
Fix qwen3next for megatron dev branch by @zhuzilin in #1190
fix: fix logging for rollout by @nanjiangwill in #1188
sync internal features by @zhuzilin in #1192
Fix check_weights api by @zhuzilin in #1194
Add --custom-rollout-log-function-path and --custom-eval-rollout-log-function-path by @zhuzilin in #1196
[Feature] Add more logging for health monitor by @yitianlian in #1195
fix: SFT tools support by @maoquan-ms in #1198
[Feature] Change default value of rollout health check by @yitianlian in #1197
Megatron VLM Support w/ SFT (2/N) by @Zhuohao-Li in #1150
tiny fix for sft script after tokenizer improvement by @Zhuohao-Li in #1201
tests: add test for multi turn loss mask by @maoquan-ms in #1204
Always pass loss masks to model by @zhuzilin in #1205
[on-policy distillation] update reward function to fix potential token mismatches by @ahxt in #1128
Add ci for mtp by @zhuzilin in #1207
Fix mla tflops by @lilei199908 in #1209
update docs by @zhuzilin in #1211
update docs by @zhuzilin in #1214
[Feature] Support 0.3.0 sglang router for fault tolerance by @yitianlian in #1215
sync internal features by @zhuzilin in #1216
feat: add custom logic for processing list[list[Sample]] to training data by @nanjiangwill in #1218
add int4_quant cuda kernel by @Hyaloid in #1220
update doc by @zhuzilin in #1224
Improve AMD tutorial with complete model/data setup workflow by @Vivicai1005 in #1212
update megatron patch by @zhuzilin in #1228
sync from internal by @zhuzilin in #1229
fix model saving bug in megatron by @zhuzilin in #1230
add new status by @nanjiangwill in #1219
update customization docs by @nanjiangwill in #1233
Revert data processing of VLM by @zhuzilin in #1232
[VLM] optimize VLM processing by @nanjiangwill in #1234
feat: add custom pg_loss reducer by @ChangyiYang in #1235
fix: typo "sgalng" → "sglang" in ROCm Dockerfiles by @yurekami in #1282
sync bugfix from internal by @zhuzilin in #1284
sync internal bugfix by @zhuzilin in #1286
add bshd support by @yueming-yuan in #1285
[docker] fix bugs on pd disaggregation and add --disable-draft-cuda-graph by @zhuzilin in #1288
Add longest_effective_sample_tokens_per_sec metric by @zhuzilin in #1291
[fix] conditionally pass kwargs for megatron-bridge VLM by @yueming-yuan in #1290
[VLM] Bugfix: image_patch_size for vision preprocessing by @coding-famer in #1227
feat: add --custom-model-provider-path argument by @yurekami in #1239
[Feature/Fix] Support IPv6 host resolution and robust URI formatting by @Chen-GX in #859
Fix missing trust_remote_code in HfWeightIteratorBridge by @SwordFaith in #1287
fix: remove invalid None default and fix misleading underscore variable naming by @lancerts in #1283
fix: remove duplicate Megatron-LM installation in build_conda.sh by @yurekami in #1238
fix dev megatron ckpt save bugs by @lilei199908 in #1294
[Fix] fix image_patch_size in processing_utils by @coding-famer in #1295
support hicache for pd disaggregation by @zhuzilin in #1296
Optimize data.py for efficient data loading by @ppraneth in #696
Auto Sync Code by @miles-code-angel in #1303
[VLM] end2end geo3k multi-turn RL of VLM Recipe by @gxlvera in #1141
[docker] Fix sglang ima on mtp + pd disaggregation by @zhuzilin in #1313
fix: fix processing logic by @nanjiangwill in #1292
[BugFix] Delete apply chat template for SFT by @PopSoda2002 in #1307
Remove token retrieval test from main. by @qqwqqw689 in #1243
update quick start doc by @zijiexia in #1193
fix: replace blocking sleep with async sleep and fix file handle leak by @lancerts in #1200
Set base_gpu_id for sglang from placement groups by @vpj in #1306
[FSDP] Move gptoss scripts by @PopSoda2002 in #1317
[refactor] Make sglang_rollout.py shorter and add prefix cached info by @zhuzilin in #1318
set spec args for mtp ci by @zhuzilin in #1322
Add non_generation_time stat in sample by @zhuzilin in #1323
add slime test images ci by @lilei199908 in #1325
update sglang to lmsysorg/sglang:nightly-dev-20260103-24c91001 by @lilei199908 in #1324
update sglang to lmsysorg/sglang:nightly-dev-20260103-24c91001 by @lilei199908 in #1331
code sync by @miles-code-angel in #1329
perf: replace quadratic list flattening with linear chaining in rollout manager by @ppraneth in #1319
Implement local GPU ID remapping based on CUDA_VISIBLE_DEVICES for SGLang Engine by @zijiexia in #1327
Fix: Remove --apply-chat-template from Qwen3-235B SFT script by @kaysonyu in #1315
[Megatron Bridge] Support save hf format model by @coding-famer in #1289
update default paths and disable offloading for AMD qwen3-4B training by @Vivicai1005 in #1225
[internal sync] reset optimizer state and dynamic global batch size by @zhuzilin in #1330
[refactor] minor code refactor for save_hf by @zhuzilin in #1334
optimize long prompt filter by @zhuzilin in #1335
remove deprecated interface by @zhuzilin in #1336
Revert "remove deprecated interface" by @zhuzilin in #1337
code cleanup by @zhuzilin in #1338
save sglang v0.5.7 patch by @zhuzilin in #1339
[Feature] Add CI for fault tolerance by @yitianlian in #1222
Patch validate_non_overlapping_shards_metadata to speed up ckpt loading by @zhuzilin in #1342
[Feature] Reorganize CI by @yitianlian in #1343
[Feature] Option not to save optimizer states to save disk space by @yzlnew in #1333
Add clear_num_new_engines and some code cleanup by @zhuzilin in #1349
fix get_response_lengths by @zhuzilin in #1350
bugfix by @zhuzilin in #1351
default setting --tool-keys to tools by @UbeCc in #1352
code sync by @miles-code-angel in #1356
Revert "code sync" by @zhaochenyang20 in #1357
update code by @miles-code-angel in #1358
[sync] sync internal bugfixes by @zhuzilin in #1371
feat: add int4 reinforcement learning training support (Part1) by @GeLee-Q in #1362
[docker] Fix mtp r3 and add tilelang by @zhuzilin in #1380
[docker] Comment out 'quant weights to fp8 ue8m0' by @zhuzilin in #1381
geo3k VLM multi-turn megatron update by @gxlvera in #1378
[Doc] Add docs for R2/R3 by @Hecate0821 in #1382
[ci] borrow bot-slash-lint.yaml from miles by @zhuzilin in #1384
feat: add int4 reinforcement learning training support (Part2) by @fy1214 in #1172
feat: add int4 reinforcement learning training support (Part3) by @Gao016 in #1368
fix lint by @zhuzilin in #1385
VLM Multi-turn, add Megatron in README by @gxlvera in #1387
Fix grammar and formatting in README.md by @zhaochenyang20 in #1388
[refactor] refactor int4 qat code by @zhuzilin in #1390
[1/X] Refactor: unify training backends by general utils, tested Megatron & FSDP alignment by @yueming-yuan in #1373
bugfix by @zhuzilin in #1394
bugfix by @zhuzilin in #1395
fix ppo utils & mis by @yueming-yuan in #1396
feat(examples): add strands-sglang integration for agentic RL with TITO support by @Lawhy in #1359
remove swe-agent example by @zhuzilin in #1397
fix to support dpsk-v3.2 bf16 weight convert to fp8 by @Gao016 in #1392
Only rank0 should call post_process_weights by @zhuzilin in #1398
bugfix by @zhuzilin in #1400
[docker] support mtp in dpsk v3.2 by @zhuzilin in #1401
bug fix by @lilei199908 in #1403
[style] minor: remove subclass by @yueming-yuan in #1402
[doc] suggest pip install -e . --no-deps by @zhuzilin in #1405
[docs] add note for cudnn by @zhuzilin in #1406
[minor] Delete unused util file by @yueming-yuan in #1408
[docker] ignore slime in sglang SafeUnpickler by @zhuzilin in #1409
add r3 ci by @lilei199908 in #1407
Fix rollout-all-samples by @fzyzcjy in #1410
[docs] fix ai generate response by @zijiexia in #1412
bugfix on UpdateWeightFromDistributed by @zhuzilin in #1420
[docker] add tunable indexer is_neox_style by @zhuzilin in #1421
[docker] remove rm /root/.tmux.conf by @zhuzilin in #1422
[sync] sync internal feature and bugfix by @zhuzilin in #1423
[docker] update stable patches by @zhuzilin in #1424
skip logits.div when temp is 1.0 by @zhuzilin in #1428
[docker] support offload NSATokenToKVPool by @zhuzilin in #1429
[doc] cleanup redundant example and scripts by @zhuzilin in #1431
[docs] move low precision example into main doc by @zhuzilin in #1432
[docs] move reproducibility to main doc by @zhuzilin in #1433
[docs] a bit addition info for pd disaggregation by @zhuzilin in #1434
[docs] add debug suggestion for ima by @zhuzilin in #1435
Fix retool example incorrectly handling max_tool_calls by @fzyzcjy in #1427
Docs: Add qqr to "Projects Built upon slime" section by @bcol23 in #1425
[docs] fix doc by @zhuzilin in #1436
Fix Hf model to Mcore checkpoint conversion on AMD gpus by @gramesh-amd in #279
[cleanup] clean up utils folder by @zhuzilin in #1437
[FSDP][Fix] Fix redundant import by @Hecate0821 in #1354
[Fix] Update deprecated sglang ep args in docs and scripts by @coding-famer in #1344
Add Qwen3-Coder-30B-A3B-Instruct model script by @maoquan-ms in #1213
Revert "[style] minor: remove subclass" by @zhuzilin in #1441
[revert] revert the parallel state change by @zhuzilin in #1442
[FSDP] remove cp in fsdp by @zhuzilin in #1443
[fsdp] remove tis by @zhuzilin in #1444
Megatron VLM Support (Qwen2.5-VL series) (3/N) by @Zhuohao-Li in #1210
fix the loss mask for mask_offpolicy_in_partial_rollout by @zhuzilin in #1445
[Fix] Return origin_samples instead of False in filter_long_prompt by @kaysonyu in #1438
[docker] fix sglang streaming output bug by @zhuzilin in #1446
[docker] change base image from lmsysorg to slimerl/sglang by @zhuzilin in #1447
Fix: Apply loss mask to KL in REINFORCE++ returns calculation by @kaysonyu in #1372
[docs] add docs for ppo by @zhuzilin in #1448
[release] bump to v0.2.2 by @zhuzilin in #1345
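
For readers curious about the kind of change named in #1319 ("replace quadratic list flattening with linear chaining"), here is a minimal sketch of the general technique. It is not the code from that PR: the function names and sample data are hypothetical and chosen only for illustration.

```python
from itertools import chain


def flatten_quadratic(groups):
    # sum() creates a fresh list at every step, so earlier elements are
    # copied again and again; total work grows quadratically with size.
    return sum(groups, [])


def flatten_linear(groups):
    # chain.from_iterable walks each inner list once, so total work is
    # linear in the number of elements.
    return list(chain.from_iterable(groups))


# Hypothetical nested structure, e.g. groups of samples per prompt.
groups = [[1, 2], [3], [4, 5, 6]]
assert flatten_quadratic(groups) == flatten_linear(groups)
```

Both functions return the same flat list; the difference only becomes noticeable when there are many inner lists.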

New Contributors

@attack204 made their first contribution in #1071
@lin0303-siyuan made their first contribution in #1092
@ShanningZhuang made their first contribution in #1125
@rucnyz made their first contribution in #1116
@Lyken17 made their first contribution in #1147
@cklxx made their first contribution in #1101
@qqwqqw689 made their first contribution in #1163
@zijiexia made their first contribution in #1122
@lr-tsinghua11 made their first contribution in #1152
@jiahe7ay made their first contribution in #1182
@maoquan-ms made their first contribution in #1198
@Hyaloid made their first contribution in #1220
@Vivicai1005 made their first contribution in #1212
@yurekami made their first contribution in #1282
@miles-code-angel made their first contribution in #1303
@gxlvera made their first contribution in #1141
@vpj made their first contribution in #1306
@kaysonyu made their first contribution in #1315
@yzlnew made their first contribution in #1333
@gramesh-amd made their first contribution in #279

Full Changelog: v0.2.1...v0.2.2
