v0.2.2 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to multiple memory and performance improvements, v0.2.2 adds support for:
- Int4 QAT (quantization-aware training)
- Full R3 (Rollout Routing Replay) support with DeepEP and MTP
- Dependency upgrades: SGLang v0.5.7 and the Megatron dev branch
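Quantization-aware training generally simulates low-bit weights in the forward pass so that the model learns to tolerate the rounding error. The sketch below shows symmetric per-group int4 "fake quantization" (quantize, then immediately dequantize) in pure Python. It illustrates the general technique only and is not slime's Int4-QAT implementation, which these notes do not describe. The function name `fake_quant_int4` and the `group_size` default are hypothetical.

```python
from typing import List

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def fake_quant_int4(values: List[float], group_size: int = 4) -> List[float]:
    """Hypothetical helper: quantize-dequantize `values` to int4,
    using one symmetric scale per group of `group_size` elements."""
    out: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group onto INT4_MAX;
        # fall back to a scale of 1.0 for an all-zero group.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        for v in group:
            # Round to the nearest integer level and clamp to the int4 range.
            q = max(INT4_MIN, min(INT4_MAX, round(v / scale)))
            # Dequantize back to float for use in the forward pass.
            out.append(q * scale)
    return out


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0, 2.0, -1.0, 0.5, 0.25]
    print(fake_quant_int4(weights))
```

In typical QAT setups the rounding step is treated as the identity during backpropagation (a straight-through estimator), so the full-precision master weights keep receiving gradients while the forward pass sees int4-representable values.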
What's Changed
- add ckpt load save ci by @lilei199908 in #1104
- Add --rollout-all-samples-process-path for RLVE by @zhuzilin in #1107
- feat: support Qwen3 Moe BackEnd Kernel by @attack204 in #1071
- fix max response/context/prompt len by @lilei199908 in #1110
- fix max len by @lilei199908 in #1112
- [docker] remove amem and support deepep + r3 by @zhuzilin in #1115
- [Fix] Fix early return in init rollout engine by @yitianlian in #1118
- [Fix] Add sglang patch for weight version update by @yitianlian in #1119
- fix: improve tokenization by @nanjiangwill in #1113
- [Feature] Add CI test for weight version update by @yitianlian in #1120
- [docker] optimize r3 with base64 encode by @zhuzilin in #1124
- [docker] fix r3 gather buffer by @zhuzilin in #1129
- [docker] support mtp for r3 by @zhuzilin in #1131
- [Fix] Fix some bugs in retool example by @yitianlian in #1130
- Add finalize_model_grads_with_empty_cache by @zhuzilin in #1133
- Feat: add usage docs for fsdp by @lin0303-siyuan in #1092
- Reserve more ports for new sglang dp attn impl by @zhuzilin in #1142
- Blog: fix the path of the Blog's architecture image by @ShanningZhuang in #1125
- Support async save and add extra save at the end of the training by @zhuzilin in #1143
- fix: fix GemmaRMSNorm.forward() bug by @nanjiangwill in #1121
- [WIP][FSDP] Support FSDP for Qwen3Next by @rucnyz in #1116
- Megatron VLM Support (1/N) by @Zhuohao-Li in #1123
- Update deprecated huggingface-cli and fix broken links by @Lyken17 in #1147
- Added FSDP checkpoint handling to convert_torch_dist_to_hf.py by @cklxx in #1101
- minor fix for megatron compatibility by @zhuzilin in #1149
- Remove config_mapping to use megatron-bridge by @zhuzilin in #1166
- Avoids repeated work. by @qqwqqw689 in #1163
- Make tools/convert_torch_dist_to_hf.py not rely on megatron by @zhuzilin in #1167
- support converting dpsk mtp layer by @zhuzilin in #1169
- [FSDP] Add Masked importance sampling by @zijiexia in #1122
- [TIS/MIS] fix and add better metric by @ChangyiYang in #1174
- Fix optimizer schedule resume by @lr-tsinghua11 in #1152
- [docker] upgrade to megatron dev branch by @zhuzilin in #1153
- Minor fix by @lancerts in #1165
- Fix forward of Qwen3VLTextRotaryEmbedding in Megatron-Bridge by @zhuzilin in #1179
- Reuse the text llm config for qwen3 vl models by @zhuzilin in #1180
- Don't save AutoBridge in args by @zhuzilin in #1181
- [Fix] Fix port error in PD disaggregation setting by @yitianlian in #1175
- Fix prompt type bug in generate_with_search within examples/search-r1 by @jiahe7ay in #1182
- feat: support Qwen3 VL MoE by @nanjiangwill in #1171
- [Fix] Minor fix by @yitianlian in #1183
- Set parallel config for megatron bridge by @zhuzilin in #1184
- Fix tools/convert_hf_to_torch_dist.py by @zhuzilin in #1186
- Don't calculate entropy grad when coef is 0 by @zhuzilin in #1185
- Disable routing replay for critic by @zhuzilin in #1187
- Revert "Don't calculate entropy grad when coef is 0" by @zhuzilin in #1189
- Fix qwen3next for megatron dev branch by @zhuzilin in #1190
- fix: fix logging for rollout by @nanjiangwill in #1188
- sync internal features by @zhuzilin in #1192
- Fix check_weights api by @zhuzilin in #1194
- Add --custom-rollout-log-function-path and --custom-eval-rollout-log-function-path by @zhuzilin in #1196
- [Feature] Add more logging for health monitor by @yitianlian in #1195
- fix: SFT tools support by @maoquan-ms in #1198
- [Feature] Change default value of rollout health check by @yitianlian in #1197
- Megatron VLM Support w/ SFT (2/N) by @Zhuohao-Li in #1150
- tiny fix for sft script after tokenizer improvement by @Zhuohao-Li in #1201
- tests: add test for multi turn loss mask by @maoquan-ms in #1204
- Always pass loss masks to model by @zhuzilin in #1205
- [on-policy distillation] update reward function to fix potential token mismatches by @ahxt in #1128
- Add ci for mtp by @zhuzilin in #1207
- Fix mla tflops by @lilei199908 in #1209
- update docs by @zhuzilin in #1211
- update docs by @zhuzilin in #1214
- [Feature] Support 0.3.0 sglang router for fault tolerance by @yitianlian in #1215
- sync internal features by @zhuzilin in #1216
- feat: add custom logic for processing list[list[Sample]] to training data by @nanjiangwill in #1218
- add int4_quant cuda kernel by @Hyaloid in #1220
- update doc by @zhuzilin in #1224
- Improve AMD tutorial with complete model/data setup workflow by @Vivicai1005 in #1212
- update megatron patch by @zhuzilin in #1228
- sync from internal by @zhuzilin in #1229
- fix model saving bug in megatron by @zhuzilin in #1230
- add new status by @nanjiangwill in #1219
- update customization docs by @nanjiangwill in #1233
- Revert data processing of VLM by @zhuzilin in #1232
- [VLM] optimize VLM processing by @nanjiangwill in #1234
- feat: add custom pg_loss reducer by @ChangyiYang in #1235
- fix: typo "sgalng" → "sglang" in ROCm Dockerfiles by @yurekami in #1282
- sync bugfix from internal by @zhuzilin in #1284
- sync internal bugfix by @zhuzilin in #1286
- add bshd support by @yueming-yuan in #1285
- [docker] fix bugs on pd disaggregation and add --disable-draft-cuda-graph by @zhuzilin in #1288
- Add longest_effective_sample_tokens_per_sec metric by @zhuzilin in #1291
- [fix] conditionally pass kwargs for megatron-bridge VLM by @yueming-yuan in #1290
- [VLM] Bugfix: image_patch_size for vision preprocessing by @coding-famer in #1227
- feat: add --custom-model-provider-path argument by @yurekami in #1239
- [Feature/Fix] Support IPv6 host resolution and robust URI formatting by @Chen-GX in #859
- Fix missing trust_remote_code in HfWeightIteratorBridge by @SwordFaith in #1287
- fix: remove invalid None default and fix misleading underscore variable naming by @lancerts in #1283
- fix: remove duplicate Megatron-LM installation in build_conda.sh by @yurekami in #1238
- fix dev megatron ckpt save bugs by @lilei199908 in #1294
- [Fix] fix image_patch_size in processing_utils by @coding-famer in #1295
- support hicache for pd disaggregation by @zhuzilin in #1296
- Optimize data.py for efficient data loading by @ppraneth in #696
- Auto Sync Code by @miles-code-angel in #1303
- [VLM] end2end geo3k multi-turn RL of VLM Recipe by @gxlvera in #1141
- [docker] Fix sglang ima on mtp + pd disaggregation by @zhuzilin in #1313
- fix: fix processing logic by @nanjiangwill in #1292
- [BugFix] Delete apply chat template for SFT by @PopSoda2002 in #1307
- Remove token retrieval test from main. by @qqwqqw689 in #1243
- update quick start doc by @zijiexia in #1193
- fix: replace blocking sleep with async sleep and fix file handle leak by @lancerts in #1200
- Set base_gpu_id for sglang from placement groups by @vpj in #1306
- [FSDP] Move gptoss scripts by @PopSoda2002 in #1317
- [refactor] Make sglang_rollout.py shorter and add prefix cached info by @zhuzilin in #1318
- set spec args for mtp ci by @zhuzilin in #1322
- Add non_generation_time stat in sample by @zhuzilin in #1323
- add slime test images ci by @lilei199908 in #1325
- update sglang to lmsysorg/sglang:nightly-dev-20260103-24c91001 by @lilei199908 in #1324
- update sglang to lmsysorg/sglang:nightly-dev-20260103-24c91001 by @lilei199908 in #1331
- code sync by @miles-code-angel in #1329
- perf: replace quadratic list flattening with linear chaining in rollout manager by @ppraneth in #1319
- Implement local GPU ID remapping based on CUDA_VISIBLE_DEVICES for SGLang Engine by @zijiexia in #1327
- Fix: Remove --apply-chat-template from Qwen3-235B SFT script by @kaysonyu in #1315
- [Megatron Bridge] Support save hf format model by @coding-famer in #1289
- update default paths and disable offloading for AMD qwen3-4B training by @Vivicai1005 in #1225
- [internal sync] reset optimizer state and dynamic global batch size by @zhuzilin in #1330
- [refactor] minor code refactor for save_hf by @zhuzilin in #1334
- optimize long prompt filter by @zhuzilin in #1335
- remove deprecated interface by @zhuzilin in #1336
- Revert "remove deprecated interface" by @zhuzilin in #1337
- code cleanup by @zhuzilin in #1338
- save sglang v0.5.7 patch by @zhuzilin in #1339
- [Feature] Add CI for fault tolerance by @yitianlian in #1222
- Patch validate_non_overlapping_shards_metadata to speed up ckpt loading by @zhuzilin in #1342
- [Feature] Reorganize CI by @yitianlian in #1343
- [Feature] Option not to save optimizer states to save disk space by @yzlnew in #1333
- Add clear_num_new_engines and some code cleanup by @zhuzilin in #1349
- fix get_response_lengths by @zhuzilin in #1350
- bugfix by @zhuzilin in #1351
- default setting --tool-keys to tools by @UbeCc in #1352
- code sync by @miles-code-angel in #1356
- Revert "code sync" by @zhaochenyang20 in #1357
- update code by @miles-code-angel in #1358
- [sync] sync internal bugfixes by @zhuzilin in #1371
- feat: add int4 reinforcement learning training support (Part1) by @GeLee-Q in #1362
- [docker] Fix mtp r3 and add tilelang by @zhuzilin in #1380
- [docker] Comment out 'quant weights to fp8 ue8m0' by @zhuzilin in #1381
- geo3k VLM multi-turn megatron update by @gxlvera in #1378
- [Doc] Add docs for R2/R3 by @Hecate0821 in #1382
- [ci] borrow bot-slash-lint.yaml from miles by @zhuzilin in #1384
- feat: add int4 reinforcement learning training support (Part2) by @fy1214 in #1172
- feat: add int4 reinforcement learning training support (Part3) by @Gao016 in #1368
- fix lint by @zhuzilin in #1385
- VLM Multi-turn, add Megatron in README by @gxlvera in #1387
- Fix grammar and formatting in README.md by @zhaochenyang20 in #1388
- [refactor] refactor int4 qat code by @zhuzilin in #1390
- [1/X] Refactor: unify training backends by general utils, tested Megatron & FSDP alignment by @yueming-yuan in #1373
- bugfix by @zhuzilin in #1394
- bugfix by @zhuzilin in #1395
- fix ppo utils & mis by @yueming-yuan in #1396
- feat(examples): add strands-sglang integration for agentic RL with TITO support by @Lawhy in #1359
- remove swe-agent example by @zhuzilin in #1397
- fix to support dpsk-v3.2 bf16 weight conversion to fp8 by @Gao016 in #1392
- Only rank0 should call post_process_weights by @zhuzilin in #1398
- bugfix by @zhuzilin in #1400
- [docker] support mtp in dpsk v3.2 by @zhuzilin in #1401
- bug fix by @lilei199908 in #1403
- [style] minor: remove subclass by @yueming-yuan in #1402
- [doc] suggest pip install -e . --no-deps by @zhuzilin in #1405
- [docs] add note for cudnn by @zhuzilin in #1406
- [minor] Delete unused util file by @yueming-yuan in #1408
- [docker] ignore slime in sglang SafeUnpickler by @zhuzilin in #1409
- add r3 ci by @lilei199908 in #1407
- Fix rollout-all-samples by @fzyzcjy in #1410
- [docs] fix ai generate response by @zijiexia in #1412
- bugfix on UpdateWeightFromDistributed by @zhuzilin in #1420
- [docker] add tunable indexer is_neox_style by @zhuzilin in #1421
- [docker] remove rm /root/.tmux.conf by @zhuzilin in #1422
- [sync] sync internal feature and bugfix by @zhuzilin in #1423
- [docker] update stable patches by @zhuzilin in #1424
- skip logits.div when temp is 1.0 by @zhuzilin in #1428
- [docker] support offload NSATokenToKVPool by @zhuzilin in #1429
- [doc] cleanup redundant example and scripts by @zhuzilin in #1431
- [docs] move low precision example into main doc by @zhuzilin in #1432
- [docs] move reproducibility to main doc by @zhuzilin in #1433
- [docs] add a bit of additional info for pd disaggregation by @zhuzilin in #1434
- [docs] add debug suggestion for ima by @zhuzilin in #1435
- Fix retool example incorrectly handling max_tool_calls by @fzyzcjy in #1427
- Docs: Add qqr to "Projects Built upon slime" section by @bcol23 in #1425
- [docs] fix doc by @zhuzilin in #1436
- Fix Hf model to Mcore checkpoint conversion on AMD gpus by @gramesh-amd in #279
- [cleanup] clean up utils folder by @zhuzilin in #1437
- [FSDP][Fix] Fix redundant import by @Hecate0821 in #1354
- [Fix] Update deprecated sglang ep args in docs and scripts by @coding-famer in #1344
- Add Qwen3-Coder-30B-A3B-Instruct model script by @maoquan-ms in #1213
- Revert "[style] minor: remove subclass" by @zhuzilin in #1441
- [revert] revert the parallel state change by @zhuzilin in #1442
- [FSDP] remove cp in fsdp by @zhuzilin in #1443
- [fsdp] remove tis by @zhuzilin in #1444
- Megatron VLM Support (Qwen2.5-VL series) (3/N) by @Zhuohao-Li in #1210
- fix the loss mask for mask_offpolicy_in_partial_rollout by @zhuzilin in #1445
- [Fix] Return origin_samples instead of False in filter_long_prompt by @kaysonyu in #1438
- [docker] fix sglang streaming output bug by @zhuzilin in #1446
- [docker] change base image from lmsysorg to slimerl/sglang by @zhuzilin in #1447
- Fix: Apply loss mask to KL in REINFORCE++ returns calculation by @kaysonyu in #1372
- [docs] add docs for ppo by @zhuzilin in #1448
- [release] bump to v0.2.2 by @zhuzilin in #1345
New Contributors
- @attack204 made their first contribution in #1071
- @lin0303-siyuan made their first contribution in #1092
- @ShanningZhuang made their first contribution in #1125
- @rucnyz made their first contribution in #1116
- @Lyken17 made their first contribution in #1147
- @cklxx made their first contribution in #1101
- @qqwqqw689 made their first contribution in #1163
- @zijiexia made their first contribution in #1122
- @lr-tsinghua11 made their first contribution in #1152
- @jiahe7ay made their first contribution in #1182
- @maoquan-ms made their first contribution in #1198
- @Hyaloid made their first contribution in #1220
- @Vivicai1005 made their first contribution in #1212
- @yurekami made their first contribution in #1282
- @miles-code-angel made their first contribution in #1303
- @gxlvera made their first contribution in #1141
- @vpj made their first contribution in #1306
- @kaysonyu made their first contribution in #1315
- @yzlnew made their first contribution in #1333
- @gramesh-amd made their first contribution in #279
Full Changelog: v0.2.1...v0.2.2