v0.2.0
We are thrilled to announce the release of slime v0.2.0! Thanks to the incredible support and contributions from our community, slime has gained significant new features and substantial performance improvements in this version.
Major Updates
- FSDP Backend: Introduced a Fully Sharded Data Parallel (FSDP) based training backend for improved scalability.
- PPO Support: Added native support for Proximal Policy Optimization (PPO).
- MTP Training: Enabled training of the MTP (Multi-Token Prediction) module during reinforcement learning.
- FP8 Full Stack: Added support for both FP8 training and FP8 inference.
- Train-Inference Mismatch: Alleviated, and in some setups eliminated, the train-inference mismatch.
- Importance Sampling: Added a custom interface for train-inference importance sampling (e.g., MIS).
- Routing Replay: Added Rollout Routing Replay (R3) and Routing Replay (R2).
- True On-Policy Training: Enabled strictly on-policy training with dense models on the FSDP backend.
Performance Improvements
- Memory Optimization: Added CUDA Graph offloading and asystem-amem integration.
- Faster Weight Updates: Significantly accelerated FP8 weight updates.
- Python-based Router: A new slime router implemented in pure Python for accessibility.
- Fault Tolerance: Added fault tolerance for the rollout engines.
- Custom Configs: Support for passing customized configurations via `--config`.
- [Experimental] Checkpoint Loading: Added support for Megatron-bridge based checkpoint loading.
New Examples
- Fully Async Training
- Multi-Agent Scenarios
- On-Policy Distillation
- Retool
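To illustrate the idea behind the train-infer importance sampling interface mentioned above, here is a minimal sketch of a truncated (clipped) per-token importance-sampling correction. The function name, signature, and clip value are illustrative assumptions, not slime's actual API; the real interface is configurable via the custom importance-sampling hook.

```python
import math

def tis_weights(train_logprobs, rollout_logprobs, clip=2.0):
    """Illustrative truncated importance-sampling weights (not slime's API).

    Each per-token weight is exp(train_lp - rollout_lp), i.e. the ratio of the
    training engine's probability to the rollout engine's probability for the
    sampled token, truncated at `clip` so a large train-inference mismatch
    cannot blow up the gradient.
    """
    return [
        min(math.exp(t - r), clip)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]

# When training and rollout log-probs agree exactly, every weight is 1.0
# and the correction is a no-op.
print(tis_weights([-1.0, -2.0], [-1.0, -2.0]))  # [1.0, 1.0]

# A token the trainer assigns much higher probability than the rollout
# engine would be up-weighted, but the weight is capped at `clip`.
print(tis_weights([-0.5], [-2.0]))  # [2.0]
```

Clipping (as in MIS-style truncated importance sampling) trades a small bias for much lower variance when the two engines disagree on rare tokens.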
What's Changed
- [Doc typo] Update amd_tutorial.md by @yushengsu-thu in #246
- [bugfix] use fp32 for rollout_log_probs by @zhuzilin in #245
- Complete the RayTrainGroup args string docs. by @MrAta in #248
- Update speculative decoding doc and sglang patch by @guapisolo in #250
- fix debug-rollout-only by @zyzshishui in #249
- retool in one commit by @maocheng23 in #237
- fix: modify the rotary-base of qwen-3b to 1000000 for consistency by @YuchenFan48 in #252
- update logging and fix typo by @maocheng23 in #254
- [bugfix] fix read data containing "tools" field by @Maybewuss in #255
- Revert "[bugfix] fix read data containing "tools" field" by @zhuzilin in #256
- add shell script for qwen3-32B task by @Gao016 in #253
- docs: Fix custom interface documentation errors by @GeLee-Q in #251
- [example] Add fully async example by @zhuzilin in #258
- added sphinx-based documentation by @FrankLeeeee in #262
- fixed build error for documentation by @FrankLeeeee in #263
- [bugfix] Fix bugs on multi samples from one prompt (multi-agent) by @zhuzilin in #260
- fixed sphinx configuration by @FrankLeeeee in #264
- [bugfix] fix read data containing "tools" field by @Maybewuss in #259
- add DeepWiki badge by @richardodliu in #265
- [doc] add example doc to the website by @zhuzilin in #267
- [doc] add blogs by @zhuzilin in #268
- Update actor_group.py by @zlH518 in #266
- [doc] prettify language convertion toggle by @zhuzilin in #270
- [example] add an example for multi-agent rl by @yinpeisu in #269
- [refactor] Add isort back and move global gloo to global util by @zhuzilin in #273
- [refactor] remove over_sampling_filter and extract some functions by @zhuzilin in #278
- [feat] init support for FSDP by @zhuzilin in #282
- Chatbot entry for Sphinx style docs by @jhinpan in #284
- Revert "Chatbot entry for Sphinx style docs" by @zhuzilin in #286
- Remove get_rollout_data from actor_group by @MrAta in #285
- Add docs logo by @jhinpan in #283
- [Hardware] AMD Dockerfile update - support up to d4a7741 (Sep 6, 2025) by @yushengsu-thu in #307
- [feat] init xtuner backend by @zhuzilin in #310
- [docker] update to sglang 0.5.2rc2 by @zhuzilin in #313
- Add model version attribute in each sample by @yitianlian in #271
- [nfc] cleanup for weight_version by @zhuzilin in #314
- Add raw reward metric in fdsp backend by @yitianlian in #315
- fix: check args.save when save_interval is set. by @SanftMonster in #308
- Fix comment for --load parameter in checkpoint configuration (Quick Start Doc) by @Arist12 in #306
- [refactor] bind numa and rename num_gpus_per_node by @zhuzilin in #316
- [xtuner] unroll TrainingWorker and TrainEngine by @zhuzilin in #322
- [xtuner] add wandb by @zhuzilin in #324
- [bugfix] fix no weight_version for aborted samples by @zhuzilin in #327
- [FSDP] Verify FSDP backend availability via uv install / pip install by @Zhuohao-Li in #325
- Add FSDP extras dependency and import test (#302) by @souhil25 in #303
- fix: small bug fix in the rollout_buffer_example.sh by @rbao2018 in #328
- [refactor] remove slime/backend/utils and extract slime_validate_args by @zhuzilin in #329
- feat: auto configure megatron from hf config. by @SanftMonster in #312
- Do not read ip if env is provided by @oraluben in #337
- [rm_hub] fix ground_truth type error in grade_answer_verl by @GGGGGGXY in #336
- [feat] use one global httpx.AsyncClient and remove --use-http2 by @zhuzilin in #338
- [Refactor] Merge rollout controller into rollout manager by @PopSoda2002 in #304
- add dockerfile and patch for b200 by @maocheng23 in #340
- [feat] init support for PPO by @zhuzilin in #342
- wrong expressions and typo by @ArtificialZeng in #343
- Add basic VLM data pipeline by @ppraneth in #335
- [FSDP] Add reference model support for correct KL loss computation #296 by @UbeCc in #344
- fix incorrect sft loss mask for qwen3 thinking series models. by @luppx in #330
- feature: ppo by @lilei199908 in #347
- [FIX] NVLINK detection method in scripts by @JustinTong0323 in #356
- fix lint by @JustinTong0323 in #358
- [feat] add --critic-lr and --num-critic-only-steps by @zhuzilin in #350
- [refactor] Add actor registry by @zhuzilin in #359
- Added GB200 patches for SGLang v0.5.2 by @sam571128 in #360
- [bugfix] fix the num_tokens used for per_token_loss in multi-turn training by @zhuzilin in #365
- [Feature] Support token in token out for multi turn tasks by @yitianlian in #242
- [router] support slime-router only by @zhuzilin in #366
- [router] extract middleware folder by @zhuzilin in #367
- [feat] support distributed post to enable more concurrent requests by @zhuzilin in #368
- [FEAT] Deterministic rollout by @JustinTong0323 in #361
- [reproducibility][docker] enable training reproducibility by @zhuzilin in #370
- [feat] enable use_flattened_tensor_bucket with quantization config by @zhuzilin in #374
- [fix] fix ppo bugs by @lilei199908 in #373
- docs: add B200/H-series GPU hardware support information by @Williamren97 in #380
- [model] fix run-qwen3-30B-A3B.sh by @yefei12 in #382
- Enable loss mask for sft by @UbeCc in #377
- [fix] fix paths in get_started.md by @hyleepp in #375
- [FSDP] Data Packing Implementation in FSDP backend by @jhinpan in #321
- [feat] add --use-routing-replay by @zhuzilin in #387
- fix bug for convert Qwen3-235B-A22B HF model weight to Megatorn torch_dist format by @Gao016 in #386
- [FSDP] Add update weight class from distributed by @PopSoda2002 in #341
- [docker] fix routing replay with pp and other bugfixes by @zhuzilin in #395
- Fix Pipeline Parallelism deadlock in distributed_masked_whiten by @Chen-GX in #393
- Fix some bugs in slime router by @yitianlian in #389
- [Fix] ppo rollout engins for distribute by @lilei199908 in #394
- [refactor] remove Registry and change the order of init by @zhuzilin in #398
- [FSDP][Xtuner] Delete ray registry by @PopSoda2002 in #400
- [feat] support fault tolerant for rollout engines by @zhuzilin in #405
- [Doc] Fix readme by @lancerts in #407
- Include type annotation in update_weight_utils.py by @lancerts in #401
- Add two related slime-based projects intro to readme by @jhinpan in #409
- [FSDP] optimize: fix FSDP memory overhead by @Williamren97 in #357
- [Doc] Replace Chinese chars to English by @lancerts in #408
- [FSDP] fix serialization issue and add support for sharded mode flatten tensor by @WWWjiahui in #410
- [FSDP][Feature] TIS implementation by @MrWhitezz in #390
- [model] Support Qwen3-Next-80B-A3B by @zhuzilin in #417
- Update train_async.py by @Guido1Alessandro1Trevisan in #413
- [xtuner] remove xtuner backend by @zhuzilin in #424
- [test] add tensorboard by @none0663 in #420
- [doc] update readme zh by @lancerts in #422
- [Doc] Include type annotations for sglang_rollout by @lancerts in #415
- [Docs] Fix GRPO related docs by @zhaochenyang20 in #428
- Adding tensorboard by @zhaochenyang20 in #427
- [CI] Unify isort CI by adding thirdparty group by @zhaochenyang20 in #430
- [docs] adding pre-commit guidance and unify EN/ZH docs by @zhaochenyang20 in #431
- [Doc] Add type annotation in actor.py by @lancerts in #436
- Super tiny fix typo by @fzyzcjy in #434
- [Hardware] AMD - Replace vllm CuMemAllocator dependency with torch_memory_saver by @yushengsu-thu in #444
- Super tiny remove unused distributed_args by @fzyzcjy in #439
- Super tiny remove setting new field rollout_id by @fzyzcjy in #440
- Fix inconsistent naming and add default tb_project_name in tensorboard_utils by @coding-famer in #435
- [Feat]: add pipelined weight update by @GeLee-Q in #432
- [Doc] Include the docs in update_weight by @lancerts in #433
- fix Multimodal printing type error risk, return type annotation consistency, offload image processing to a background thread by @lancerts in #423
- Tiny fix some test script errors by @fzyzcjy in #443
- Fix FSDP error when DTensor device mesh is empty by @fzyzcjy in #448
- [FSDP] fix oom on grad_norm and some code cleanup by @zhuzilin in #451
- Fix hostname resolution when DNS/hosts missing by @yefei12 in #449
- Upgrade slime router by @yitianlian in #418
- Change trainer backend from environment variable to args by @fzyzcjy in #442
- Fix async training error in last rollout by @fzyzcjy in #452
- Allow checking accuracy correctness programmatically by @fzyzcjy in #453
- Super tiny refactor the rollout address computation by @fzyzcjy in #455
- Tiny refactor engine wrapper by @fzyzcjy in #456
- Allow users provide their own rollout engines instead of launched by framework by @fzyzcjy in #457
- Super tiny refactor wandb initialization by @fzyzcjy in #462
- Super tiny add logs when kl checker fails by @fzyzcjy in #458
- Allow wandb have SGLang engine metrics such as time vs generation throughput by @fzyzcjy in #463
- Fix old actor update by @yitianlian in #461
- Fix typo and optimize TensorBoard logging by @none0663 in #464
- Tiny refactor and extract RolloutHealthMonitor by @fzyzcjy in #465
- [docker] upgrade to sglang v0.5.3.post1 by @zhuzilin in #472
- Avoid duplicated state between the rollout engines variables by @fzyzcjy in #466
- Super tiny fix typo by @fzyzcjy in #469
- [Doc] Include type annotation for cp_utils and model_provider by @lancerts in #468
- Super tiny refactor branching in getting dataset samples by @fzyzcjy in #470
- GSPO by @ppraneth in #454
- [bugfix] initialize rollout manager first to calculate num_rollout by @zhuzilin in #473
- Super tiny add Qwen3-1.7B megatron config by @fzyzcjy in #478
- [Doc] Include type annotation in actor.py with minor fixes by @lancerts in #474
- Tiny add Sample.group_index by @fzyzcjy in #475
- Fix debug_rollout_only incompatible with colocate in gpu allocation by @fzyzcjy in #477
- [ci] add --fp8-param-gather in ci for te 2.8.0 by @zhuzilin in #480
- [ci] remove fp8 in ci by @zhuzilin in #481
- Allow rollout fn to provide metrics and improve its extensibility by @fzyzcjy in #471
- [Doc ]Include type annotation and doc string for data.py by @lancerts in #485
- Update CI (part 1/N) by @fzyzcjy in #441
- [docker] trim sglang.patch by @zhuzilin in #493
- use SingletonMeta for _TensorboardAdapter by @zhuzilin in #494
- [Feature] Add ref model update interval argument by @yitianlian in #490
- Tiny refactor dynamic_filter for extensibility by @fzyzcjy in #482
- [Doc ]Include type annotation and doc string for update_weight_utils.py by @lancerts in #484
- [fault_tolerance] add --use-fault-tolerance and disable fault tolerance by default by @zhuzilin in #497
- allow setting sglang_router_port and bugfix in sglang.patch by @zhuzilin in #503
- [Doc] Include type annotation and docstring in loss.py by @lancerts in #498
- [Doc] Update type annotation for slime/slime/backends/megatron_utils/update_weight_utils.py by @lancerts in #500
- [Doc] Include type annotation and doc string for slime/slime/backends/megatron_utils/model.py by @lancerts in #499
- Compute metrics for zero-variance rollout samples by @fzyzcjy in #479
- Add metric of dynamic filter drop reasons by @fzyzcjy in #496
- Super tiny code cleanup by @fzyzcjy in #486
- fix: Correctly display sglang_tensor_parallel_size in startup logs by @Gao016 in #459
- fix rollout dataset loading by @zhuzilin in #504
- Super tiny refactor code for reuse by @fzyzcjy in #505
- Super tiny fix typo by @fzyzcjy in #511
- [Doc] Update fsdp_utils type annotation based on PEP guide by @lancerts in #509
- Fix learning rate warmup parameters by @fzyzcjy in #516
- Fix OOM in some SFT cases by @fzyzcjy in #517
- Support custom argument parsing by yaml file by @guapisolo in #521
- fix: prevent OOM when converting DeepSeek-V3 models by enabling memory-efficient loading by @mmy360 in #524
- Fix patching by pinning megatron by @fzyzcjy in #535
- Refactoring training inference importance sampling with seqeunce/geometry level by @zhaochenyang20 in #429
- Tiny fix k1 estimator typo and low_var_kl comments by @fzyzcjy in #483
- fix bug in fully async example (issue #488) by @Lez-3f in #519
- Print example data in SFT by @fzyzcjy in #528
- [FSDP] Optimizer CPU offload and other weight loading fix by @Hecate0821 in #536
- Tiny cleanup unused function by @coding-famer in #539
- fix load data label type by @rbao2018 in #538
- Super tiny add actor_train_tok_per_s perf metric by @fzyzcjy in #531
- [Feat] Support offload cuda graph by @ryang-max in #354
- Super tiny fix doc by @fzyzcjy in #546
- Tiny fix FlattenedTensorBucket import error for latest SGLang by @fzyzcjy in #553
- [docker] trim megatron and sglang patch by @zhuzilin in #552
- [docker] upgrade to megatron v0.14.0 by @zhuzilin in #554
- [docker] install fa3 for better performance by @zhuzilin in #556
- offload router replay indices and cleanup requirements.txt by @zhuzilin in #559
- Fix: rollback pipeline update weights because fp8 rollout bug when rollout tp > 1 by @lilei199908 in #561
- Fix _TensorboardAdapter Singleton mode by @coding-famer in #550
- [ci] update doc ci by @zhuzilin in #562
- Support dumping eval sample details by @fzyzcjy in #525
- fix(data): correct length filtering from character to token level by @yuzhu-cai in #548
- Fix logprob does not handle temperature by @fzyzcjy in #557
- Support slicing datasets by @fzyzcjy in #526
- Super tiny install rsync in container by @fzyzcjy in #529
- Support pass@K metrics in eval and related refactors by @fzyzcjy in #532
- Removing whitespace warning; Minor update by @foreverpiano in #542
- Tiny remove unused code by @fzyzcjy in #558
- Support evaluation-only runs by @fzyzcjy in #527
- Report reward category statistics by @fzyzcjy in #533
- Tiny update memory printing by @fzyzcjy in #563
- Try to fix train.py wrong logic about saving checkpoints by @fzyzcjy in #564
- Tiny fix fsdp distributed update weight error by @fzyzcjy in #567
- Tiny extract and enhance oom dumper by @fzyzcjy in #568
- Support true on policy by @fzyzcjy in #566
- Pin torch_memory_saver version by @fzyzcjy in #570
- Split args.offload to train and rollout by @fzyzcjy in #569
- Bump torch_memory_saver to include the OOM fix by @fzyzcjy in #573
- Tiny call clear cache when offloading rollout but not train by @fzyzcjy in #574
- Tiny fix fsdp prob diff computation when tis is disabled by @fzyzcjy in #575
- Fix offload_train rename in fsdp by @fzyzcjy in #576
- Tiny refactor and print more in memory utils by @fzyzcjy in #577
- Fix fsdp training sleep return too early causing OOM by @fzyzcjy in #578
- Workaround for SGLang release_memory_occupation return too early causing OOM by @fzyzcjy in #579
- Add Qwen3-4B demo with optional true on policy by @fzyzcjy in #580
- Super tiny fix merge error by @fzyzcjy in #581
- Add simple true on policy in 4B demo by @fzyzcjy in #582
- Tiny enhance command_utils for experiment scripting by @fzyzcjy in #571
- Support multi task evaluation by @zyzshishui in #585
- [FSDP][ready for review] Reduce peak memory required when gathering log-probs by @tyler-romero in #520
- fix omegaconfig import by @zhuzilin in #586
- Super tiny remove duplicated dependency by @fzyzcjy in #587
- Tiny enhance profiler and add memory-snapshot-num-steps by @fzyzcjy in #588
- Fix latest SGLang not supported for true on policy mode by @fzyzcjy in #589
- Support not offloading components in colocate mode by @fzyzcjy in #590
- Fix FSDP oom error when SGLang increases memory usage by @fzyzcjy in #591
- Enhance example scripts and command utils by @fzyzcjy in #592
- Try to synchronize after FSDP wakeup by @fzyzcjy in #593
- Tiny fix http post does not follow max retries by @fzyzcjy in #594
- use Pydantic for multi task evaluation config by @zyzshishui in #595
- Add abs to train_rollout_logprob_diff by @fzyzcjy in #596
- Super tiny add multiple eval tasks in the 4B demo by @fzyzcjy in #598
- Fix true on policy by conditionally disable softmax compile by @fzyzcjy in #599
- Fix fault tolerance for inference engines which fails to work before by @fzyzcjy in #600
- Super tiny print and fix by @fzyzcjy in #601
- Minor cleanup train engine offload related by @fzyzcjy in #602
- Compile rope while preserving true on policy by @fzyzcjy in #603
- Feat/add rollout logporbs ratio by @lilei199908 in #605
- [script] use flash attn backend for dpsk by @zhuzilin in #606
- change rollout_num_gpus is not None to rollout_num_gpus for sft by @UbeCc in #613
- support combined 1f1b by @zhuzilin in #565
- Support offloading in FSDP backend based on moving tensors by @fzyzcjy in #607
- Tiny update configs for qwen3 fsdp demo by @fzyzcjy in #608
- Support apply chat template kwargs like non-thinking qwen by @fzyzcjy in #609
- clear num_new_engines when fault tolerance is disabled by @zhuzilin in #622
- Support OOD Eval Tasks (GPQA, IFBench) by @zyzshishui in #597
- Fix releasing tensors too late causing high memory usage by @fzyzcjy in #616
- Add train-env-vars for cases like pytorch memory env vars by @fzyzcjy in #614
- Enable expandable segments to reduce memory reservation by @fzyzcjy in #615
- Fix fsdp2 wrong usage causing tensors not correctly sharded by @fzyzcjy in #617
- Super tiny script to display debug dumped rollout data by @fzyzcjy in #618
- [model] use self attn in megatron for gated attn by @zhuzilin in #624
- move pg_loss into tis_function for icepop by @zhuzilin in #635
- Tiny change the 4B fsdp demo script by @fzyzcjy in #627
- Tiny make IFBench lazily imported to avoid errors by @fzyzcjy in #628
- Super tiny rename total_train_time to step_time for clarity by @fzyzcjy in #629
- Support FSDP Checkpoint Saving & Loading by @zyzshishui in #633
- Super tiny fix lint by @fzyzcjy in #626
- [Doc] Fix spec decode doc by @ryang-max in #623
- Add kimi-k2 by @Gao016 in #560
- Small fix on --num-rollout and --num-epoch by @Zhuohao-Li in #620
- Tiny extract training profiler by @fzyzcjy in #631
- fix typo in scripts by @zyzshishui in #634
- fix typo by @zhuzilin in #636
- bugfix by @zhuzilin in #637
- Refactor and extract training perf metrics by @fzyzcjy in #630
- Refactor and add timers in training by @fzyzcjy in #632
- Enhance the 4B FSDP script and use Typer by @fzyzcjy in #643
- Add converted performance metrics in FSDP backend by @fzyzcjy in #645
- Add repetition check function by @zhuzilin in #652
- Support MTP training by @guapisolo in #640
- [docker] fix mtp-rl training and upgrade to sglang 0.5.4.post1 by @zhuzilin in #655
- Fix FSDP Reference Model Loading by @zyzshishui in #656
- Add profiler for FSDP by @fzyzcjy in #646
- Tiny add load_debug_rollout_data_subsample by @fzyzcjy in #647
- Fix fsdp cannot do computation communication overlap by @fzyzcjy in #648
- Tiny add response length metrics by @fzyzcjy in #649
- Tiny improve 4b fsdp script such as fixing rm_type by @fzyzcjy in #650
- Add moonlight-16B-A3b running script by @Gao016 in #653
- [FSDP] Remove redundant GPU memory restore and improve code style by @Hecate0821 in #658
- Extract and support saving debug train data for FSDP by @fzyzcjy in #659
- Support profiling single training forward-backward and logprob-computation step by @fzyzcjy in #660
- Decouple IS Weights from Rejection Sampling in MIS by @yueming-yuan in #657
- [FSDP] Add more eval metrics to align Megatron by @Zhuohao-Li in #671
- Fix code incompatible with Megatron with less patch by @fzyzcjy in #668
- set weights_only=False for load_debug_rollout_data by @zhuzilin in #680
- Fix Mimo-7B-RL special mtp structure by @guapisolo in #691
- Revert "Fix Mimo-7B-RL special mtp structure" by @zhuzilin in #692
- Add kimi-k2-instruct running script by @Gao016 in #694
- Add Search-R1 example for Importance Sampling by @ChangyiYang in #688
- fix bug on gspo by @zhuzilin in #695
- Tiny fix megatron load from checkpoint error by @fzyzcjy in #662
- Add script to send prompts to SGLang with compatible arguments by @fzyzcjy in #663
- [docker] update sglang patch by @zhuzilin in #704
- [Revert revert] Fix Mimo-7B-RL special mtp structure by @guapisolo in #707
- support fp16 training by @zhuzilin in #708
- detach lm_head when training mtp by @zhuzilin in #709
- [on-policy distillation] support and related data handling by @ahxt in #673
- Format code and fix missing arguments by @zyzshishui in #710
- Doc of true on policy done 3 week ago by @fzyzcjy in #711
- Super tiny make multi eval disabled by default for quick experiments by @fzyzcjy in #712
- Support rollout routing replay by @zhuzilin in #715
- Add script to recompute metrics for an existing run by @fzyzcjy in #664
- Super tiny add assertion for non existing feature in fsdp by @fzyzcjy in #665
- [FSDP] Migrate FSDP Checkpointing to PyTorch Distributed Checkpoint by @Hecate0821 in #677
- Refactor metrics computation to add more metrics to eval by @fzyzcjy in #666
- Fix NCCL out-of-memory error even when there is memory by @fzyzcjy in #669
- Print memory info when NCCL errors by @fzyzcjy in #670
- Update doc by @fzyzcjy in #716
- Doc by @zyzshishui in #717
- Update doc by @fzyzcjy in #723
- Update MTP training doc by @guapisolo in #718
- Tiny remove non-effective tis to avoid confusion by @fzyzcjy in #713
- Tiny add repetition and truncation metrics by @fzyzcjy in #719
- Tiny fix compatibility with older SGLang by @fzyzcjy in #720
- Tiny add utility to debug reward functions by @fzyzcjy in #721
- Enable mm fallback variant in SGLang by @fzyzcjy in #726
- Super tiny add Qwen3-4B-Instruct-2507 model config by @fzyzcjy in #661
- Tiny fix errors by @fzyzcjy in #728
- Lock versions for Dockerfile by @fzyzcjy in #727
- Super tiny remove unused docker-related files by @fzyzcjy in #731
- Add formal mathematics example with RL and SFT by @fzyzcjy in #733
- Add tests for external rollout feature by @fzyzcjy in #734
- Allow multiple GitHub action runners to compete for the same GPUs by @fzyzcjy in #735
- Tiny update command utils by @fzyzcjy in #737
- Super tiny fix typo by @fzyzcjy in #738
- Tiny let wandb record some environment variables by @fzyzcjy in #739
- Tiny fix dataclass_cli does not support multiple typer commands by @fzyzcjy in #740
- Fix mtp rl detach by @zhuzilin in #746
- Tiny support parsing slurm num nodes and extra env vars by @fzyzcjy in #741
- Refactor memory profiler by @fzyzcjy in #742
- Tiny enhance exec_command by @fzyzcjy in #743
- Tiny enhance http call error logging by @fzyzcjy in #744
- Super tiny add truncated-layers deepseek model config by @fzyzcjy in #745
- Support megatron backend in 4b demo script by @fzyzcjy in #667
- Update 4B training script and fix oom issue by @fzyzcjy in #747
- Support memray for host memory profiling by @fzyzcjy in #748
- Support more cases for megatron in 4b demo script by @fzyzcjy in #749
- Super tiny fix missing code by @fzyzcjy in #750
- [doc] update spec decoding doc by @zhuzilin in #752
- Fix MTP loss mask intersection by @guapisolo in #751
- [FSDP] Optimize weight update in distributed mode by @Hecate0821 in #729
- Refactor and simplify Dockerfile by @fzyzcjy in #754
- Super tiny ping SGLang version in Dockerfile for reproducibility by @fzyzcjy in #755
- Tiny add docker build and upload script with locked version by @fzyzcjy in #756
- Support ue8m0 quantization from training to inference by @fzyzcjy in #758
- Fix errors not raised in http calls by @fzyzcjy in #759
- Fix error when having local http proxy by @fzyzcjy in #757
- Tiny remove unused file by @fzyzcjy in #753
- Super tiny rename rocm dockerfile by @fzyzcjy in #760
- fix sglang compatiblity by @zhuzilin in #761
- rearrange import by @zhuzilin in #762
- Refactor weight updater to avoid depending on weight backup dict by @fzyzcjy in #764
- Refactor CI and e2e tests by @fzyzcjy in #736
- Tiny refactor train script by @fzyzcjy in #765
- Super tiny add onload wrapper API by @fzyzcjy in #766
- simplify get_model by @zhuzilin in #768
- better distributed init, also supports slurm by @ad8e in #763
- Fix typos and cleanup prints that are commented out by @lancerts in #725
- Fix sglang offload for qwen3 next by @zhuzilin in #769
- Fix cp for qwen3 next by @zhuzilin in #770
- Add RLVE in slime projects by @zhaochenyang20 in #773
- [Feature] Tiny fix for wandb run id by @yitianlian in #730
- [FSDP] Support Context parallelism for FSDP using ring-flash-attn by @PopSoda2002 in #467
- Super tiny fix code fmt by @fzyzcjy in #775
- Refactor and add TensorBackuper abstraction by @fzyzcjy in #771
- Support saving memory by disabling tensor backuper by @fzyzcjy in #776
- add ring_flash_attn to requirements.txt by @zhuzilin in #778
- Tiny fix model config error when sourcing files by @fzyzcjy in #784
- Tiny reduce peak memory usage in weight update by @fzyzcjy in #785
- Super tiny print more info when OOM by @fzyzcjy in #786
- Add GB300 docker image by @fzyzcjy in #787
- remove warning for fsdp backend by @zhuzilin in #789
- Super tiny update custom github runner by @fzyzcjy in #790
- Super tiny add sample scripts including GB300 demo by @fzyzcjy in #791
- update pics in true-on-policy doc by @zyzshishui in #783
- Support multi dtypes in FlattenedTensorBucket by @zhuzilin in #793
- Fix: make raw_reward optional in process_rollout_data by @GavinZhu-GMI in #792
- Apply megatron commit d8c6aa4c to fix Blackwell missing saving checkpoint by @fzyzcjy in #794
- Super tiny further refactor wandb_run_id by @fzyzcjy in #796
- Super tiny configure logger by @fzyzcjy in #797
- Tiny change printing to logging by @fzyzcjy in #798
- Fix SGLang Router Endpoint by @sam571128 in #683
- Add backward compatiblility for sgl-router by @zhuzilin in #802
- Super tiny add comments for train_async by @fzyzcjy in #800
- Tiny unify wandb and tensorboard code by @fzyzcjy in #801
- Fix git failure even when the patch is correct by @fzyzcjy in #799
- Super tiny refactor wandb step computation by @fzyzcjy in #803
- fix backward compatibility for slime router by @zhuzilin in #805
- Super tiny loosen sglang-router requirements by @fzyzcjy in #804
- [FSDP] fix the rollout/raw_reward metrics calculation by @Zhuohao-Li in #806
- More TIS features; skip recompute; mismatch metrics without TIS by @yueming-yuan in #690
- Super tiny fix duplicated arg by @fzyzcjy in #809
- support rollout routing replay for model with dense layer by @zhuzilin in #810
- fix bug on abort_request by @zhuzilin in #811
- Add P1 in slime projects by @JC-Chen1 in #812
- Tiny fix worker abort request bug by @fzyzcjy in #814
- Add draft example for DeepSeek by @fzyzcjy in #816
- Super tiny reduce Megatron logging verbosity and fix log message by @fzyzcjy in #817
- Unify dockerfile for multiple hardwares by @fzyzcjy in #815
- Super tiny delete outdated file by @fzyzcjy in #820
- Super tiny add SGLANG_ENABLE_HEALTH_ENDPOINT_GENERATION flag by @fzyzcjy in #823
- Refactor command_utils and scripts by @fzyzcjy in #824
- [FSDP][BugFix] Avoid autocast for computing log prob for true on policy by @PopSoda2002 in #833
- fix ppo bugs by @lilei199908 in #835
- Super tiny fix readme by @fzyzcjy in #836
- pad token to tp_size * 128 to enable tp_size by @zhuzilin in #838
- Update qwen3-30b-a3b script by @fzyzcjy in #825
- Super tiny remove unused script variable by @fzyzcjy in #831
- Fix multi node nccl slowness in grace blackwell by @fzyzcjy in #839
- Tiny refactor checkpoint conversion script by @fzyzcjy in #840
- Pre-install main package inside container by @fzyzcjy in #841
- add fp8 training examples by @xieck13 in #821
- add FP8 training and inference script for Qwen3-30B-A3B model by @yefei12 in #845
- fix ppo stuck when use_rollout_logprobs and fix typo by @zhuzilin in #846
- [FSDP] Migrate FSDP CPU Offload from DeepSpeed to Native PyTorch FSDPv2 by @Hecate0821 in #847
- Provide Megatron/FSDP alignment script by @Zhuohao-Li in #788
- [FSDP]Fix ppo_kl by @Hecate0821 in #780
- support --debug-rollout-only for fsdp by @zhuzilin in #853
- Add chunked_gae and ppo ci test by @lilei199908 in #850
- Support IPV6 env by @Chen-GX in #842
- Turn SpecInfo into dict when serializing Sample by @zhuzilin in #855
- Tau bench by @maocheng23 in #362
- [FSDP] Delete legacy full param update weight by @Hecate0821 in #852
- Add script for convert k2-thinking int4 weight to bf16 by @Gao016 in #849
- Fix ipv4 dist_init_addr by @zhuzilin in #856
- fix bug in previous pr by @zhuzilin in #858
- Add kimi-k2-thinking @ BF16-train + FP8-rollout by @Gao016 in #857
- fix rollout logp broadcast stuck error in ppo training by @yitianlian in #862
- [FSDP] only use move mode for fsdp backend and refactor weight updation by @zhuzilin in #861
- Make router config and data padding configurable, improve FSDP actor code structure by @lancerts in #851
- [FSDP][Bug] Fix max_tokens_per_gpu in CP by @Hecate0821 in #866
- Bump torch_memory_saver to reduce host memory consumption by @fzyzcjy in #843
- Refactor Retool recipe with `rollout_log_probs` recorded by @Zhuohao-Li in #828
- Small cleanup router.py and fix of math_utils.py by @lancerts in #772
- Super tiny update and fix logs by @fzyzcjy in #844
- Super tiny remove unused code by @fzyzcjy in #860
- Extract and split weight update logic by @fzyzcjy in #873
- Support passing arbitrary sglang router arguments by @fzyzcjy in #874
- Super tiny update core dump path by @fzyzcjy in #875
- Refactor and cleanup update weight from tensor by @fzyzcjy in #876
- Tiny speedup qwen 30B example by changing backend by @fzyzcjy in #877
- Tiny speedup deepseek script model load and add auto copy by @fzyzcjy in #878
- Fix host out-of-memory after checkpointing by @fzyzcjy in #879
- Tiny split update_converted_params_from_tensor by @fzyzcjy in #880
- Super tiny bump sgl-kernel and docker image version by @fzyzcjy in #881
- Super tiny record save model time by @fzyzcjy in #882
- Add GLM demo on blackwell hardwares by @fzyzcjy in #883
- Fix pre-commit run --all-files by @lancerts in #870
- Refactor megatron-to-hf logic by @fzyzcjy in #884
- Tiny refactor padding remover by @fzyzcjy in #885
- Refactor and extract hf weight iterator by @fzyzcjy in #886
- Super tiny add memray as dev dependency by @fzyzcjy in #887
- Tiny try changing router config to avoid wasted time per step by @fzyzcjy in #888
- [FSDP] change rollout log probs to bf16 by @zhuzilin in #892
- Support directly loading HuggingFace checkpoints for Megatron backend by @fzyzcjy in #889
- Tiny generalize remove_padding weight name matcher by @fzyzcjy in #890
- Super tiny rename named params for clearer naming by @fzyzcjy in #891
- Tiny extract hf weight iterator base by @fzyzcjy in #893
- [script] Add run-qwen3-next-80B-A3B.sh by @zhuzilin in #897
- Support using Megatron Bridge to convert megatron to hugging face weights by @fzyzcjy in #894
- Tiny add megatron bridge option for several scripts by @fzyzcjy in #895
- Switch to use megatron bridge for two directions in e2e tests by @fzyzcjy in #896
- Fix megatron bridge support on PP and EP by @fzyzcjy in #898
- Super tiny update gitignore for macos by @fzyzcjy in #899
- Super tiny add back missing wandb dependency by @fzyzcjy in #900
- Super tiny delete unused file by @fzyzcjy in #901
- fix: dist.destroy_process_group by @Daucloud in #910
- fix: replace _is_hf_checkpoint with _is_megatron_checkpoint by @Daucloud in #908
- tiny fix dockerfile slime link error by @lilei199908 in #902
- [FSDP] convert from autocast to mixed_policy by @zhuzilin in #911
- [FSDP] Fix ref model compute log bug by @Hecate0821 in #914
- [FSDP][1/N] support true_on_policy training for FSDP2 by @zhuzilin in #917
- [docs] update FP8 training README.md by @xieck13 in #913
- [FSDP][2/N] support true_on_policy training for FSDP2 by @zhuzilin in #906
- Super tiny fix main lint error by @fzyzcjy in #931
- Super tiny allow hf_validate_args give multiple errors at once by @fzyzcjy in #920
- Tiny fix deepseek ckpt precision and extract fp8_cast_bf16 calling script by @fzyzcjy in #921
- Super tiny update GLM script task and mem usage by @fzyzcjy in #922
- Tiny unify 4b fsdp and default script by @fzyzcjy in #923
- Super tiny show body when HTTP response errors for debugging by @fzyzcjy in #924
- Super tiny add miles_plugins.megatron_bridge by @fzyzcjy in #925
- Super tiny avoid memory margin cause error when debug-rollout-only by @fzyzcjy in #926
- Support checking SGLang weight update correctness by @fzyzcjy in #927
- Fix reloadable process group errors when using 1 gpu by @fzyzcjy in #928
- Try to fix racing conditions in update weight from tensors by @fzyzcjy in #929
- Tiny change Dockerfile for building on 1TB memory by @fzyzcjy in #930
- Add logger.warning for destroy_process_groups(). by @Daucloud in #918
- Fix fail to pick config for bool values like use_tis by @fzyzcjy in #933
- [FSDP][3/N] support true on policy training for FSDP2 by @zhuzilin in #934
- add moonlight test by @lilei199908 in #935
- [docker] update dockerfile for megatron bridge by @zhuzilin in #936
- [docker] add new stable patch for sglang v0.5.5.post1 by @zhuzilin in #937
- [docker] install modelopt for megatron-bridge by @zhuzilin in #939
- [scripts] remove --sglang-disable-radix-cache in scripts by @zhuzilin in #940
- True onpolicy ci by @lilei199908 in #938
- Pad random experts for rollout routing replay by @zhuzilin in #941
- [Fix] fix rollout logp bug in mgt backend by @yitianlian in #942
- [Fix] Support FSDP training without rollout logp by @yitianlian in #945
- Tiny add initial GB200, TIS, fp8 rollout, fp8 train draft to demo scripts by @fzyzcjy in #948
- Add GB200 Docker image by @fzyzcjy in #946
- Tiny fix arg parser not printing logs by @fzyzcjy in #947
- [doc] move fp8 doc to qwen3-30B-a3B as qwen3-4B doesn't perform well on fp8 rollout by @zhuzilin in #952
- Add asystem-amem to offload nccl process group in sglang by @zhuzilin in #955
- [release] bump to v0.2.0 by @zhuzilin in #943
New Contributors
- @MrAta made their first contribution in #248
- @YuchenFan48 made their first contribution in #252
- @Gao016 made their first contribution in #253
- @FrankLeeeee made their first contribution in #262
- @richardodliu made their first contribution in #265
- @yinpeisu made their first contribution in #269
- @SanftMonster made their first contribution in #308
- @Arist12 made their first contribution in #306
- @Zhuohao-Li made their first contribution in #325
- @souhil25 made their first contribution in #303
- @rbao2018 made their first contribution in #328
- @oraluben made their first contribution in #337
- @GGGGGGXY made their first contribution in #336
- @PopSoda2002 made their first contribution in #304
- @ArtificialZeng made their first contribution in #343
- @ppraneth made their first contribution in #335
- @luppx made their first contribution in #330
- @JustinTong0323 made their first contribution in #356
- @sam571128 made their first contribution in #360
- @Williamren97 made their first contribution in #380
- @yefei12 made their first contribution in #382
- @hyleepp made their first contribution in #375
- @lancerts made their first contribution in #407
- @WWWjiahui made their first contribution in #410
- @MrWhitezz made their first contribution in #390
- @Guido1Alessandro1Trevisan made their first contribution in #413
- @none0663 made their first contribution in #420
- @coding-famer made their first contribution in #435
- @mmy360 made their first contribution in #524
- @Lez-3f made their first contribution in #519
- @Hecate0821 made their first contribution in #536
- @ryang-max made their first contribution in #354
- @yuzhu-cai made their first contribution in #548
- @foreverpiano made their first contribution in #542
- @tyler-romero made their first contribution in #520
- @yueming-yuan made their first contribution in #657
- @ChangyiYang made their first contribution in #688
- @ahxt made their first contribution in #673
- @ad8e made their first contribution in #763
- @GavinZhu-GMI made their first contribution in #792
- @JC-Chen1 made their first contribution in #812
- @xieck13 made their first contribution in #821
- @Daucloud made their first contribution in #910
Full Changelog: v0.1.0...v0.2.0