v0.2.0

@zhuzilin zhuzilin released this 28 Nov 02:51
· 417 commits to main since this release
91acef0

We are thrilled to announce the release of slime v0.2.0! Thanks to the incredible support and contributions from our community, slime has gained significant features and substantial performance enhancements in this version.

Major Updates

  • FSDP Backend: Introduced a Fully Sharded Data Parallel (FSDP) based training backend for improved scalability.
  • PPO Support: Added native support for Proximal Policy Optimization (PPO).
  • MTP Training: Enabled training of the MTP (Multi-Token Prediction) module during reinforcement learning.
  • FP8 Full Stack: Support for both FP8 training and FP8 inference.
  • Train-Inference Mismatch: Alleviated or even eliminated the train-inference mismatch.
    • Importance Sampling: Custom interface for train-infer importance sampling (e.g., MIS).
    • Routing Replay: Added Rollout Routing Replay (R3) and Routing Replay (R2).
    • True On-Policy Training: Enabled strictly on-policy training with dense models on the FSDP backend.
  • Performance Improvements
    • Memory Optimization: CUDA graph offloading and asystem-amem integration.
    • Faster Weight Updates: Significantly accelerated FP8 weight updates.
  • Python-based Router: A new slime router implemented in pure Python for accessibility.
  • Fault Tolerance: Added fault tolerance for the rollout engines for improved robustness.
  • Custom Configs: Support for passing customized configurations via --config.
  • [Experimental] Checkpoint Loading: Added support for Megatron-bridge based checkpoint loading.
  • New Examples
    • Fully Async Training
    • Multi-Agent Scenarios
    • On-Policy Distillation
    • Retool
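To illustrate the train-infer importance sampling mentioned above: when the inference engine that generates rollouts and the training engine disagree on token probabilities, per-token importance ratios can correct the policy-gradient estimate. The sketch below shows the truncated importance-weight idea behind MIS-style corrections; the function name, signature, and truncation constant are hypothetical and are not slime's actual interface.

```python
import math

def truncated_importance_weights(train_logprobs, rollout_logprobs, c=2.0):
    """Per-token importance ratio pi_train / pi_infer, truncated at c.

    Truncating the ratio (as in MIS-style corrections) bounds the variance
    introduced when the training and inference engines assign different
    probabilities to the same sampled tokens.
    """
    return [
        min(math.exp(t - r), c)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]

# Where the two engines agree, the weight is 1; where the trainer assigns
# higher probability than the rollout engine, the weight grows until capped.
weights = truncated_importance_weights([-1.0, -2.0], [-1.0, -2.5], c=2.0)
```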

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.2.0