v0.2.0

@zhuzilin zhuzilin released this 28 Nov 02:51
· 417 commits to main since this release
91acef0

We are thrilled to announce the release of slime v0.2.0! Thanks to the incredible support and contributions from our community, slime has gained significant features and substantial performance enhancements in this version.

Major Updates

  • FSDP Backend: Introduced a Fully Sharded Data Parallel (FSDP) based training backend for improved scalability.
  • PPO Support: Added native support for Proximal Policy Optimization (PPO).
  • MTP Training: Enabled training of the MTP (Multi-Token Prediction) module during reinforcement learning.
  • FP8 Full Stack: Support for both FP8 training and FP8 inference.
  • Train-Inference Mismatch: Alleviated or even eliminated the train-inference mismatch.
    • Importance Sampling: Custom interface for train-infer importance sampling (e.g., MIS).
    • Routing Replay: Added Rollout Routing Replay (R3) and Routing Replay (R2).
    • True On-Policy Training: Enabled strictly on-policy training with dense models on the FSDP backend.
  • Performance Improvements
    • Memory Optimization: CUDA graph offloading and asystem-amem integration.
    • Faster Weight Updates: Significantly accelerated FP8 weight updates.
  • Python-based Router: A new slime router implemented in pure Python for accessibility.
  • Fault Tolerance: Added fault tolerance for the rollout engines for improved robustness.
  • Custom Configs: Support for passing customized configurations via --config.
  • [Experimental] Checkpoint Loading: Added support for Megatron-bridge based checkpoint loading.
  • New Examples
    • Fully Async Training
    • Multi-Agent Scenarios
    • On-Policy Distillation
    • Retool
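To illustrate the train-infer importance sampling mentioned above: when the inference engine that generates rollouts and the training engine disagree on token probabilities, per-token importance ratios can correct the policy-gradient estimate. The sketch below shows the truncated importance-weight idea behind MIS-style corrections; the function name, signature, and truncation constant are hypothetical and are not slime's actual interface.

```python
import math

def truncated_importance_weights(train_logprobs, rollout_logprobs, c=2.0):
    """Per-token importance ratio pi_train / pi_infer, truncated at c.

    Truncating the ratio (as in MIS-style corrections) bounds the variance
    introduced when the training and inference engines assign different
    probabilities to the same sampled tokens.
    """
    return [
        min(math.exp(t - r), c)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]

# Where the two engines agree, the weight is 1; where the trainer assigns
# higher probability than the rollout engine, the weight grows until capped.
weights = truncated_importance_weights([-1.0, -2.0], [-1.0, -2.5], c=2.0)
```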

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.2.0