
v0.1.0: Initial Release


@puyuan1996 released this 31 Dec 04:50

This is the initial release of LightRFT, a lightweight, efficient, omni-modal, reward-model-driven reinforcement fine-tuning framework.

🧠 Rich Algorithm Ecosystem

  • Implemented PPO and GRPO algorithms for Large Language Models.
  • Added comprehensive interfaces for Reward Model training and inference.
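GRPO's key difference from PPO is that it drops the learned critic and instead normalizes rewards within a group of sampled responses. The sketch below illustrates that group-relative advantage computation; the function name and interface are illustrative, not LightRFT's actual API.

```python
def grpo_advantages(rewards, eps=1e-8):
    """Compute group-relative advantages for one prompt's sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps for stability).
    This replaces PPO's critic-based value baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt, scored 1 (correct) or 0 (incorrect):
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses receive a positive advantage and incorrect ones a negative advantage, summing to roughly zero across the group.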

🎯 Innovative Resource Collaboration

  • Introduced the "Colocate Anything" strategy, which maximizes GPU memory efficiency by colocating the Actor, Critic, and Reward models on shared devices.

🔧 Flexible Training Strategies

  • Integrated DeepSpeed ZeRO and FSDP for scalable distributed training.
  • Added PEFT (LoRA) integration for lightweight fine-tuning.
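LoRA keeps the base weights frozen and trains only a low-rank update, so the effective weight is W + (alpha / r) * B @ A, where B is d_out x r and A is r x d_in. A minimal pure-Python sketch of that math (names are illustrative and not the PEFT library's API):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of row lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Add the scaled low-rank product B @ A to the frozen weight W.

    Only A (r x d_in) and B (d_out x r) are trained, so the number of
    trainable parameters scales with r rather than d_out * d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Rank-1 update of a 2x2 identity weight, with alpha = 2:
W_eff = lora_effective_weight(
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[3.0, 4.0]],           # 1 x 2
    B=[[1.0], [2.0]],         # 2 x 1
    alpha=2.0, r=1,
)
```

In practice a PEFT library applies this update inside selected linear layers (e.g. attention projections) rather than materializing the merged weight during training.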

🌐 Environments & Models

  • Added support for GSM8K (Math reasoning) and Geo3K (Multimodal) environments and datasets.
  • Enabled support for Qwen and DeepSeek model families.

📚 Documentation & Toolkit

  • Integrated Weights & Biases (W&B) for training metric logging.
  • Released initial Quick Start guide, architecture overview, and reproduction scripts.

Full Changelog: https://github.com/opendilab/LightRFT/commits/v0.1.0

Contributors: OpenDILab, the System Platform Center, and the Safe and Trustworthy AI Center at Shanghai AI Laboratory.