v0.1.0: Initial Release
This is the initial release of LightRFT, a lightweight, efficient, omni-modal, and reward-model-driven reinforcement fine-tuning framework.
🧠 Rich Algorithm Ecosystem
- Implemented PPO and GRPO algorithms for Large Language Models.
- Added comprehensive interfaces for Reward Model training and inference.
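One distinguishing feature of GRPO over PPO is that it needs no learned critic: each reward is normalized against the other responses sampled for the same prompt. The sketch below illustrates that group-relative advantage computation in plain Python; it is a minimal illustration of the idea, not the exact code LightRFT uses.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses.

    Each response's advantage is its reward normalized by the group's
    mean and standard deviation, so no value network is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for the same prompt, scored by a reward model:
# above-average responses get positive advantages, below-average negative.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```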
🎯 Innovative Resource Collaboration
- Introduced "Colocate Anything" strategy to maximize GPU memory efficiency by colocating Actor, Critic, and Reward models.
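The pattern behind such a colocation strategy can be sketched as follows: several models time-share one device pool, with only the model currently in use holding its weights resident. This is a hypothetical toy illustration of the scheduling idea (`ColocatedPool` and its methods are invented names, not the LightRFT API).

```python
from contextlib import contextmanager

class ColocatedPool:
    """Toy sketch of a colocation pattern: Actor, Critic, and Reward
    models share one device, and only the model in active use keeps
    its weights there. Hypothetical illustration, not LightRFT's API."""

    def __init__(self):
        self.resident = None   # name of the model currently on-device
        self.history = []      # order in which models occupied the device

    @contextmanager
    def use(self, name):
        # "Load" the requested model; in a real system this would move
        # weights onto the GPU and offload the previous occupant.
        self.resident = name
        self.history.append(name)
        try:
            yield name
        finally:
            self.resident = None  # memory freed for the next model

pool = ColocatedPool()
with pool.use("actor"):
    pass  # generate rollouts here
with pool.use("reward_model"):
    pass  # score the rollouts here
```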
🔧 Flexible Training Strategies
- Integrated DeepSpeed ZeRO and FSDP for scalable distributed training.
- Added PEFT (LoRA) integration for lightweight fine-tuning.
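LoRA keeps the pretrained weight matrix W frozen and trains only a low-rank update, so the effective weight is W + (alpha / r) * B @ A with B of shape d x r and A of shape r x k. The pure-Python sketch below computes that scaled low-rank delta to make the math concrete; in practice this is handled by the PEFT library rather than written by hand.

```python
def lora_delta(A, B, alpha, r):
    """Compute the LoRA weight update ΔW = (alpha / r) * B @ A.

    A: r x k list-of-lists, B: d x r list-of-lists. Only A and B are
    trained; the frozen base weight W receives this delta at merge time.
    """
    scale = alpha / r
    d, k = len(B), len(A[0])
    return [
        [scale * sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
        for i in range(d)
    ]

# Rank-1 example: B is 2x1, A is 1x2, alpha=2 gives scale 2.0.
delta = lora_delta(A=[[2.0, 3.0]], B=[[1.0], [0.0]], alpha=2, r=1)
```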
🌐 Environments & Models
- Added support for GSM8K (Math reasoning) and Geo3K (Multimodal) environments and datasets.
- Enabled support for Qwen and DeepSeek model families.
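GSM8K reference solutions end with a `#### <answer>` line, so a common reward scheme for this environment is binary exact match on that final number. The following is a plausible sketch of such a reward function; the parsing details and reward shaping LightRFT actually uses may differ.

```python
import re

def gsm8k_reward(completion: str, gold_answer) -> float:
    """Binary reward for GSM8K: 1.0 if the last '#### <number>' in the
    model's completion matches the gold answer, else 0.0."""
    matches = re.findall(r"####\s*(-?[\d,]+)", completion)
    if not matches:
        return 0.0  # no parsable final answer -> no reward
    pred = matches[-1].replace(",", "")  # strip thousands separators
    return 1.0 if pred == str(gold_answer) else 0.0
```

A multimodal environment like Geo3K would follow the same interface, with the reward model or checker additionally conditioned on the image input.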
📚 Documentation & Toolkit
- Integrated Weights & Biases (W&B) for training metric logging.
- Released initial Quick Start guide, architecture overview, and reproduction scripts.
Full Changelog: https://github.com/opendilab/LightRFT/commits/v0.1.0
Contributors: OpenDILab, the System Platform Center, and the Safe and Trustworthy AI Center at Shanghai AI Laboratory.