v0.1.0: Initial Release
This is the initial release of LightRFT, a lightweight, efficient, omni-modal, and reward-model-driven reinforcement fine-tuning framework.
🧠 Rich Algorithm Ecosystem
- Implemented PPO and GRPO algorithms for Large Language Models.
- Added comprehensive interfaces for Reward Model training and inference.
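One distinguishing feature of GRPO over PPO is that it needs no learned critic: each reward is normalized against the other responses sampled for the same prompt. The sketch below illustrates that group-relative advantage computation in plain Python; it is a minimal illustration of the idea, not the exact code LightRFT uses.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses.

    Each response's advantage is its reward normalized by the group's
    mean and standard deviation, so no value network is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for the same prompt, scored by a reward model:
# above-average responses get positive advantages, below-average negative.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```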
🎯 Innovative Resource Collaboration
- Introduced "Colocate Anything" strategy to maximize GPU memory efficiency by colocating Actor, Critic, and Reward models.
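The pattern behind such a colocation strategy can be sketched as follows: several models time-share one device pool, with only the model currently in use holding its weights resident. This is a hypothetical toy illustration of the scheduling idea (`ColocatedPool` and its methods are invented names, not the LightRFT API).

```python
from contextlib import contextmanager

class ColocatedPool:
    """Toy sketch of a colocation pattern: Actor, Critic, and Reward
    models share one device, and only the model in active use keeps
    its weights there. Hypothetical illustration, not LightRFT's API."""

    def __init__(self):
        self.resident = None   # name of the model currently on-device
        self.history = []      # order in which models occupied the device

    @contextmanager
    def use(self, name):
        # "Load" the requested model; in a real system this would move
        # weights onto the GPU and offload the previous occupant.
        self.resident = name
        self.history.append(name)
        try:
            yield name
        finally:
            self.resident = None  # memory freed for the next model

pool = ColocatedPool()
with pool.use("actor"):
    pass  # generate rollouts here
with pool.use("reward_model"):
    pass  # score the rollouts here
```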
🔧 Flexible Training Strategies
- Integrated DeepSpeed ZeRO and FSDP for scalable distributed training.
- Added PEFT (LoRA) integration for lightweight fine-tuning.
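LoRA keeps the pretrained weight matrix W frozen and trains only a low-rank update, so the effective weight is W + (alpha / r) * B @ A with B of shape d x r and A of shape r x k. The pure-Python sketch below computes that scaled low-rank delta to make the math concrete; in practice this is handled by the PEFT library rather than written by hand.

```python
def lora_delta(A, B, alpha, r):
    """Compute the LoRA weight update ΔW = (alpha / r) * B @ A.

    A: r x k list-of-lists, B: d x r list-of-lists. Only A and B are
    trained; the frozen base weight W receives this delta at merge time.
    """
    scale = alpha / r
    d, k = len(B), len(A[0])
    return [
        [scale * sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
        for i in range(d)
    ]

# Rank-1 example: B is 2x1, A is 1x2, alpha=2 gives scale 2.0.
delta = lora_delta(A=[[2.0, 3.0]], B=[[1.0], [0.0]], alpha=2, r=1)
```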
🌐 Environments & Models
- Added support for GSM8K (Math reasoning) and Geo3K (Multimodal) environments and datasets.
- Enabled support for Qwen and DeepSeek model families.
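GSM8K reference solutions end with a `#### <answer>` line, so a common reward scheme for this environment is binary exact match on that final number. The following is a plausible sketch of such a reward function; the parsing details and reward shaping LightRFT actually uses may differ.

```python
import re

def gsm8k_reward(completion: str, gold_answer) -> float:
    """Binary reward for GSM8K: 1.0 if the last '#### <number>' in the
    model's completion matches the gold answer, else 0.0."""
    matches = re.findall(r"####\s*(-?[\d,]+)", completion)
    if not matches:
        return 0.0  # no parsable final answer -> no reward
    pred = matches[-1].replace(",", "")  # strip thousands separators
    return 1.0 if pred == str(gold_answer) else 0.0
```

A multimodal environment like Geo3K would follow the same interface, with the reward model or checker additionally conditioned on the image input.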
📚 Documentation & Toolkit
- Integrated Weights & Biases (W&B) for training metric logging.
- Released initial Quick Start guide, architecture overview, and reproduction scripts.
Full Changelog: https://github.com/opendilab/LightRFT/commits/v0.1.0
Contributors: OpenDILab, the System Platform Center, and the Safe and Trustworthy AI Center at Shanghai AI Laboratory.