🚀 Feature Description
Faster training without learning-rate schedules.
https://github.com/facebookresearch/schedule_free
Solution
Integrate a schedule-free optimizer option, such as AdamWScheduleFree.
See https://github.com/facebookresearch/schedule_free
https://arxiv.org/pdf/2405.15682
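
For context, here is a minimal sketch of the schedule-free SGD update as I understand it from the paper: a plain SGD iterate `z`, a running average `x` used for evaluation, and gradients taken at an interpolation `y` of the two. This is pure Python for illustration only, not the schedule_free library's code; the function name `schedule_free_sgd`, the toy objective, and the hyperparameter values are all assumptions made up for this example.

```python
# Illustrative sketch only: a 1-D schedule-free SGD loop with a constant
# learning rate. Names and defaults are hypothetical, not the library's API.
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=1000):
    """Minimize a 1-D function given its gradient `grad`, starting at `x0`."""
    z = x0  # base SGD iterate
    x = x0  # running average of the z iterates; the point used for evaluation
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x  # gradient is evaluated at this interpolation
        z = z - lr * grad(y)           # plain SGD step with a fixed learning rate
        c = 1.0 / (t + 1)              # uniform averaging weight
        x = (1 - c) * x + c * z        # update the running average
    return x


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
x_final = schedule_free_sgd(lambda w: 2.0 * (w - 3.0), x0=0.0)
print(x_final)
```

The learning rate stays constant for the whole run; the averaging of `z` into `x` takes the place of a decay schedule. Per the repository README, the packaged optimizers also expect `optimizer.train()` and `optimizer.eval()` to be called alongside the model's mode switches, so an integration would likely need hooks for that.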
Additional context
This could give faster convergence for future pretraining and fine-tuning runs. I have no data to share and have not tried it myself.