adding support for LR schedule for full distributed finetune #2263
Open
Description
My understanding is that the full multi-gpu fine-tuning doesn't yet support learning rate schedules.
Would it be possible to add support for this? Even basic ones, such as linear warmup followed by cosine or linear decay?
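For concreteness, here is a minimal sketch of the kind of schedule I have in mind, written with plain PyTorch's `LambdaLR` (the function name and step counts are just placeholders, not anything that currently exists in the torchtune codebase):

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine_schedule(optimizer, num_warmup_steps, num_training_steps):
    """Linear warmup to the base LR, then cosine decay towards zero."""
    def lr_lambda(step):
        if step < num_warmup_steps:
            # Linear warmup: scale the LR from 0 up to its configured value.
            return step / max(1, num_warmup_steps)
        # Cosine decay over the remaining training steps.
        progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)
```

The recipe's training loop would then just need to call `lr_scheduler.step()` once per optimizer step.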
I can also take a look at doing this myself if I could get a pointer to the code.
Thank you!