adding support for LR schedule for full distributed finetune #2263
Open
Description
My understanding is that the full multi-gpu fine-tuning doesn't yet support learning rate schedules.
Would it be possible to add support for this? Even basic ones, such as linear warmup followed by cosine or linear decay?
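For concreteness, here is a minimal sketch of the kind of schedule I have in mind, written with plain PyTorch's `LambdaLR` (the function name and step counts are just placeholders, not anything that currently exists in the torchtune codebase):

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine_schedule(optimizer, num_warmup_steps, num_training_steps):
    """Linear warmup to the base LR, then cosine decay towards zero."""
    def lr_lambda(step):
        if step < num_warmup_steps:
            # Linear warmup: scale the LR from 0 up to its configured value.
            return step / max(1, num_warmup_steps)
        # Cosine decay over the remaining training steps.
        progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)
```

The recipe's training loop would then just need to call `lr_scheduler.step()` once per optimizer step.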
I can also take a look at doing this myself if I could get a pointer to the code.
Thank you!