Torchtune could provide a recipe to do HPO (hyperparameter optimization), where the user provides a config, the recipe, an eval dataset, the params to sweep, and a budget.
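As a rough illustration of what that interface could look like, here is a minimal sketch using Optuna as the sweep backend with a trial-count budget. The `run_finetune` wrapper and the config keys are hypothetical placeholders, not part of torchtune's actual API:

```python
import optuna


def run_finetune(config_overrides: dict) -> float:
    """Hypothetical wrapper: launches one fine-tuning run of the chosen recipe
    with the given config overrides and returns the loss on the eval dataset."""
    raise NotImplementedError


def objective(trial: optuna.Trial) -> float:
    # Params to sweep; ranges are illustrative.
    config_overrides = {
        "optimizer.lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [2, 4, 8]),
    }
    return run_finetune(config_overrides)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)  # budget: 20 fine-tuning runs
print(study.best_params)
```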
I just played with the optimizer. Our default lr is 3e-4; I tried 3e-3 and the ScheduleFree optimizer, and it made a big difference.
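For reference, swapping in the schedule-free optimizer (https://github.com/facebookresearch/schedule_free) looks roughly like the sketch below. The model, data, and loss are placeholders; only the optimizer calls are the point:

```python
import torch
import schedulefree

model = torch.nn.Linear(128, 128)  # stand-in for the fine-tuned model
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=3e-3)

optimizer.train()  # schedule-free needs explicit train/eval mode switches
for _ in range(100):
    x = torch.randn(8, 128)
    loss = model(x).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

optimizer.eval()  # switch modes before evaluation or checkpointing
```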
I don't know how well it works for finetuning, but an extension could be to allow HPO with smaller models and transfer the best settings to larger models using muP (https://github.com/microsoft/mup).
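Conceptually, the muP transfer looks like the following sketch (assumes `pip install mup`). The toy MLP, the widths, and the "best lr from the small sweep" are illustrative assumptions, not a real fine-tuning setup:

```python
import torch.nn as nn
from mup import MuAdam, MuReadout, set_base_shapes


class MLP(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(64, width), nn.ReLU())
        self.head = MuReadout(width, 10)  # muP-aware output layer

    def forward(self, x):
        return self.head(self.body(x))


# muP scaling is defined relative to a base (and delta) model of smaller width.
base, delta, target = MLP(width=8), MLP(width=16), MLP(width=1024)
set_base_shapes(target, base, delta=delta)

# The lr found by sweeping the small proxy model (e.g. via the HPO sketch
# above) is then reused directly on the large model.
best_lr_from_small_sweep = 3e-3  # assumed result of the small-model sweep
optimizer = MuAdam(target.parameters(), lr=best_lr_from_small_sweep)
```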
I don't know if the eval necessarily has to happen during training. Maybe it's ok to train for X steps and then run eval. I also think that, as long as you are training for a single epoch, the train loss may be a good enough proxy: the data doesn't get repeated, so there is no memorization.
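A minimal sketch of that "train loss as proxy" idea: each candidate lr gets a fixed step budget within a single pass over the data, and the mean train loss over the last few steps stands in for an eval metric. The model and loss are dummy placeholders:

```python
from statistics import mean
import torch


def train_steps(lr: float, num_steps: int = 200) -> list[float]:
    """Run `num_steps` optimizer steps at the given lr and return per-step train losses."""
    model = torch.nn.Linear(128, 1)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for _ in range(num_steps):
        x = torch.randn(8, 128)
        loss = model(x).pow(2).mean()  # dummy objective
        loss.backward()
        opt.step()
        opt.zero_grad()
        losses.append(loss.item())
    return losses


candidates = [3e-5, 3e-4, 3e-3]
scores = {lr: mean(train_steps(lr)[-20:]) for lr in candidates}
best_lr = min(scores, key=scores.get)
```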