Torchtune could provide a recipe to do HPO (hyperparameter optimization), where the user provides a config, the recipe, an eval dataset, the params to sweep, and a budget.
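As a rough illustration of what that interface could look like, here is a minimal sketch using Optuna as the sweep backend with a trial-count budget. The `run_finetune` wrapper and the config keys are hypothetical placeholders, not part of torchtune's actual API:

```python
import optuna


def run_finetune(config_overrides: dict) -> float:
    """Hypothetical wrapper: launches one fine-tuning run of the chosen recipe
    with the given config overrides and returns the loss on the eval dataset."""
    raise NotImplementedError


def objective(trial: optuna.Trial) -> float:
    # Params to sweep; ranges are illustrative.
    config_overrides = {
        "optimizer.lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [2, 4, 8]),
    }
    return run_finetune(config_overrides)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)  # budget: 20 fine-tuning runs
print(study.best_params)
```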
I just played with the optimizer. Our default lr is 3e-4; I tried 3e-3 and the ScheduleFree optimizer, and it made a big difference.
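For reference, swapping in the schedule-free optimizer (https://github.com/facebookresearch/schedule_free) looks roughly like the sketch below. The model, data, and loss are placeholders; only the optimizer calls are the point:

```python
import torch
import schedulefree

model = torch.nn.Linear(128, 128)  # stand-in for the fine-tuned model
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=3e-3)

optimizer.train()  # schedule-free needs explicit train/eval mode switches
for _ in range(100):
    x = torch.randn(8, 128)
    loss = model(x).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

optimizer.eval()  # switch modes before evaluation or checkpointing
```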
I don't know how well it works for finetuning, but an extension could be to allow HPO with smaller models and transfer the best settings to larger models using muP (https://github.com/microsoft/mup).
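Conceptually, the muP transfer looks like the following sketch (assumes `pip install mup`). The toy MLP, the widths, and the "best lr from the small sweep" are illustrative assumptions, not a real fine-tuning setup:

```python
import torch.nn as nn
from mup import MuAdam, MuReadout, set_base_shapes


class MLP(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(64, width), nn.ReLU())
        self.head = MuReadout(width, 10)  # muP-aware output layer

    def forward(self, x):
        return self.head(self.body(x))


# muP scaling is defined relative to a base (and delta) model of smaller width.
base, delta, target = MLP(width=8), MLP(width=16), MLP(width=1024)
set_base_shapes(target, base, delta=delta)

# The lr found by sweeping the small proxy model (e.g. via the HPO sketch
# above) is then reused directly on the large model.
best_lr_from_small_sweep = 3e-3  # assumed result of the small-model sweep
optimizer = MuAdam(target.parameters(), lr=best_lr_from_small_sweep)
```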
I don't know if the eval necessarily has to happen during training. Maybe it's ok to train for X steps and then run eval. I also think that, as long as you are training for a single epoch, the train loss may be a good enough proxy: the data doesn't get repeated, so there is no memorization.
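A minimal sketch of that "train loss as proxy" idea: each candidate lr gets a fixed step budget within a single pass over the data, and the mean train loss over the last few steps stands in for an eval metric. The model and loss are dummy placeholders:

```python
from statistics import mean
import torch


def train_steps(lr: float, num_steps: int = 200) -> list[float]:
    """Run `num_steps` optimizer steps at the given lr and return per-step train losses."""
    model = torch.nn.Linear(128, 1)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for _ in range(num_steps):
        x = torch.randn(8, 128)
        loss = model(x).pow(2).mean()  # dummy objective
        loss.backward()
        opt.step()
        opt.zero_grad()
        losses.append(loss.item())
    return losses


candidates = [3e-5, 3e-4, 3e-3]
scores = {lr: mean(train_steps(lr)[-20:]) for lr in candidates}
best_lr = min(scores, key=scores.get)
```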