
recipe for hyperparameter sweep #1752

Open
felipemello1 opened this issue Oct 4, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

felipemello1 (Contributor) commented Oct 4, 2024

Torchtune could provide a recipe for hyperparameter optimization (HPO), where the user provides a config, the training recipe, an eval dataset, the parameters to sweep, and a budget. A minimal sketch of what that could look like is below.
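Here is a self-contained sketch of the core loop, using Optuna as the sweep backend (Optuna is my assumption; the issue doesn't commit to any backend). The toy model and synthetic data stand in for a real torchtune config and eval dataset:

```python
# Hedged sketch of an HPO sweep; Optuna is one possible backend, and the
# toy model/data below are stand-ins for a torchtune recipe + eval set.
import optuna
import torch
import torch.nn as nn

def objective(trial):
    # Sample the hyperparameter we want to sweep (lr, on a log scale).
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)

    # Toy "finetune": a linear model on synthetic data.
    model = nn.Linear(16, 1)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    x, y = torch.randn(256, 16), torch.randn(256, 1)
    for _ in range(50):  # "train for X steps"
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()  # stand-in for the eval metric

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)  # n_trials plays the role of the budget
print(study.best_params)
```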

I just played with the optimizer. Our default lr is 3e-4; I tried 3e-3 and the ScheduleFree optimizer, and saw a big difference.

[image: training curves comparing the settings above]

I don't know how well it works for finetuning, but an extension could be to allow HPO on smaller models and transfer the best settings to larger models using muP (https://github.com/microsoft/mup).
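For reference, a hedged sketch of the muP pattern following the microsoft/mup README (the tiny MLP and the widths here are illustrative, not a torchtune integration):

```python
# Sketch of muP hyperparameter transfer; model/widths are made up for
# illustration. See https://github.com/microsoft/mup for the real docs.
import torch.nn as nn
from mup import MuReadout, MuAdam, set_base_shapes

def make_model(width):
    return nn.Sequential(
        nn.Linear(16, width),
        nn.ReLU(),
        MuReadout(width, 1),  # mup's drop-in replacement for the output layer
    )

base = make_model(width=8)     # small "base" model used for the sweep
delta = make_model(width=16)   # tells mup which dimensions scale with width
model = make_model(width=256)  # the large model you actually want to train

# Reparameterize `model` so optimal hyperparameters transfer across widths.
set_base_shapes(model, base, delta=delta)

# mup's optimizer applies the muP per-layer lr scaling.
opt = MuAdam(model.parameters(), lr=3e-3)  # lr found by sweeping the small model
```

The point of the pattern: sweep lr (and friends) on the width-8 model, then reuse the best values at width 256, since under muP the optimum is approximately width-invariant.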

felipemello1 added the enhancement label Oct 4, 2024
RdoubleA (Contributor) commented Oct 4, 2024

We would need some form of validation during training before we can do HPO, no?

On a related note, can we even run eval on a custom benchmark dataset that's not in the EleutherAI eval harness?

felipemello1 (Contributor, Author) commented

I don't know if it necessarily has to be during training. Maybe it's OK to train for X steps and then run eval. I also think that, as long as you are training for a single epoch, the train loss may be a good enough proxy: since the data doesn't get repeated, there is no memorization. A sketch of the train-then-eval pattern is below.
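To make the "train for X steps, then eval" idea concrete, here's a minimal sketch using Optuna's pruning API (again my choice of tool, not something decided in this issue); the toy model and random data stand in for a real recipe and eval dataset:

```python
# Sketch: periodically pause training to eval, and prune hopeless trials.
import optuna
import torch
import torch.nn as nn

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    model = nn.Linear(16, 1)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    x, y = torch.randn(512, 16), torch.randn(512, 1)          # "train" split
    x_eval, y_eval = torch.randn(128, 16), torch.randn(128, 1)  # "eval" split

    for step in range(200):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

        if step % 50 == 49:  # every X steps, run eval
            with torch.no_grad():
                eval_loss = nn.functional.mse_loss(model(x_eval), y_eval).item()
            trial.report(eval_loss, step)
            if trial.should_prune():  # stop bad trials early to save budget
                raise optuna.TrialPruned()
    return eval_loss

study = optuna.create_study(
    direction="minimize", pruner=optuna.pruners.MedianPruner()
)
study.optimize(objective, n_trials=20)
```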
