Scaling recipes is a project for studying best practices for scaling neural networks across different tasks.
- Classification on MNIST:
  - Implement Standard Parametrization (SP).
  - Implement Maximal Update Parametrization (muP).
  - Evaluate the two parametrizations by varying the learning rate, width, etc. (see the parametrization sketch after this list).
- Flow matching on a toy dataset:
  - Implement Standard Parametrization (SP).
  - Implement Maximal Update Parametrization (muP).
  - Evaluate the two parametrizations by varying the learning rate, width, etc. (a toy flow-matching sketch also follows the list).
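For concreteness, here is a minimal sketch (not the repo's code) of how muP differs from SP for a simple MLP, using one common simplification of the Adam variant of muP: weights whose fan-in grows with width get a learning rate scaled by `base_width / width`, and the readout is zero-initialized, while under SP every parameter simply uses the base learning rate. The names `make_mlp`, `make_optimizer`, and `base_width` are assumptions for this example.

```python
import torch
import torch.nn as nn


def make_mlp(width: int, d_in: int = 784, d_out: int = 10) -> nn.Sequential:
    # Two hidden layers of size `width`; d_in/d_out are fixed by the task.
    return nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )


def make_optimizer(model: nn.Sequential, base_lr: float, width: int,
                   base_width: int, mup: bool) -> torch.optim.Adam:
    if not mup:  # SP: one global learning rate, default init
        return torch.optim.Adam(model.parameters(), lr=base_lr)
    m = width / base_width            # width multiplier
    nn.init.zeros_(model[-1].weight)  # muP: zero-init the readout layer
    groups = []
    for p in model.parameters():
        # Hidden and readout weights have fan-in == width, so their Adam
        # learning rate shrinks as 1/width; everything else (input
        # weights, biases) keeps the base learning rate.
        scaled = p.ndim == 2 and p.shape[1] == width
        groups.append({"params": [p], "lr": base_lr / m if scaled else base_lr})
    return torch.optim.Adam(groups)
```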
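The flow-matching objective itself fits in a few lines. Below is an illustrative conditional flow matching training loop on a 2D toy dataset; the repo's actual dataset, model, and hyperparameters may differ.

```python
import torch
import torch.nn as nn

# Velocity field v(x, t): input is (x, t) concatenated, output is a 2D velocity.
net = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(1000):
    x1 = 0.1 * torch.randn(256, 2) + torch.tensor([1.0, 1.0])  # toy data blob
    x0 = torch.randn(256, 2)                                    # base noise
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1    # point on the straight-line path
    target = x1 - x0              # its (constant) velocity
    loss = ((net(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```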
```bash
python -m venv venv
source venv/bin/activate
pip install .            # standard install
# or, for development:
pip install -e .         # editable install
pip install -e ".[dev]"  # editable install with dev extras
```
The config file can be found at `slfm/cli/conf/base.yaml`.
- Sample command to train and evaluate the model:

  ```bash
  width=120
  lr=0.01
  train_and_evaluate "++model.width=${width}" "++trainer.optimizer.lr=${lr}"
  ```
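The `++key=value` syntax above is Hydra's command-line override notation. Purely as a sketch, and assuming the CLI is Hydra-based (which the config layout suggests; the actual `train_and_evaluate` implementation lives in the repo), an entry point consuming these overrides might look like:

```python
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="base", version_base=None)
def train_and_evaluate(cfg: DictConfig) -> None:
    width = cfg.model.width          # set by ++model.width=...
    lr = cfg.trainer.optimizer.lr    # set by ++trainer.optimizer.lr=...
    ...  # build the model and trainer, then run


if __name__ == "__main__":
    train_and_evaluate()
```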
- Sample command to run a sweep over different learning rates and widths with different parametrizations:

  ```bash
  sweep
  ```
Key observation:
- muP shows more consistent convergence behavior across widths, which enables better hyperparameter (HP) transfer; the sketch below shows the kind of sweep that surfaces this.
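A self-contained, illustrative version of such a sweep (a toy regression task with a simplified muP scaling rule, not the repo's `sweep` command): under muP the best base learning rate should stay roughly width-independent, while under SP it tends to drift as width grows. All names here are hypothetical.

```python
import itertools
import torch
import torch.nn as nn


def run(width: int, lr: float, mup: bool, base_width: int = 64,
        steps: int = 200) -> float:
    torch.manual_seed(0)  # same data and init draw for every configuration
    net = nn.Sequential(nn.Linear(16, width), nn.ReLU(), nn.Linear(width, 1))
    if mup:
        nn.init.zeros_(net[-1].weight)  # muP readout init
    m = width / base_width
    # muP: scale the Adam lr by 1/m for weights whose fan-in grows with width.
    groups = [{"params": [p],
               "lr": lr / m if mup and p.ndim == 2 and p.shape[1] == width else lr}
              for p in net.parameters()]
    opt = torch.optim.Adam(groups)
    x = torch.randn(512, 16)
    y = x.sum(dim=1, keepdim=True)  # fixed toy regression target
    for _ in range(steps):
        loss = ((net(x) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


for mup in (False, True):
    for width, lr in itertools.product([64, 256, 1024], [1e-3, 1e-2, 1e-1]):
        print(f"mup={mup} width={width} lr={lr:.0e} loss={run(width, lr, mup):.4f}")
```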
TBD.
The project started from a very cool notebook on flow matching; a lot of the scaling code is borrowed from that guide.