# Scaling Recipes

Scaling Recipes is a project for exploring best practices for scaling neural networks across different tasks.

## Scope

- Classification on MNIST:
  - Implement Standard Parametrization (SP).
  - Implement Maximal Update Parametrization (muP).
  - Evaluate the two parametrizations while varying the learning rate, width, and other hyperparameters (a sketch of the SP/muP difference follows this list).
- Flow matching on a toy dataset:
  - Implement Standard Parametrization (SP).
  - Implement Maximal Update Parametrization (muP).
  - Evaluate the two parametrizations while varying the learning rate, width, and other hyperparameters.
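The practical difference between the two parametrizations is how per-layer initialization and learning rates scale with width. Below is a minimal sketch of one common muP recipe for Adam, following the muTransfer paper's summary rather than this repo's actual code; `BASE_WIDTH`, the layer indexing, and `make_optimizer` are illustrative assumptions:

```python
# Illustrative sketch of SP vs. muP for an MLP trained with Adam (not this
# repo's code). With width multiplier m = width / BASE_WIDTH, muP trains the
# hidden and readout matrices with lr / m and shrinks the readout init by an
# extra 1/sqrt(m); SP uses one global lr and standard init everywhere.
import torch
import torch.nn as nn

BASE_WIDTH = 64  # width at which SP and muP coincide (illustrative assumption)

def make_mlp(width: int, d_in: int = 784, d_out: int = 10) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )

def make_optimizer(model: nn.Sequential, lr: float, parametrization: str) -> torch.optim.Adam:
    input_layer, hidden, readout = model[0], model[2], model[4]
    # Both parametrizations start from the standard 1/sqrt(fan_in) init.
    for layer in (input_layer, hidden, readout):
        nn.init.normal_(layer.weight, std=layer.in_features ** -0.5)
        nn.init.zeros_(layer.bias)
    if parametrization == "sp":
        return torch.optim.Adam(model.parameters(), lr=lr)  # SP: one global lr
    m = hidden.in_features / BASE_WIDTH  # width multiplier
    with torch.no_grad():
        readout.weight.mul_(m ** -0.5)  # readout init std gets an extra 1/sqrt(m)
    return torch.optim.Adam([
        # Input weights and all biases keep the base learning rate.
        {"params": [input_layer.weight, input_layer.bias,
                    hidden.bias, readout.bias], "lr": lr},
        # Hidden and readout matrices are trained with lr / m under Adam.
        {"params": [hidden.weight, readout.weight], "lr": lr / m},
    ])

model = make_mlp(width=256)
optimizer = make_optimizer(model, lr=1e-2, parametrization="mup")
```

With this scaling, a learning rate tuned at `BASE_WIDTH` should stay near-optimal as the width grows, which is the hyperparameter-transfer property the sweeps below are meant to demonstrate.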

## Installation

Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```

Then install the package with one of the following:

```bash
pip install .            # regular install
pip install -e .         # editable install
pip install -e ".[dev]"  # editable install with development dependencies
```

## Usage

### Config

The config file can be found at `slfm/cli/conf/base.yaml`.
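The `++key=value` override syntax used in the commands below suggests a Hydra-based CLI. If that holds, the config can also be composed programmatically; this is a hypothetical sketch (the override keys are taken from the commands below, everything else is an assumption):

```python
# Hypothetical sketch, assuming the CLI is built on Hydra (suggested by the
# "++key=value" override syntax). config_path is resolved relative to the
# calling module, so this assumes a script sitting at the repo root.
from hydra import compose, initialize

with initialize(version_base=None, config_path="slfm/cli/conf"):
    cfg = compose(
        config_name="base",
        overrides=["++model.width=120", "++trainer.optimizer.lr=0.01"],
    )
    print(cfg)
```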

### Train and evaluate

Sample command to train and evaluate the model:

```bash
width=120
lr=0.01
train_and_evaluate "++model.width=${width}" "++trainer.optimizer.lr=${lr}"
```

Sample command to sweep over learning rates and widths under both parametrizations:

```bash
sweep
```
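If you want to drive the grid yourself instead of using the `sweep` entry point, a hypothetical external driver could look like the sketch below; the `++model.parametrization` key and the grid values are assumptions, and only `model.width` and `trainer.optimizer.lr` appear in the command above:

```python
# Hypothetical sweep driver over (parametrization, width, lr) combinations.
# Only the width and lr override keys are confirmed by the README; the
# parametrization key and grid values are illustrative.
import itertools
import subprocess

widths = [64, 128, 256, 512]
lrs = [1e-3, 3e-3, 1e-2, 3e-2]
for parametrization, width, lr in itertools.product(["sp", "mup"], widths, lrs):
    subprocess.run(
        [
            "train_and_evaluate",
            f"++model.parametrization={parametrization}",
            f"++model.width={width}",
            f"++trainer.optimizer.lr={lr}",
        ],
        check=True,
    )
```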

## Expected outcome

Key observation:

- muP shows more consistent convergence behavior across widths, so hyperparameters tuned at a small width transfer better to larger widths.

(Figures: sweep results under muP and under SP.)
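A common way to read such sweep results (a generic sketch, not this repo's plotting code) is to plot final loss against learning rate with one curve per width: under muP the minima line up across widths, while under SP the optimal learning rate drifts as the width changes.

```python
# Generic visualization sketch (not this repo's code): one loss-vs-lr curve
# per width; under muP the optima should align across widths.
import matplotlib.pyplot as plt

def plot_lr_sweep(results: dict[int, dict[float, float]], title: str) -> None:
    """results[width][lr] = final evaluation loss, e.g. collected from the sweep."""
    for width, by_lr in sorted(results.items()):
        lrs = sorted(by_lr)
        plt.plot(lrs, [by_lr[lr] for lr in lrs], marker="o", label=f"width={width}")
    plt.xscale("log")
    plt.xlabel("learning rate")
    plt.ylabel("final loss")
    plt.title(title)
    plt.legend()
    plt.show()
```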

## Flow matching

TBD.

## Thanks

The project started from a very cool notebook on flow matching. Much of the scaling code is borrowed from this guide.
