μP (Maximal Update Parametrization) allows tuning hyperparameters on small models and transferring them to larger models, reducing the cost of hyperparameter search for large-scale pretraining. It has been adopted in production models like Falcon-H1.
Requested Features
Per-layer Learning Rate Scaling - Different LR multipliers for specific layers
Width-dependent Initialization - μP-style init scaling based on model width
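To make the two requested features concrete, here is a minimal sketch of how the scaling factors could be derived from the model width. It assumes one common μP formulation for Adam-style optimizers, in which hidden and output weights have their learning rate divided by the width multiplier, hidden weights have their init std divided by its square root, and output weights have their init std divided by the multiplier itself. The names `mup_scaling`, `MupScaling`, `base_width`, `base_lr`, and `base_std` are hypothetical and not part of any existing API; an actual implementation may choose a different but equivalent parametrization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MupScaling:
    """Per-layer learning rate and init std (hypothetical container)."""
    lr: float
    init_std: float


def mup_scaling(kind: str, width: int, base_width: int,
                base_lr: float, base_std: float) -> MupScaling:
    """Scale hyperparameters tuned at base_width to a model of the given width.

    Assumed rules (Adam-style optimizer, one common muP formulation):
      input  weights: lr unchanged,          std unchanged
      hidden weights: lr / width_mult,       std / sqrt(width_mult)
      output weights: lr / width_mult,       std / width_mult
    """
    width_mult = width / base_width
    if kind == "input":
        return MupScaling(lr=base_lr, init_std=base_std)
    if kind == "hidden":
        return MupScaling(lr=base_lr / width_mult,
                          init_std=base_std / width_mult ** 0.5)
    if kind == "output":
        return MupScaling(lr=base_lr / width_mult,
                          init_std=base_std / width_mult)
    raise ValueError(f"unknown layer kind: {kind!r}")


# Example: hyperparameters tuned at width 256, transferred to width 1024.
for layer_kind in ("input", "hidden", "output"):
    print(layer_kind, mup_scaling(layer_kind, width=1024, base_width=256,
                                  base_lr=1e-3, base_std=0.02))
```

In a training framework, the `lr` values would typically feed per-parameter-group learning rates in the optimizer, and the `init_std` values would feed each layer's weight initializer.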