Feature Request: μP (Maximal Update Parameterization) #2824

@sbhavani

Summary

Request to add support for μP (Maximal Update Parameterization) to enable hyperparameter transfer across model scales.

Motivation

μP allows tuning hyperparameters on small models and transferring them to larger models, reducing the cost of large-scale pretraining. It has been adopted in production models like Falcon-H1.

Requested Features

  1. Per-layer Learning Rate Scaling - Different LR multipliers for specific layers
  2. Width-dependent Initialization - μP-style init scaling based on model width
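
The two requested features can be illustrated together with a minimal sketch. It assumes an Adam-style optimizer and one common formulation of the μP rules, in which hidden and output weights have their learning rate scaled by `base_width / width`, hidden weights are initialized with a standard deviation proportional to `1 / sqrt(width)`, and output weights with one proportional to `1 / width`. The function `mup_scaling`, the `LayerScaling` type, and the layer kinds `"input"`, `"hidden"`, and `"output"` are hypothetical names chosen for illustration, not an existing API of this project; the exact rules vary with the optimizer and the μP variant.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerScaling:
    lr: float        # learning rate for this layer's parameter group
    init_std: float  # standard deviation for this layer's initial weights


def mup_scaling(kind: str, width: int, base_width: int,
                base_lr: float, base_std: float) -> LayerScaling:
    """Scale LR and init std from a tuned base model to a target width.

    Hypothetical helper: assumes Adam-style updates and one common
    formulation of the muP rules. `base_lr` and `base_std` are the values
    tuned on the small proxy model of width `base_width`.
    """
    if width <= 0 or base_width <= 0:
        raise ValueError("width and base_width must be positive")

    ratio = base_width / width  # below 1 when the target is wider

    if kind == "input":
        # Fan-in does not grow with width, so nothing is rescaled.
        return LayerScaling(lr=base_lr, init_std=base_std)
    if kind == "hidden":
        # LR ~ 1/width, init variance ~ 1/width (std ~ 1/sqrt(width)).
        return LayerScaling(lr=base_lr * ratio,
                            init_std=base_std * math.sqrt(ratio))
    if kind == "output":
        # LR ~ 1/width, init variance ~ 1/width^2 (std ~ 1/width).
        return LayerScaling(lr=base_lr * ratio, init_std=base_std * ratio)
    raise ValueError(f"unknown layer kind: {kind!r}")


if __name__ == "__main__":
    # Transfer settings tuned at width 256 to a model of width 1024.
    for kind in ("input", "hidden", "output"):
        print(kind, mup_scaling(kind, width=1024, base_width=256,
                                base_lr=1e-3, base_std=0.02))
```

In a real integration, the returned `lr` values would presumably feed per-layer optimizer parameter groups and the `init_std` values would feed the weight initializers; how that maps onto this project's configuration is left open here.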

Labels: enhancement