Feature Request: μP (Maximal Update Parameterization)

## Summary
This issue requests support for [μP (Maximal Update Parameterization)](https://arxiv.org/abs/2203.03466) to enable hyperparameter transfer across model scales.

## Motivation
μP allows tuning hyperparameters on small models and transferring them to larger models, reducing the cost of large-scale pretraining. It has been adopted in production models like [Falcon-H1](https://falcon-lm.github.io/blog/falcon-h1/).

## Requested Features
1. **Per-layer Learning Rate Scaling** - Different LR multipliers for specific layers
2. **Width-dependent Initialization** - μP-style init scaling based on model width
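
To make the two requested features concrete, here is a minimal sketch of how width-dependent multipliers could be derived. It assumes one commonly cited form of the μP rules for Adam-style optimizers: with `m = width / base_width`, hidden weights get a learning rate scaled by `1/m` and an init std scaled by `1/sqrt(m)`, output weights get both scaled by `1/m`, and input weights and biases are left unscaled. The names `MuPScale`, `mup_scale`, and `build_param_groups` are hypothetical and are not part of any existing API in this project; the exact rules should be checked against the paper before implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MuPScale:
    """Multipliers applied on top of the base (small-model) hyperparameters."""
    lr_mult: float
    init_std_mult: float


def mup_scale(layer_kind: str, width: int, base_width: int) -> MuPScale:
    """Return hypothetical μP multipliers for one layer (Adam-style assumption).

    layer_kind is one of "input", "hidden", "output"; biases can be
    treated like "input" under this sketch.
    """
    if width <= 0 or base_width <= 0:
        raise ValueError("width and base_width must be positive")
    m = width / base_width  # width multiplier relative to the tuned base model
    if layer_kind == "input":
        return MuPScale(lr_mult=1.0, init_std_mult=1.0)
    if layer_kind == "hidden":
        return MuPScale(lr_mult=1.0 / m, init_std_mult=m ** -0.5)
    if layer_kind == "output":
        return MuPScale(lr_mult=1.0 / m, init_std_mult=1.0 / m)
    raise ValueError(f"unknown layer kind: {layer_kind!r}")


def build_param_groups(layer_kinds: dict, base_lr: float,
                       width: int, base_width: int) -> list:
    """Map layer names to per-layer learning rates (feature 1 in the list above)."""
    return [
        {"name": name, "lr": base_lr * mup_scale(kind, width, base_width).lr_mult}
        for name, kind in layer_kinds.items()
    ]


if __name__ == "__main__":
    # Illustrative layer names only; a real model would supply its own.
    layers = {"embed": "input", "mlp.fc1": "hidden", "lm_head": "output"}
    for group in build_param_groups(layers, base_lr=1e-3, width=1024, base_width=256):
        print(group)
```

In this sketch the hyperparameters are tuned once at `base_width`, and only the multipliers change as `width` grows. A real implementation would also need a way to classify each parameter as input, hidden, or output, which this sketch leaves to the caller.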

## References
- [μP paper](https://arxiv.org/abs/2203.03466)
- [Falcon-H1](https://falcon-lm.github.io/blog/falcon-h1/)

Feature Request: μP (Maximal Update Parameterization) #2824
