
Add MoE depth MuP LR sweep#5179

Draft
WhenWen wants to merge 21 commits into main from research/moe-depth-mup-lr-sweep



WhenWen commented Apr 25, 2026

Adds opt-in depth-MuP residual scaling for Grug MoE, plus a 36-run learning-rate (LR) sweep across the d512, d768, d1024, and d1280 configurations. The baseline recipe is unchanged; only the dedicated sweep module enables the new residual scaling. The PR also includes focused tests, updates to the MoE docs, and the research logbook.
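A minimal sketch of what an opt-in residual scaling and a 36-run grid might look like. It assumes the common depth-muP convention of multiplying each residual branch by 1/sqrt(L), where L is the number of layers; the description does not state the exact rule Grug MoE uses. The helper names (`depth_mup_residual`, `sweep_configs`) and the learning-rate values are hypothetical, and the even split of 9 learning rates per configuration is an assumption that the 36-run total allows but does not confirm.

```python
import math

def depth_mup_residual(x, branch_out, num_layers, enabled=False):
    """Add a residual branch to x, optionally scaled by 1/sqrt(num_layers).

    Hypothetical helper. The 1/sqrt(L) multiplier is an assumed depth-muP
    convention, not necessarily the rule implemented in this PR.
    enabled=False by default, so the baseline residual path is unchanged.
    """
    scale = 1.0 / math.sqrt(num_layers) if enabled else 1.0
    return [xi + scale * bi for xi, bi in zip(x, branch_out)]

# Hypothetical grid: 4 configurations x 9 learning rates = 36 runs.
# The LR values below are illustrative placeholders, not the PR's grid.
WIDTHS = [512, 768, 1024, 1280]
LRS = [2.0 ** k * 1e-3 for k in range(-4, 5)]  # 9 log-spaced values

def sweep_configs(widths=WIDTHS, lrs=LRS):
    """Enumerate one config dict per (width, lr) pair, with scaling on."""
    return [
        {"d_model": d, "lr": lr, "depth_mup": True}
        for d in widths
        for lr in lrs
    ]
```

Defaulting `enabled` to `False` mirrors the opt-in design described above: the baseline path behaves as before unless a sweep explicitly turns the scaling on.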

Part of #5178

WhenWen added the agent-generated label (Created by automation/agent) on Apr 25, 2026.