[moe] Add gated norm support and ablation launch script#4057
claude[bot] wants to merge 1 commit into main
Add GatedNorm (low-rank self-gating after RMSNorm) to the MoE grug model with a gated_norm_rank config field, and create an ablation launch script comparing baseline vs gated-norm at ~1e19 FLOPs for the good 10T gate.
Fixes #4026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
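For readers unfamiliar with the idea, here is a minimal sketch of what "low-rank self-gating after RMSNorm" could look like. It is not the code from this PR: the sigmoid gate, the parameter names (`scale`, `w_down`, `w_up`), the initialisation, and the use of NumPy instead of JAX are all assumptions made for illustration.

```python
import numpy as np


def rms_norm(x, scale, eps=1e-6):
    # Divide each feature vector by its root-mean-square, then rescale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * scale


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_gated_norm(rng, hidden_dim, rank):
    # Hypothetical parameter layout: the gate is factored through a
    # rank-`rank` bottleneck, costing 2 * hidden_dim * rank parameters.
    return {
        "scale": np.ones(hidden_dim),
        "w_down": rng.normal(size=(hidden_dim, rank)) * 0.02,
        "w_up": rng.normal(size=(rank, hidden_dim)) * 0.02,
    }


def gated_norm(x, params, eps=1e-6):
    # Normalise first, then let the normalised activations gate themselves.
    y = rms_norm(x, params["scale"], eps)
    # Assumed gate form: elementwise sigmoid of a low-rank projection of y.
    gate = sigmoid((y @ params["w_down"]) @ params["w_up"])
    return y * gate
```

With `rank` much smaller than `hidden_dim`, the gate adds far fewer parameters than a full `hidden_dim x hidden_dim` projection, which is presumably why a rank field is exposed on the config.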
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8df5916e1e
from experiments.grug.moe.model import GrugModelConfig, debug_mesh_and_token_pspec
from experiments.grug.moe.train import initial_state as moe_initial_state, _make_train_step
Move local test imports to module scope
The new local imports in test_grug_moe_gated_norm_lowers violate the repository rule in /workspace/marin/AGENTS.md (“All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps.”). This makes the test inconsistent with the project’s enforced import contract and can defer dependency failures until function execution instead of test collection, so these imports should be moved to the file-level import section.
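The deferred-failure point can be shown with a small standalone sketch. The module name `module_that_does_not_exist_xyz` is a hypothetical placeholder for a missing dependency and is not part of this repository.

```python
def uses_local_import():
    # A function-local import is resolved only when the body executes,
    # so defining this function succeeds even though the module is missing.
    import module_that_does_not_exist_xyz  # hypothetical missing dependency

    return module_that_does_not_exist_xyz


def import_fails_at_call_time():
    # The failure surfaces here, at call time, not when the file was loaded.
    try:
        uses_local_import()
    except ImportError:
        return True
    return False


# With the import at module scope instead, loading the file itself would
# raise ImportError, so a test runner would report the problem during
# collection rather than partway through a test.
```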
This pull request has been inactive for 23 days and is marked as stale.
Add GatedNorm (low-rank self-gating after RMSNorm) to the MoE grug model with a gated_norm_rank config field on GrugModelConfig. When set, gated norms are applied after each RMSNorm in the attention and MLP sub-blocks. Includes an ablation launch script comparing baseline vs gated-norm-rank=16 at ~1e19 FLOPs for the good 10T gate, and a lowering contract test for the new config knob.
Fixes #4026
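As a rough illustration of the baseline vs gated-norm-rank=16 comparison described above, the two arms of such an ablation could be derived from one shared config. `ModelConfigSketch`, its `hidden_dim` field, the `None` default, and the variant names are hypothetical stand-ins. The real `GrugModelConfig` and launch script may be structured differently.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ModelConfigSketch:
    # Hypothetical stand-in for GrugModelConfig; only the new knob matters here.
    hidden_dim: int = 512  # illustrative value, not taken from the PR
    # Assumed convention: None disables GatedNorm, an int sets the gate rank.
    gated_norm_rank: Optional[int] = None


def ablation_variants(base):
    # Both arms share every setting except the gated-norm rank, so any
    # difference in results can be attributed to that single change.
    return {
        "baseline": base,
        "gated-norm-rank16": replace(base, gated_norm_rank=16),
    }
```

Deriving the treatment arm with `dataclasses.replace` keeps the two configs from drifting apart when the base settings are edited later.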