
[moe] Add gated norm support and ablation launch script #4057

Open
claude[bot] wants to merge 1 commit into main from agent/20260323-fix-4026

Conversation

@claude (Contributor) commented Mar 23, 2026

Add GatedNorm (low-rank self-gating after RMSNorm) to the MoE grug model with a gated_norm_rank config field on GrugModelConfig. When set, gated norms are applied after each RMSNorm in the attention and MLP sub-blocks. Includes an ablation launch script comparing baseline vs gated-norm-rank=16 at ~1e19 FLOPs for the good 10T gate, and a lowering contract test for the new config knob.

Fixes #4026
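The description does not show the GatedNorm implementation itself, so here is a minimal numpy sketch of what "low-rank self-gating after RMSNorm" plausibly means: normalize, then multiply elementwise by a sigmoid gate computed through a rank-r bottleneck. The function names, the sigmoid choice, and the placement of the gain are assumptions, not taken from the PR's code.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # Standard RMSNorm: divide by the root-mean-square of the last
    # axis, then apply a learned per-channel gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def gated_rms_norm(x, gain, w_down, w_up, eps=1e-6):
    # Hypothetical GatedNorm: RMSNorm followed by a low-rank
    # self-gate. w_down has shape (d, r) and w_up shape (r, d),
    # where r corresponds to the gated_norm_rank config field.
    h = rms_norm(x, gain, eps)
    gate = 1.0 / (1.0 + np.exp(-(h @ w_down @ w_up)))  # sigmoid
    return h * gate

d, rank = 8, 2
rng = np.random.default_rng(0)
x = rng.normal(size=(4, d))
w_down = 0.01 * rng.normal(size=(d, rank))
w_up = 0.01 * rng.normal(size=(rank, d))
out = gated_rms_norm(x, np.ones(d), w_down, w_up)
print(out.shape)  # (4, 8)
```

With near-zero gate weights (as at init here), the gate sits near 0.5, so the block starts close to a scaled RMSNorm and learns to modulate channels during training; the extra cost is two rank-r matmuls per norm.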


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude claude[bot] added the agent-generated (Created by automation/agent) label Mar 23, 2026

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8df5916e1e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +184 to +185
from experiments.grug.moe.model import GrugModelConfig, debug_mesh_and_token_pspec
from experiments.grug.moe.train import initial_state as moe_initial_state, _make_train_step


P1: Move local test imports to module scope

The new local imports in test_grug_moe_gated_norm_lowers violate the repository rule in /workspace/marin/AGENTS.md (“All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps.”). This makes the test inconsistent with the project’s enforced import contract and can defer dependency failures until function execution instead of test collection, so these imports should be moved to the file-level import section.
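The requested change is mechanical; below is a minimal sketch of the two patterns using a stdlib stand-in, since the real modules from the diff (`experiments.grug.moe.model`, `experiments.grug.moe.train`) are not importable outside the repo.

```python
# After (the requested pattern): module-scope import. A missing or
# broken dependency now fails at test collection, not mid-run.
import math  # stand-in for experiments.grug.moe.model

def test_lowering_eager():
    assert math.isfinite(1.0)

# Before (the flagged pattern): a function-local import defers the
# same failure until the test body actually executes. Per AGENTS.md
# this is only acceptable to break a circular dependency or guard an
# optional dependency.
def test_lowering_deferred():
    import math
    assert math.isfinite(1.0)

test_lowering_eager()
test_lowering_deferred()
```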


@github-actions (Contributor) commented

This pull request has been inactive for 23 days and has been marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

@github-actions github-actions[bot] added the stale label Apr 16, 2026

Labels

agent-generated (Created by automation/agent), stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[moe] Good 10T: ablate gated norms

0 participants