[grug] Add MoE AdamH gradient normalization by WhenWen · Pull Request #5181 · marin-community/marin

WhenWen · 2026-04-25T18:37:39Z

Add a Grug MoE AdamH variant that normalizes each module's gradient to RMS 1 before the AdamH moment updates. Includes gate-specific launch wiring for the d512/d768 and d1024/d1280 comparison runs, plus focused optimizer tests.
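
For readers unfamiliar with the phrase, here is a minimal sketch of what "normalize each module's gradient to RMS 1" could look like. It is not the code in this PR: the function names, the flat-list gradient representation, and the `eps` guard are assumptions for illustration, and the AdamH moment update itself is left out.

```python
import math


def rms(values):
    """Root-mean-square of a non-empty list of floats."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def normalize_to_unit_rms(grad, eps=1e-12):
    """Rescale one module's gradient so its RMS is approximately 1.

    `eps` (an assumed guard, not taken from the PR) avoids division by
    zero when the gradient is all zeros; such a gradient stays all zeros.
    """
    scale = rms(grad) + eps
    return [g / scale for g in grad]


def normalize_module_grads(grads_by_module, eps=1e-12):
    """Apply the rescaling independently to every module's gradient."""
    return {
        name: normalize_to_unit_rms(grad, eps)
        for name, grad in grads_by_module.items()
    }


# Hypothetical module names and values, chosen only to show the effect.
grads = {"router": [3.0, 4.0], "expert_0": [0.001, -0.001]}
normalized = normalize_module_grads(grads)
```

After this step every module hands the optimizer a gradient of the same scale, so the moment estimates would no longer reflect differences in raw gradient magnitude between modules.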

Part of #5180

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d97fdc3131

chatgpt-codex-connector · 2026-04-25T18:43:21Z

+
+
+def _resolve_run_id(label: str) -> str:
+    run_id = os.environ.get("GRUG_RUN_ID", f"moe-adamh-grad-norm-{label}")


Keep per-step run IDs unique for gated launches

When GRUG_RUN_ID is set, _resolve_run_id returns the same ID for every label, so gate1/all runs emit multiple steps with identical run_ids. In run_grug_moe_trial, that ID becomes the trainer/W&B run ID (and W&B defaults to resume="allow"), so subsequent steps can resume or overwrite earlier runs instead of producing separate experiment records. This breaks side-by-side ablation tracking for the very comparisons this launcher is meant to run.
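
One possible way to address this, sketched under assumptions: treat `GRUG_RUN_ID` as a prefix and always append the label, so each step gets a distinct ID. The name `resolve_run_id` and the `environ` parameter are hypothetical, and because the original `_resolve_run_id` is only partially shown in the diff, any further processing it does is not reproduced here.

```python
import os


def resolve_run_id(label, environ=None):
    """Return a run ID that is unique per label.

    If GRUG_RUN_ID is set, it is used as a prefix rather than as the
    whole ID, so two labels never collapse onto the same run.
    `environ` is an assumed parameter added for testability; it
    defaults to the real process environment.
    """
    env = os.environ if environ is None else environ
    base = env.get("GRUG_RUN_ID")
    if base is None:
        # Same default as the snippet under review.
        return f"moe-adamh-grad-norm-{label}"
    return f"{base}-{label}"


# With an override set, each label still maps to its own ID.
ids = [resolve_run_id(lbl, {"GRUG_RUN_ID": "gate1"}) for lbl in ("d512", "d768")]
```

With this shape, a gated launch that sets `GRUG_RUN_ID` once would still produce separate tracker runs for each comparison arm.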


[grug] Add MoE AdamH gradient normalization

d97fdc3

WhenWen added the agent-generated label (Created by automation/agent) on Apr 25, 2026

chatgpt-codex-connector Bot reviewed Apr 25, 2026


WhenWen mentioned this pull request Apr 25, 2026

Agent MoE Experiment: AdamH gradient normalization #5180

Closed

Kaiyue Wen added 3 commits April 25, 2026 11:51

[grug] Record AdamH grad norm gate 1 launch

3525577

[grug] Record AdamH grad norm d512 result

e313c3f

[grug] Record AdamH grad norm gate result

2356b2b

WhenWen wants to merge 4 commits into main from research/moe-adamh-grad-norm
