[moe] Add AdamH vs Adam comparison experiment at 1e19 FLOPs #4059
claude[bot] wants to merge 1 commit into main from
Conversation
Add GrugAdamHConfig for raw-array grug models (routes 2D weight matrices to scale-invariant AdamH; embeddings, routers, and norms to standard Adam). Add an experiment script running both optimizers on a d=1024 MoE (E=8, K=2) at ~1e19 FLOPs for a controlled comparison.

Fixes #4024

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b666a77855
def _resolve_run_id(base: str) -> str:
    run_id = os.environ.get("GRUG_RUN_ID", base)
Keep per-step run IDs unique under GRUG_RUN_ID override
_resolve_run_id uses GRUG_RUN_ID verbatim, so both adam_step and adamh_step collapse to the same run_id whenever that env var is set. In this stack, trainer.id is used as the default W&B run id (with resume enabled), so the second run resumes/overwrites the first instead of producing an independent comparison run. That invalidates the Adam-vs-AdamH side-by-side experiment for scripted launches that set GRUG_RUN_ID.
This pull request has been inactive for 23 days and is marked as stale.
Add GrugAdamHConfig, a grug-compatible AdamH optimizer that classifies parameters by ndim and path name instead of haliax module introspection. Weight matrices (ndim >= 2) get the scale-invariant AdamH update; embeddings, router weights, and norm scalars use standard Adam. Add experiment script that launches both Adam and AdamH on the same d=1024 MoE model (E=8, K=2, shared expert, 13 layers) at ~1e19 FLOPs on Nemotron mix for a controlled optimizer comparison.
Fixes #4024
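The classification described above (ndim plus path name, no haliax introspection) can be sketched roughly as follows; `classify_param` and the path substrings are illustrative assumptions, not the PR's actual code.

```python
def classify_param(path: str, ndim: int) -> str:
    """Route a parameter to AdamH or standard Adam by ndim and path name."""
    if ndim < 2:
        return "adam"   # norm scalars and other low-rank params
    if "embed" in path or "router" in path:
        return "adam"   # embeddings and router weights stay on Adam
    return "adamh"      # weight matrices get the scale-invariant AdamH update
```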