[moe] Add AdamH vs Adam comparison experiment at 1e19 FLOPs #4059

Open
claude[bot] wants to merge 1 commit into main from agent/20260323-fix-4024
@claude claude Bot commented Mar 23, 2026

Add GrugAdamHConfig, a grug-compatible AdamH optimizer that classifies parameters by ndim and path name instead of by haliax module introspection. Weight matrices (ndim >= 2) get the scale-invariant AdamH update; embeddings, router weights, and norm scalars use standard Adam.

Add an experiment script that launches both Adam and AdamH on the same d=1024 MoE model (E=8, K=2, shared expert, 13 layers) at ~1e19 FLOPs on the Nemotron mix for a controlled optimizer comparison.

Fixes #4024
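
The routing rule described above can be illustrated with a minimal sketch. This is not the PR's implementation: `classify_param`, `ADAM_PATH_KEYWORDS`, the path strings, and the shapes are all hypothetical, and the assumption that path keywords take precedence over the ndim check is inferred from the description (embeddings and router weights are 2D but are listed as using standard Adam).

```python
from typing import Dict, Tuple

# Assumed substrings that mark a parameter as "use standard Adam".
# These names are illustrative and are not taken from the PR.
ADAM_PATH_KEYWORDS = ("embed", "router", "norm")


def classify_param(path: str, shape: Tuple[int, ...]) -> str:
    """Return "adamh" for weight matrices and "adam" for everything else."""
    name = path.lower()
    # Path-name check first: embeddings and routers are 2D but stay on Adam.
    if any(key in name for key in ADAM_PATH_KEYWORDS):
        return "adam"
    # Remaining arrays with ndim >= 2 are treated as weight matrices.
    if len(shape) >= 2:
        return "adamh"
    # Scalars and vectors (biases, scales) fall back to Adam.
    return "adam"


# Illustrative parameter paths and shapes (made up for this example).
params: Dict[str, Tuple[int, ...]] = {
    "token_embed/weight": (32000, 1024),
    "layers/0/attn/wq": (1024, 1024),
    "layers/0/moe/router/weight": (1024, 8),
    "layers/0/moe/experts/w_up": (8, 1024, 4096),
    "layers/0/norm/scale": (1024,),
}
labels = {path: classify_param(path, shape) for path, shape in params.items()}
```

A label map of this kind is the sort of input that a multi-optimizer wrapper would need in order to dispatch each parameter to one of two update rules. How GrugAdamHConfig actually wires this up is not shown in this conversation.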

Add GrugAdamHConfig for raw-array grug models (routes 2D weight matrices
to scale-invariant AdamH, embeddings/routers/norms to standard Adam).
Add experiment script running both optimizers on d=1024 MoE (E=8, K=2)
at ~1e19 FLOPs for a controlled comparison.

Fixes #4024

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude claude Bot added the agent-generated label (Created by automation/agent) Mar 23, 2026
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b666a77855

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".



def _resolve_run_id(base: str) -> str:
    run_id = os.environ.get("GRUG_RUN_ID", base)
[P1] Keep per-step run IDs unique under GRUG_RUN_ID override

_resolve_run_id uses GRUG_RUN_ID verbatim, so both adam_step and adamh_step collapse to the same run_id whenever that env var is set. In this stack, trainer.id is used as the default W&B run id (with resume enabled), so the second run resumes/overwrites the first instead of producing an independent comparison run. That invalidates the Adam-vs-AdamH side-by-side experiment for scripted launches that set GRUG_RUN_ID.
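
One possible way to address this, sketched under assumptions: append a per-optimizer suffix after resolving the override, so the two steps never share an ID. The function `resolve_run_id`, its `suffix` and `env` parameters, and the example ID strings are hypothetical; the rest of the PR's `_resolve_run_id` is not visible here, so this is not a patch against it.

```python
import os
from typing import Mapping, Optional


def resolve_run_id(base: str, suffix: str,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical variant that keeps the suffix even under an override.

    `env` defaults to os.environ; passing a dict makes the function easy
    to exercise without mutating the real environment.
    """
    env = os.environ if env is None else env
    # GRUG_RUN_ID replaces only the shared prefix, never the whole ID.
    prefix = env.get("GRUG_RUN_ID", base)
    return f"{prefix}-{suffix}"


# With the override set, the two steps still get distinct run IDs.
fake_env = {"GRUG_RUN_ID": "shared-launch"}
adam_id = resolve_run_id("moe-1e19", "adam", env=fake_env)
adamh_id = resolve_run_id("moe-1e19", "adamh", env=fake_env)

# Without the override, the base name is used as the prefix.
default_id = resolve_run_id("moe-1e19", "adam", env={})
```

With this shape, a scripted launch that sets GRUG_RUN_ID would still produce two independent W&B runs, at the cost of the final run ID no longer matching the environment variable verbatim.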

@github-actions
This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

@github-actions github-actions Bot added the stale label Apr 16, 2026
Labels

agent-generated (Created by automation/agent), stale

Development

Successfully merging this pull request may close these issues.

[moe] Good 10T: compare AdamH vs Adam