
Agent MoE Experiment: AdamH gradient normalization #5180

@WhenWen

Description

Test a Grug MoE AdamH variant that normalizes each module's gradients to RMS 1 before AdamH moment updates. The code path is local to experiments/grug/moe/ and compares against the compute-optimal MoE baselines in experiments/grug/moe/README.md.
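Here is a minimal sketch of the normalization step described above. It assumes each module's gradient is available as a flat list of floats. The module names, the `EPS` guard, and the helper names are hypothetical. The AdamH projected parameter update is not reproduced. The sketch only shows the per-module RMS rescale feeding standard Adam first and second moment EMAs.

```python
import math

EPS = 1e-8  # hypothetical guard against division by zero for all-zero gradients


def rms(values: list[float]) -> float:
    """Root-mean-square of a flat list of gradient entries."""
    return math.sqrt(sum(x * x for x in values) / len(values))


def normalize_module_grads(grads: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale each module's gradient so its RMS is (approximately) 1.

    The direction within a module is preserved; only its overall scale changes.
    """
    normalized = {}
    for name, g in grads.items():
        scale = 1.0 / (rms(g) + EPS)
        normalized[name] = [x * scale for x in g]
    return normalized


def update_moments(m, v, g, beta1=0.9, beta2=0.999):
    """Standard Adam first/second moment EMAs, fed the normalized gradient."""
    new_m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
    new_v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
    return new_m, new_v


# Hypothetical module gradients with very different raw scales.
grads = {
    "attention": [0.02, -0.01, 0.03],
    "routed_experts": [5.0, -3.0, 4.0],
}
normed = normalize_module_grads(grads)
for name, g in normed.items():
    print(name, round(rms(g), 6))

m, v = update_moments([0.0] * 3, [0.0] * 3, normed["attention"])
```

Adam's m/√v ratio is roughly invariant to a single fixed rescale of the gradient. So any effect of this variant comes from applying a different factor per step and per module. That changes how individual steps are weighted inside the moment averages.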

Exact initiating prompt: "Read experiments/grug/moe/README.md on how to iterate MoE, implement a variant of AdamH optimizer that perform gradient normalization (scale gradient of each module to RMSNorm 1) and test it against the previous method"

TL;DR

Gate 1 is complete. The variant improved effective speedup at d768 (1.030) but fell below 1.0 at d512 (0.981), so it does not pass gate 1 and should not advance to gate 2 under experiments/grug/moe/agent.md.

Hypothesis or Goal

Module-wise gradient RMS normalization reduces optimizer scale mismatch across attention, shared expert, and routed expert modules, improving effective speedup without changing AdamH's projected parameter update rule.

Links

Results

| Scale | Baseline loss | Variant loss | Loss delta | Baseline tok/s | Variant tok/s | Tok/s delta | Effective speedup |
|---|---|---|---|---|---|---|---|
| d512 | 3.8104 | 3.815110 | +0.004710 | 405,630 | 406,983 | +0.333% | 0.980893 |
| d768 | 3.4339 | 3.429193 | -0.004707 | 273,532 | 274,218 | +0.251% | 1.030269 |

Both W&B runs reached the finished state, and both Iris child jobs reached JOB_STATE_SUCCEEDED.
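The delta columns in the results table can be rederived from the raw baseline and variant values. The sketch below does this, assuming loss delta is variant minus baseline, so negative is better. It also assumes tok/s delta is the relative throughput change. The recomputed values agree with the table to two decimal places for throughput. Effective speedup is not recomputed because its definition lives in experiments/grug/moe/README.md and is not reproduced here.

```python
# Raw values copied from the results table.
RESULTS = {
    "d512": {"baseline_loss": 3.8104, "variant_loss": 3.815110,
             "baseline_tps": 405_630, "variant_tps": 406_983},
    "d768": {"baseline_loss": 3.4339, "variant_loss": 3.429193,
             "baseline_tps": 273_532, "variant_tps": 274_218},
}


def deltas(row: dict[str, float]) -> tuple[float, float]:
    """Return (variant - baseline loss, throughput change in percent)."""
    loss_delta = row["variant_loss"] - row["baseline_loss"]
    tps_delta_pct = 100.0 * (row["variant_tps"] / row["baseline_tps"] - 1.0)
    return loss_delta, tps_delta_pct


for scale, row in RESULTS.items():
    loss_delta, tps_pct = deltas(row)
    print(f"{scale}: loss delta {loss_delta:+.6f}, tok/s delta {tps_pct:+.2f}%")
```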

Decision Log

  • 2026-04-25: submitted gate 1 on Iris for d512 and d768.
  • 2026-04-25: d512 finished with effective speedup 0.980893, below the required threshold of 1.0.
  • 2026-04-25: d768 finished with effective speedup 1.030269, but gate 1 requires both small-scale points to exceed 1.0.
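The gate 1 decision rule can be written as a one-line predicate. This sketch paraphrases the rule as stated in this issue: every small-scale point must strictly exceed an effective speedup of 1.0. The function name is hypothetical and is not taken from experiments/grug/moe/agent.md.

```python
EFFECTIVE_SPEEDUP = {"d512": 0.980893, "d768": 1.030269}


def passes_gate1(speedups: dict[str, float], threshold: float = 1.0) -> bool:
    """Gate 1 passes only if every small-scale point strictly exceeds the threshold."""
    return all(s > threshold for s in speedups.values())


failing = [scale for scale, s in EFFECTIVE_SPEEDUP.items() if s <= 1.0]
print(passes_gate1(EFFECTIVE_SPEEDUP), failing)  # False ['d512']
```

Using `all(...)` means a large gain at one scale cannot offset a miss at another. This matches why the d768 improvement alone was not enough.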

Conclusion

Completed negative result. AdamH module gradient RMS normalization does not pass gate 1, so no gate 2 run is launched for this variant.
