Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d97fdc3131
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
def _resolve_run_id(label: str) -> str:
    run_id = os.environ.get("GRUG_RUN_ID", f"moe-adamh-grad-norm-{label}")
```
Keep per-step run IDs unique for gated launches
When GRUG_RUN_ID is set, _resolve_run_id returns the same ID for every label, so gate1/all runs emit multiple steps with identical run_ids. In run_grug_moe_trial, that ID becomes the trainer/W&B run ID (and W&B defaults to resume="allow"), so subsequent steps can resume or overwrite earlier runs instead of producing separate experiment records. This breaks side-by-side ablation tracking for the very comparisons this launcher is meant to run.
Add a Grug MoE AdamH variant that normalizes each module's gradient to RMS 1 before the AdamH moment updates. Includes gate-specific launch wiring for d512/d768 and d1024/d1280 comparison runs, plus focused optimizer tests.
Part of #5180
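The PR excerpt does not show the optimizer internals, but the per-module normalization it describes can be sketched as follows. This is an illustrative NumPy sketch, assuming "RMS 1" means scaling each module's gradient tensor so its root-mean-square equals 1; the actual AdamH integration may differ:

```python
import numpy as np

def normalize_to_rms_one(grad: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale a single module's gradient so its root-mean-square is 1.

    Applied per module (not globally), this equalizes gradient scale
    across modules before the optimizer's moment updates see them.
    The eps term guards against division by zero for all-zero gradients.
    """
    rms = np.sqrt(np.mean(grad ** 2))
    return grad / (rms + eps)
```

For example, a gradient of `[3.0, -3.0, 3.0, -3.0]` has RMS 3 and is rescaled to `[1.0, -1.0, 1.0, -1.0]`, whose RMS is 1.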