[moe] Add inv-sqrt LR schedule experiment for Good 10T gate #4050
claude[bot] wants to merge 1 commit into main from
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30b49fb264
grug_moe_inv_sqrt_lr = ExecutorStep(
    name="grug/moe-inv-sqrt-lr",
    fn=run_grug_moe,
    config=GrugMoeLaunchConfig(
Restore baseline trainer/eval settings
GrugMoeLaunchConfig is constructed here without grug_trainer and eval, so this run falls back to defaults from experiments/grug/moe/train.py (z_loss_weight=0.0, max_eval_batches=None, eval_ema=True) instead of the baseline settings in experiments/grug/moe/launch.py (z_loss_weight=1e-4, max_eval_batches=8, eval_ema=False). That changes optimization and evaluation behavior beyond LR schedule, so the experiment is not apples-to-apples and cannot isolate the inv-sqrt effect.
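One way to address this is to pass the baseline trainer and eval settings explicitly when constructing the step. A hedged sketch only (the field names `grug_trainer` and `eval` come from this comment; the exact nested config types in `experiments/grug/moe/launch.py` are assumptions):

```python
# Hypothetical sketch: mirror the baseline settings from
# experiments/grug/moe/launch.py so only the LR schedule differs.
config=GrugMoeLaunchConfig(
    lr_schedule="inv_sqrt",
    min_lr_ratio=0.1,
    warmup=1000,
    grug_trainer=dict(z_loss_weight=1e-4),          # baseline: 1e-4, not the 0.0 default
    eval=dict(max_eval_batches=8, eval_ema=False),  # baseline eval settings
)
```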
        lr_schedule="inv_sqrt",
        min_lr_ratio=0.1,
        warmup=1000,
Set an effective inv-sqrt decay timescale
Using string lr_schedule="inv_sqrt" here routes through OptimizerConfig.lr_scheduler, which hardcodes inv-sqrt timescale=10000 (lib/levanter/src/levanter/optim/config.py). With this experiment’s 2,000-step run and 1,000 warmup steps, min(1, 1/sqrt((count+warmup)/timescale)) never drops below 1, so LR never decays after warmup. This means the run is effectively constant-LR vs cosine rather than testing inverse-sqrt decay as intended.
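This is easy to check numerically. A minimal sketch of the decay factor quoted above (argument names here are illustrative, not Levanter's actual scheduler signature):

```python
import math

def inv_sqrt_factor(count, warmup=1000, timescale=10000):
    """Inverse-sqrt decay factor quoted above:
    min(1, 1/sqrt((count + warmup) / timescale))."""
    return min(1.0, 1.0 / math.sqrt((count + warmup) / timescale))

# With warmup=1000 and the hardcoded timescale=10000, the factor stays
# pinned at 1.0 for every post-warmup step of a 2,000-step run:
for count in (0, 500, 1000):  # post-warmup step counts
    assert inv_sqrt_factor(count) == 1.0

# Decay only begins once count + warmup exceeds the timescale,
# i.e. thousands of steps beyond this experiment's horizon.
```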
This pull request has been inactive for 23 days and is marked as stale.
Add experiments/grug/moe/inv_sqrt_lr.py which runs the standard MoE trial with lr_schedule=inv_sqrt instead of cosine. All other settings (model, data, resources, steps) match the baseline in launch.py for a controlled comparison. The inv-sqrt schedule decays continuously from peak LR using 1/sqrt(step/timescale), which may suit long training runs better than cosine.
Fixes #4028
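For intuition on why inv-sqrt may suit long runs, the two decay shapes can be compared directly. A hedged sketch (the cosine form below is the common min-ratio variant and the inv-sqrt form ignores warmup, so neither is a verbatim copy of the repo's scheduler):

```python
import math

def cosine_factor(step, total_steps, min_lr_ratio=0.1):
    # Cosine decay from 1.0 down to min_lr_ratio at total_steps.
    progress = step / total_steps
    return min_lr_ratio + (1.0 - min_lr_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))

def inv_sqrt_factor(step, timescale=10000, min_lr_ratio=0.1):
    # Continuous 1/sqrt(step/timescale) decay, clamped to [min_lr_ratio, 1.0].
    if step <= 0:
        return 1.0
    return max(min_lr_ratio, min(1.0, 1.0 / math.sqrt(step / timescale)))

# Cosine hits its floor exactly at the end of the run, while inv-sqrt
# keeps decaying gradually and reaches the floor only much later:
print(cosine_factor(2000, 2000))  # min_lr_ratio floor at end of run
print(inv_sqrt_factor(40000))     # 1/sqrt(4) = 0.5, still decaying
```

Cosine's shape is pinned to a fixed total step count; inv-sqrt has no such horizon, which is why it is often preferred when the final step count is unknown or very large.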