[moe] Add multi-budget shared expert ablation for great 10T gate by claude[bot] · Pull Request #4062 · marin-community/marin

claude · 2026-03-23T21:58:30Z

Adds experiment script running shared-expert vs no-shared-expert at five FLOP budgets (3e18 through 9e19) with appropriately scaled model configs at each budget. Each arm is compute-matched via step count adjustment. The good gate (#4021) relied on a single ~1e19 spot check; this sweep builds a scaling curve for a stronger scientific case. Includes config validation tests.

Fixes #4039
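
The description does not show how step counts are derived, so the following is only a sketch of what "compute-matched via step count adjustment" could look like. It assumes the common `6 * N_active * tokens` training-FLOPs approximation; the `steps_for_budget` helper, the parameter counts, and the batch size are hypothetical and not taken from this PR.

```python
def steps_for_budget(flop_budget: float, active_params: int, tokens_per_step: int) -> int:
    """Pick a step count so that 6 * active_params * total_tokens is close to flop_budget.

    Assumption: training FLOPs ~= 6 * (active params per token) * (tokens seen).
    """
    flops_per_step = 6 * active_params * tokens_per_step
    return max(1, round(flop_budget / flops_per_step))


# Hypothetical numbers for one budget: the shared-expert arm activates more
# parameters per token, so it gets fewer steps at the same FLOP budget.
BUDGET = 3e18
TOKENS_PER_STEP = 2**20
ARMS = {"shared": 120_000_000, "no_shared": 100_000_000}

steps = {name: steps_for_budget(BUDGET, n, TOKENS_PER_STEP) for name, n in ARMS.items()}
realized = {name: 6 * ARMS[name] * TOKENS_PER_STEP * steps[name] for name in ARMS}
```

Under these assumptions both arms land within a fraction of a percent of the same budget, which is the property a config validation test could check.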

Adds experiment script running shared vs no-shared expert at 5 FLOP budgets (3e18 through 9e19) with scaled model configs at each budget. Each arm is compute-matched. Includes config validation tests.

Fixes #4039

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07a621240b

chatgpt-codex-connector · 2026-03-23T22:02:55Z

+def _build_steps() -> list[ExecutorStep]:
+    steps: list[ExecutorStep] = []
+    for budget in FLOP_BUDGETS:
+        budget_tag = f"{budget:.0e}"


Preserve exact budget in run and step tags

The budget tag is generated with f"{budget:.0e}", which rounds to one significant digit; this turns the 1.8e19 arm into 2e+19 in run_id, step names, and W&B grouping. That mislabels results on the scaling curve and can collide with a real 2e19 experiment if one is added, making downstream analysis and run selection ambiguous.
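
A small sketch of the rounding behavior and one possible fix. The `budget_tag` helper and the sample budget list are illustrative assumptions, not code from this PR; only 3e18, 1.8e19, and 9e19 are budgets mentioned in this thread, and 2e19 is included purely to show the collision.

```python
# Hypothetical budgets: 2e19 is NOT part of this PR; it is here to show the collision.
SAMPLE_BUDGETS = [3e18, 1.8e19, 2e19, 9e19]


def budget_tag(budget: float, sig_digits: int = 3) -> str:
    """Format a positive FLOP budget compactly, keeping up to `sig_digits` significant digits."""
    mantissa, exponent = f"{budget:.{sig_digits - 1}e}".split("e")
    # Drop trailing zeros and a dangling decimal point: "1.80" -> "1.8", "3.00" -> "3".
    mantissa = mantissa.rstrip("0").rstrip(".")
    return f"{mantissa}e{int(exponent)}"


old_tags = [f"{b:.0e}" for b in SAMPLE_BUDGETS]     # ['3e+18', '2e+19', '2e+19', '9e+19']
new_tags = [budget_tag(b) for b in SAMPLE_BUDGETS]  # ['3e18', '1.8e19', '2e19', '9e19']
```

With the one-digit format the 1.8e19 and 2e19 arms share a tag; the helper keeps them distinct and also avoids the `+` sign in run and step names.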

github-actions · 2026-04-16T01:53:12Z

This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

claude Bot added the agent-generated label (Created by automation/agent) Mar 23, 2026

claude Bot mentioned this pull request Mar 23, 2026

[moe] Great 10T: ablate shared expert #4039

Open

chatgpt-codex-connector Bot reviewed Mar 23, 2026

claude Bot mentioned this pull request Mar 23, 2026

[moe] Great 10T: ablate sliding-window attention #4045

Open

github-actions Bot added the stale label Apr 16, 2026

claude[bot] wants to merge 1 commit into main from agent/20260323-fix-4039
