[moe] Add multi-budget shared expert ablation for great 10T gate#4062

Open
claude[bot] wants to merge 1 commit into main from agent/20260323-fix-4039

@claude claude Bot commented Mar 23, 2026

Adds an experiment script that runs shared-expert vs. no-shared-expert arms at five FLOP budgets (3e18 through 9e19), with model configs scaled to each budget. The two arms at each budget are compute-matched by adjusting the step count. The good gate (#4021) relied on a single ~1e19 spot check; this sweep builds a scaling curve to make a stronger scientific case. Includes config validation tests.

Fixes #4039
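
For readers unfamiliar with compute matching by step count, here is a minimal sketch of the idea. It is not the script from this PR: the helper name `steps_for_budget`, the parameter counts, and the batch size are all hypothetical, and it assumes the common approximation of roughly 6 FLOPs per active parameter per token rather than whatever FLOP accounting the experiment actually uses.

```python
def steps_for_budget(budget_flops: float, active_params: float, tokens_per_step: int) -> int:
    """Hypothetical helper: number of training steps that spends ~budget_flops.

    Assumes training FLOPs ~= 6 * active_params * tokens (an approximation,
    not necessarily the accounting used in this PR).
    """
    flops_per_step = 6 * active_params * tokens_per_step
    return max(1, round(budget_flops / flops_per_step))


# Illustrative numbers only; none of these come from the PR.
BUDGET = 3e18
TOKENS_PER_STEP = 2**19
ACTIVE_PARAMS_NO_SHARED = 1.0e8
ACTIVE_PARAMS_SHARED = 1.1e8  # a shared expert adds active parameters per token

steps_no_shared = steps_for_budget(BUDGET, ACTIVE_PARAMS_NO_SHARED, TOKENS_PER_STEP)
steps_shared = steps_for_budget(BUDGET, ACTIVE_PARAMS_SHARED, TOKENS_PER_STEP)

# The arm with more active parameters gets fewer steps, so both arms
# spend approximately the same total compute.
print(steps_no_shared, steps_shared)
```

Under these assumptions the arm that costs more per step simply trains for fewer steps, which is what keeps the comparison at each budget fair.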

Adds experiment script running shared vs no-shared expert at 5 FLOP
budgets (3e18 through 9e19) with scaled model configs at each budget.
Each arm is compute-matched. Includes config validation tests.

Fixes #4039

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude claude Bot added the agent-generated Created by automation/agent label Mar 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07a621240b

def _build_steps() -> list[ExecutorStep]:
    steps: list[ExecutorStep] = []
    for budget in FLOP_BUDGETS:
        budget_tag = f"{budget:.0e}"

P2: Preserve exact budget in run and step tags

The budget tag is generated with `f"{budget:.0e}"`, which rounds to one significant digit; this turns the 1.8e19 arm into `2e+19` in `run_id`, step names, and W&B grouping. That mislabels results on the scaling curve and can collide with a real 2e19 experiment if one is added, making downstream analysis and run selection ambiguous.
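
One possible way to address this, sketched under assumptions: the helper `exact_budget_tag` below is hypothetical and not part of the PR, and the fixed three-decimal mantissa is an arbitrary choice that assumes budgets are specified to at most four significant digits. It first reproduces the rounding, then formats the tag without losing digits.

```python
def exact_budget_tag(budget: float) -> str:
    """Hypothetical helper: format a FLOP budget without rounding to one digit.

    Formats with a 3-decimal mantissa, then strips trailing zeros so that
    3e18 stays "3e18" and 1.8e19 becomes "1.8e19".
    """
    mantissa, exponent = f"{budget:.3e}".split("e")
    mantissa = mantissa.rstrip("0").rstrip(".")
    return f"{mantissa}e{int(exponent)}"


# The current format rounds 1.8e19 up and collides with a real 2e19 budget.
assert f"{1.8e19:.0e}" == "2e+19"
assert f"{1.8e19:.0e}" == f"{2e19:.0e}"

# The hypothetical helper keeps the two budgets distinct.
assert exact_budget_tag(1.8e19) == "1.8e19"
assert exact_budget_tag(2e19) == "2e19"
```

Dropping the `+` from the exponent is a side effect of `int(exponent)`; whether that matters depends on how existing run IDs are matched downstream, which this sketch does not know.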

This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

@github-actions github-actions Bot added the stale label Apr 16, 2026

Labels

agent-generated Created by automation/agent stale

Development

Successfully merging this pull request may close these issues.

[moe] Great 10T: ablate shared expert