[moe] Add multi-budget shared expert ablation for great 10T gate #4062
claude[bot] wants to merge 1 commit into main
Adds experiment script running shared vs no-shared expert at 5 FLOP budgets (3e18 through 9e19) with scaled model configs at each budget. Each arm is compute-matched. Includes config validation tests.

Fixes #4039

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07a621240b
def _build_steps() -> list[ExecutorStep]:
    steps: list[ExecutorStep] = []
    for budget in FLOP_BUDGETS:
        budget_tag = f"{budget:.0e}"
Preserve exact budget in run and step tags
The budget tag is generated with f"{budget:.0e}", which rounds to one significant digit; this turns the 1.8e19 arm into 2e+19 in run_id, step names, and W&B grouping. That mislabels results on the scaling curve and can collide with a real 2e19 experiment if one is added, making downstream analysis and run selection ambiguous.
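One possible fix, sketched under assumptions: `budget_tag` below is a hypothetical helper, not code from this PR, and the budget values are the ones named in this thread plus an illustrative 2e19. It formats with Python's default `e` precision and trims trailing zeros, so 1.8e19 and 2e19 get distinct tags.

```python
def budget_tag(budget: float) -> str:
    """Hypothetical helper: scientific-notation tag that keeps the mantissa.

    Uses the default 'e' precision (6 decimal places), so budgets that need
    more than 7 significant digits would still be rounded.
    """
    mantissa, exponent = f"{budget:e}".split("e")
    # "1.800000" -> "1.8", "3.000000" -> "3"
    mantissa = mantissa.rstrip("0").rstrip(".")
    # int() drops the "+" sign and any zero padding in the exponent
    return f"{mantissa}e{int(exponent)}"


# The current formatting collapses two different budgets into one tag.
assert f"{1.8e19:.0e}" == f"{2e19:.0e}" == "2e+19"

# The sketched helper keeps them apart.
for budget in (3e18, 1.8e19, 2e19, 9e19):
    print(budget_tag(budget))
```

The resulting tags also avoid the `+` character, which may be convenient in run IDs and step names, though that is a side effect rather than a requirement.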
This pull request has been inactive for 23 days and is marked as stale.
Adds experiment script running shared-expert vs no-shared-expert at five FLOP budgets (3e18 through 9e19) with appropriately scaled model configs at each budget. Each arm is compute-matched via step count adjustment. The good gate (#4021) relied on a single ~1e19 spot check; this sweep builds a scaling curve for a stronger scientific case. Includes config validation tests.
Fixes #4039
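For readers unfamiliar with compute matching via step count, a minimal sketch of the idea follows. The function name and the per-step FLOP figures are hypothetical placeholders, not values or code from the experiment script: the arm that spends more FLOPs per step simply trains for fewer steps, so both arms consume roughly the same total budget.

```python
def steps_for_budget(budget_flops: float, flops_per_step: float) -> int:
    """Hypothetical helper: nearest whole number of steps that spends the budget."""
    if flops_per_step <= 0:
        raise ValueError("flops_per_step must be positive")
    return max(1, round(budget_flops / flops_per_step))


budget = 3e18  # smallest budget in the sweep

# Illustrative per-step costs only; the real configs define their own.
flops_per_step = {"shared": 1.5e14, "no_shared": 1.2e14}

for arm, cost in flops_per_step.items():
    steps = steps_for_budget(budget, cost)
    print(f"{arm}: {steps} steps, {steps * cost:.3e} total FLOPs")
```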