[moe] Add shared-expert ablation experiment at ~1e19 FLOPs#4051

Open
claude[bot] wants to merge 1 commit into main from agent/20260323-fix-4021

@claude claude Bot commented Mar 23, 2026

Two-arm experiment comparing a shared expert (baseline) against no shared expert at ~1e19 total training FLOPs, using the grug MoE trial model config. To match the FLOP budget at bs=512 and seq=4096, the shared arm runs ~5407 steps and the no-shared arm runs ~6091 steps. Adds a config validation test.
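
The step counts above follow from simple budget arithmetic, which can be sketched as below. This is a rough illustration, not the code in this PR: it assumes the budget is exactly 1e19 FLOPs and that training FLOPs scale linearly with tokens processed. The per-token FLOP figures are back-derived from the stated step counts rather than taken from the model config, and the helper names `implied_flops_per_token` and `steps_for_budget` are hypothetical.

```python
# Values from the PR description.
BATCH_SIZE = 512
SEQ_LEN = 4096
SHARED_STEPS = 5407
NO_SHARED_STEPS = 6091

# Assumption: "~1e19" is treated as exactly 1e19 for this sketch.
FLOP_BUDGET = 1e19

# Tokens consumed by one optimizer step.
TOKENS_PER_STEP = BATCH_SIZE * SEQ_LEN


def implied_flops_per_token(steps: int, budget: float = FLOP_BUDGET) -> float:
    """Per-token training FLOPs implied by a step count (hypothetical helper)."""
    return budget / (steps * TOKENS_PER_STEP)


def steps_for_budget(flops_per_token: float, budget: float = FLOP_BUDGET) -> int:
    """Steps needed to spend `budget` FLOPs, rounded to the nearest step."""
    return round(budget / (flops_per_token * TOKENS_PER_STEP))


shared_fpt = implied_flops_per_token(SHARED_STEPS)
no_shared_fpt = implied_flops_per_token(NO_SHARED_STEPS)

print(f"tokens per step:        {TOKENS_PER_STEP}")
print(f"shared FLOPs/token:     {shared_fpt:.3e}")
print(f"no-shared FLOPs/token:  {no_shared_fpt:.3e}")
print(f"step ratio (no/shared): {NO_SHARED_STEPS / SHARED_STEPS:.3f}")
```

Under these assumptions the no-shared arm spends roughly 11% fewer FLOPs per token than the shared arm, which is why it runs more steps to reach the same budget.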

Fixes #4021

Add experiments/grug/moe/exp4021_ablate_shared_expert.py with two arms
(shared expert baseline vs no shared expert), each targeting ~1e19 total
training FLOPs using the trial model config. Also add a test validating
FLOP budgets and config correctness.

Fixes #4021
@claude claude Bot added the agent-generated label (Created by automation/agent) Mar 23, 2026
@github-actions

This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

@github-actions github-actions Bot added the stale label Apr 16, 2026

Labels

agent-generated (Created by automation/agent), stale

Development

Successfully merging this pull request may close these issues.

[moe] Good 10T: ablate shared expert