[moe] Wire capacity overflow reporting into Grug MoE training metrics #4052

Open
claude[bot] wants to merge 1 commit into main from agent/20260323-fix-4016

claude[bot] commented on Mar 23, 2026

Enable report_capacity_overflow=True in MoEMLP.__call__ and propagate dropped_count and overflow_fraction through router_stats into the summarized training metrics logged to wandb. Aggregate overflow metrics are now visible as train/router/dropped_count_total and train/router/overflow_fraction_mean, alongside per-layer variants of both.

Fixes #4016
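
For reviewers who want a concrete picture of the two quantities, here is a minimal, framework-free sketch of how dropped_count and overflow_fraction could be computed for one layer and then rolled up into the aggregate metric names above. It is not the code in this PR: the helper names capacity_overflow and summarize_router_stats, the per-layer key pattern train/router/layer_{i}/..., and the definition of overflow_fraction as dropped assignments divided by total assignments are all assumptions made for illustration.

```python
from collections import Counter


def capacity_overflow(expert_ids, num_experts, capacity):
    """Hypothetical helper: count assignments beyond each expert's capacity.

    expert_ids holds one expert index per token-to-expert assignment.
    Assumes overflow_fraction = dropped assignments / total assignments.
    """
    counts = Counter(expert_ids)
    dropped = sum(max(0, counts.get(e, 0) - capacity) for e in range(num_experts))
    total = len(expert_ids)
    fraction = dropped / total if total else 0.0
    return {"dropped_count": dropped, "overflow_fraction": fraction}


def summarize_router_stats(per_layer_stats):
    """Hypothetical helper: flatten per-layer stats into a metrics dict.

    The per-layer key pattern is an assumption; only the two aggregate
    key names are taken from the PR description.
    """
    metrics = {}
    for i, stats in enumerate(per_layer_stats):
        metrics[f"train/router/layer_{i}/dropped_count"] = stats["dropped_count"]
        metrics[f"train/router/layer_{i}/overflow_fraction"] = stats["overflow_fraction"]
    n = len(per_layer_stats)
    metrics["train/router/dropped_count_total"] = sum(
        s["dropped_count"] for s in per_layer_stats
    )
    metrics["train/router/overflow_fraction_mean"] = (
        sum(s["overflow_fraction"] for s in per_layer_stats) / n if n else 0.0
    )
    return metrics


# Two toy layers with 2 experts and a capacity of 2 assignments per expert.
layer0 = capacity_overflow([0, 0, 0, 1], num_experts=2, capacity=2)
layer1 = capacity_overflow([0, 1, 1, 1, 1, 0, 0, 0], num_experts=2, capacity=2)
metrics = summarize_router_stats([layer0, layer1])
```

In the toy run, layer 0 drops one of four assignments and layer 1 drops four of eight, so the total is a sum of counts while the mean is an unweighted average of per-layer fractions. Whether the real summary weights layers equally is not stated in this PR.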

Enable report_capacity_overflow=True in MoEMLP.__call__ and propagate
dropped_count and overflow_fraction through router_stats into the
summarized training metrics logged to wandb.

Fixes #4016

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
claude[bot] added the agent-generated (Created by automation/agent) label on Mar 23, 2026
This was referenced Mar 30, 2026
@github-actions

This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

github-actions[bot] added the stale label on Apr 20, 2026
Labels

agent-generated (Created by automation/agent), stale

Development

Successfully merging this pull request may close these issues.

[moe] Good 10T: measure capacity overflow