[moe] Good 10T: sweep capacity factor #4017

@dlwh

Description

Summary

This issue asked whether the current MoE capacity factor was causing avoidable slowdown on the good-enough 10T path. It was: follow-up results in #4016 showed that the padding from the default capacity factor of 1.25 cost about 8% throughput at 1e20 scale and about 11% at 1e21, while lower settings had negligible impact on loss in the tested regimes. As a result, the default capacity factor moved to 1.0, with 1.1 noted as a more conservative option if future higher-EP or smaller-batch runs show overflow risk.

Description

TL;DR: Sweep capacity factor on the current good-enough 10T candidate and verify that it is not a hidden source of problems.

Hypothesis or Goal

We want to know whether the chosen capacity factor is already safe or whether it is masking avoidable overflow or throughput loss.
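To make the overflow versus throughput trade-off concrete, here is a minimal sketch. It assumes the common GShard/Switch-style definition of per-expert capacity, `ceil(capacity_factor * tokens * top_k / num_experts)`; the implementation used for these runs may differ. The function names and the toy routing are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def expert_capacity(tokens: int, num_experts: int, top_k: int,
                    capacity_factor: float) -> int:
    """Slots reserved per expert (assumed GShard/Switch-style definition)."""
    return math.ceil(capacity_factor * tokens * top_k / num_experts)


def overflow_and_padding(assignments, num_experts: int, capacity: int):
    """Return (dropped assignments, unused padded slots) for one routing outcome."""
    load = Counter(assignments)
    dropped = sum(max(0, load[e] - capacity) for e in range(num_experts))
    padding = sum(max(0, capacity - load[e]) for e in range(num_experts))
    return dropped, padding


# Hypothetical toy routing: 32 tokens, 4 experts, top_k = 1,
# with a mildly imbalanced load of 9 / 8 / 8 / 7 tokens per expert.
assignments = [0] * 9 + [1] * 8 + [2] * 8 + [3] * 7

for cf in (1.0, 1.1, 1.25):
    cap = expert_capacity(tokens=32, num_experts=4, top_k=1, capacity_factor=cf)
    dropped, padding = overflow_and_padding(assignments, 4, cap)
    print(f"cf={cf}: capacity={cap}, dropped={dropped}, padded_slots={padding}")
```

In this toy case a higher capacity factor removes the single dropped assignment but reserves more slots that are computed and left unused, which is the kind of padding cost the sweep is meant to measure on real runs.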

Results

Capacity-factor sweep results were reported in #4016 (not in this issue) by @ClassicLarry.

Metadata

Labels

agent-generated: Created by automation/agent
experiment
moe
p1: Do right now
tldr: Issue has a community-friendly TL;DR summary
