[moe] Good 10T: sweep capacity factor

## Summary

This issue asked whether the current MoE capacity factor was causing avoidable slowdown on the good-enough 10T path. The answer was yes. Follow-up results in #4016 showed that the padding from the default capacity factor of `1.25` cost about `8%` throughput at `1e20` scale and about `11%` at `1e21`, while lower settings had negligible impact on loss in the tested regimes. The practical outcome was to move the default capacity factor to `1.0`, with `1.1` noted as a more conservative option if future higher-EP or smaller-batch runs show overflow risk.
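As a rough illustration of what the capacity factor controls, here is a minimal sketch assuming the common GShard/Switch-style convention, where each expert gets a fixed number of slots, `ceil(capacity_factor * tokens * top_k / num_experts)`. Assignments beyond that are dropped (overflow), and unused slots are padding that still costs compute. Whether Marin's MoE layer computes capacity exactly this way (for example per EP shard or per token group) is an assumption, and the per-expert routing counts below are made up.

```python
import math


def expert_capacity(num_tokens: int, num_experts: int, top_k: int, capacity_factor: float) -> int:
    """Slots per expert under a GShard/Switch-style convention (assumed, not Marin's code)."""
    return math.ceil(capacity_factor * num_tokens * top_k / num_experts)


def overflow_and_padding(routed_counts: list[int], capacity: int) -> tuple[int, int]:
    """Return (dropped assignments, unused padded slots) for per-expert routing counts."""
    dropped = sum(max(0, c - capacity) for c in routed_counts)
    padding = sum(max(0, capacity - c) for c in routed_counts)
    return dropped, padding


# Hypothetical batch: 1024 tokens, 8 experts, top-2 routing -> 2048 assignments.
counts = [300, 280, 260, 250, 250, 240, 240, 228]
for cf in (1.0, 1.1, 1.25):
    cap = expert_capacity(1024, 8, 2, cf)
    dropped, padding = overflow_and_padding(counts, cap)
    print(f"cf={cf}: capacity={cap}, dropped={dropped}, padded slots={padding}")
```

With these made-up counts, `1.25` drops nothing but pads 512 slots, while `1.0` drops 72 assignments and pads 72 slots. A larger factor buys headroom against overflow at the price of padded expert compute, which is the trade-off the sweep measures.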

### Helpful links
- [Capacity-factor sweep table at 3e18 scale](https://github.com/marin-community/marin/issues/4016#issuecomment-4131856745)
- [Detailed overflow analysis and 1.1 vs 1.0 recommendation](https://github.com/marin-community/marin/issues/4016#issuecomment-4132027737)
- [Decision note to update the default to 1.0](https://github.com/marin-community/marin/issues/4016#issuecomment-4144525023)


## Description
TL;DR: Sweep the capacity factor on the current good-enough 10T candidate and verify that it is not a hidden source of overflow or throughput loss.

## Hypothesis or Goal
We want to know whether the chosen capacity factor is already safe or whether it is masking avoidable overflow or throughput loss.
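One way to build intuition for when overflow becomes a risk (the summary's caveat about higher-EP or smaller-batch runs) is to simulate routing and count drops as the number of tokens per capacity group shrinks. This is a minimal sketch under loudly stated assumptions: routing is uniform random top-1 (a learned router can be more skewed), capacity is enforced per group of `num_tokens`, and the group sizes are hypothetical. Whether higher EP actually shrinks the capacity group in Marin is also an assumption.

```python
import math
import random


def drop_fraction(num_tokens: int, num_experts: int, capacity_factor: float,
                  trials: int = 100, seed: int = 0) -> float:
    """Mean fraction of tokens dropped under uniform random top-1 routing.

    Uniform routing is an assumption: a learned router can be more imbalanced,
    so this only captures statistical fluctuation, not systematic skew.
    """
    rng = random.Random(seed)  # fixed seed -> same routing draws for every capacity_factor
    capacity = math.ceil(capacity_factor * num_tokens / num_experts)
    dropped = 0
    for _ in range(trials):
        counts = [0] * num_experts
        for _ in range(num_tokens):
            counts[rng.randrange(num_experts)] += 1
        dropped += sum(max(0, c - capacity) for c in counts)
    return dropped / (trials * num_tokens)


# Hypothetical group sizes: fewer tokens per capacity group -> larger relative fluctuations.
for tokens in (64, 512, 4096):
    row = {cf: round(drop_fraction(tokens, 8, cf), 4) for cf in (1.0, 1.1, 1.25)}
    print(tokens, row)
```

Because the seed is fixed, every capacity factor sees the same routing draws, so within a row the drop rate can only stay the same or fall as the factor grows. The trend across rows is the point, not the exact numbers: a factor that is comfortable for large groups can drop noticeably more tokens for small ones.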

### Links
* Parent sweep: #3469
* Gate: #4013
* Overflow + capacity-factor results: #4016

## Results
Capacity-factor sweep results were reported by @ClassicLarry in #4016 rather than in this issue:

- EP=2, 3e18 runs: capacity-factor sweep table with BPB/macro-loss/tokens/s: https://github.com/marin-community/marin/issues/4016#issuecomment-4131856745
- Detailed overflow analysis and recommendation (1.1 conservative, 1.0 aggressive): https://github.com/marin-community/marin/issues/4016#issuecomment-4132027737
- 1e20, EP=4 direct comparison: cap=1.0 was 8.3% faster than cap=1.25, at a cost of +0.001 BPB: https://github.com/marin-community/marin/issues/4016#issuecomment-4144517642
- Final decision note: move default capacity factor to 1.0: https://github.com/marin-community/marin/issues/4016#issuecomment-4144525023
