Summary
This issue tracks scaling Marin's current MoE recipe from the smaller-scale work in #2167 up to roughly 1e21 and 1e22 non-embedding FLOPs; the immediate question is whether routing and loss stay stable at those scales. As of March 29, 2026, the 1e22 run is live on a v4-512, using quantile balancing (QB) in place of the older auxiliary-loss-based load balancing. The expectation is not that the recipe is fully tuned, but that this run will show whether it is stable at the larger scale and how it compares against prior dense Delphi and MoE baselines. The thread also notes a provisional predicted paloma/macro_loss of 2.3887 for the 1e22 run; batch-size scheduling remains static for now and is called out as future tuning work rather than part of this experiment's core claim.
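Since QB is the main recipe change called out above, here is a minimal, hypothetical sketch of quantile-style load balancing for a top-1 router. The function name, the top-1 setup, and the exact thresholding rule are assumptions for illustration, not Marin's implementation; the idea is that each expert's admission threshold is a quantile of its own router scores, so balance is enforced structurally rather than through an auxiliary loss term.

```python
# Hypothetical sketch of quantile-based balancing for a top-1 MoE router
# (illustration only, not Marin's actual code). Each expert keeps only the
# tokens whose score for it exceeds that expert's (1 - capacity_frac)
# quantile, so every expert admits roughly the same fraction of tokens.
import jax
import jax.numpy as jnp

def quantile_balanced_top1(router_logits: jnp.ndarray, capacity_frac: float) -> jnp.ndarray:
    """router_logits: [num_tokens, num_experts] -> 0/1 dispatch mask."""
    scores = jax.nn.softmax(router_logits, axis=-1)                 # [tokens, experts]
    thresholds = jnp.quantile(scores, 1.0 - capacity_frac, axis=0)  # [experts]
    top1 = jax.nn.one_hot(jnp.argmax(scores, axis=-1), scores.shape[-1])
    admitted = (scores >= thresholds[None, :]).astype(scores.dtype)
    return top1 * admitted  # tokens below their expert's threshold are dropped
```

If this reading is right, the practical upside is that there is no balance-loss coefficient to retune at the new scales, which fits the framing that these runs test stability rather than tuning.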
Description
Test whether the current scaling recipe from #2167 holds at the 1e21 and 1e22 FLOP scales without router or other instabilities.
Hypothesis or Goal
Will the loss curve or routing destabilize? Hypothesis: as in the 2048-width runs at smaller scales, routing on layer zero will look choppy during LR warmup, then stabilize.
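One hypothetical way to quantify "choppy" routing (not necessarily the metric these runs log) is the coefficient of variation of per-expert token counts under top-1 routing, tracked per layer: it should spike on layer zero during LR warmup and flatten afterward if the hypothesis holds.

```python
# Hypothetical diagnostic for routing choppiness (illustration only): the
# coefficient of variation of per-expert token counts. 0 means a perfectly
# even split across experts; larger values mean more imbalanced routing.
import jax.numpy as jnp

def expert_load_imbalance(router_logits: jnp.ndarray) -> jnp.ndarray:
    """router_logits: [num_tokens, num_experts] for one layer."""
    top1 = jnp.argmax(router_logits, axis=-1)  # [tokens]
    loads = jnp.bincount(top1, length=router_logits.shape[-1]).astype(jnp.float32)
    return jnp.std(loads) / (jnp.mean(loads) + 1e-6)
```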
Links
1e21 Run: 14B total params, 2B active, 75B tokens.
https://wandb.ai/marin-community/dial_moe/runs/moe-d2304-1e21?nw=nwuserlarrydial
1e22 Run: 35B total params, 5B active, 326B tokens.
https://wandb.ai/marin-community/dial_moe/runs/moe-v7-1e22-d3200?nw=nwuserlarrydial
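As a sanity check on the two budgets above, the standard 6ND approximation (non-embedding FLOPs ≈ 6 × active params × tokens) roughly recovers both targets. The approximation is ours, not a statement of how the budgets were actually computed:

```python
# 6ND sanity check: non-embedding FLOPs ≈ 6 * active_params * tokens.
for name, active_params, tokens in [("1e21 run", 2e9, 75e9),
                                    ("1e22 run", 5e9, 326e9)]:
    print(f"{name}: ~{6 * active_params * tokens:.2e} FLOPs")
# 1e21 run: ~9.00e+20 FLOPs
# 1e22 run: ~9.78e+21 FLOPs
```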
Results
Pending