[moe] Great 10T: sweep E in {128,256,512} #4048

## Description
TL;DR: Sweep the expert count E over {128, 256, 512} under the great 10T gate, so that the choice of expert count is tested against a stricter standard than the good-enough pass.

## Hypothesis or Goal
We want to know whether the currently preferred expert count still holds up when the comparison is backed by the deeper experimental record that the great gate requires.

### Links
* Parent sweep: #3469
* Gate: #4014

## Results


## Summary

This issue is the great-10T follow-up to earlier expert-count sweeps: it asks whether the MoE recipe should still prefer a particular expert count once the comparison is rerun at the stricter gate. PR #4075 now adds the full E={128,256,512} sweep across the great-gate isoflop matrix plus a config-generation test, and its CI checks are green. The implementation work is finished, but the actual training results and any recommendation about which E to keep have not been reported yet.
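To make the shape of the sweep concrete, here is a minimal sketch of how a sweep over E and an isoflop matrix could be enumerated and sanity-checked. It is not the code from PR #4075: `SweepConfig`, `generate_sweep`, `check_sweep`, and the FLOP budgets are hypothetical placeholders, and only the expert counts {128, 256, 512} come from this issue.

```python
from dataclasses import dataclass
from itertools import product

# Expert counts named in this issue.
EXPERT_COUNTS = (128, 256, 512)

# ASSUMPTION: placeholder budgets for illustration only. The real
# great-gate isoflop matrix is defined by the implementation PR, not here.
PLACEHOLDER_FLOP_BUDGETS = (1e18, 1e19)


@dataclass(frozen=True)
class SweepConfig:
    """One cell of the sweep: an expert count at a fixed FLOP budget (hypothetical type)."""

    num_experts: int
    flop_budget: float

    @property
    def name(self) -> str:
        # Run name that encodes both axes, so every cell is distinguishable.
        return f"moe-E{self.num_experts}-flops{self.flop_budget:.0e}"


def generate_sweep(expert_counts=EXPERT_COUNTS, flop_budgets=PLACEHOLDER_FLOP_BUDGETS):
    """Return the full cross-product of FLOP budgets and expert counts."""
    return [SweepConfig(e, f) for f, e in product(flop_budgets, expert_counts)]


def check_sweep(configs, expert_counts=EXPERT_COUNTS, flop_budgets=PLACEHOLDER_FLOP_BUDGETS):
    """Config-generation check: every (budget, E) cell appears exactly once."""
    names = [c.name for c in configs]
    if len(names) != len(set(names)):
        raise ValueError("duplicate run names in sweep")
    expected = {(e, f) for e in expert_counts for f in flop_budgets}
    actual = {(c.num_experts, c.flop_budget) for c in configs}
    if actual != expected:
        raise ValueError(f"sweep is missing or has extra cells: {expected ^ actual}")


configs = generate_sweep()
check_sweep(configs)
```

Keeping every expert count present at every budget is what makes the comparison an isoflop one: any difference between runs in the same row is attributable to E rather than to compute.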

### Helpful links
- Parent sweep: #3469
- Great gate tracker: #4014
- Implementation PR: #4075
