[moe] Great 10T: sweep E in {128,256,512} #4048

@dlwh

Description

TL;DR: Sweep E over {128,256,512} within the great-10T gate so that the expert count is tested under a stronger standard than the good-enough pass.

Hypothesis or Goal

We want to know whether the currently preferred expert count still wins when the comparison is rerun under the stricter great-10T gate, with a deeper experimental record.

Links

Results

Summary

This issue is the great-10T follow-up to earlier expert-count sweeps: it asks whether the MoE recipe should still prefer a particular expert count once the comparison is rerun at the stricter gate. PR #4075 now adds the full E={128,256,512} sweep across the great-gate isoflop matrix plus a config-generation test, and its CI checks are green. The implementation work is finished, but the actual training results and any recommendation about which E to keep have not been reported yet.
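The summary mentions that PR #4075 adds the full E={128,256,512} sweep across the great-gate isoflop matrix along with a config-generation test. As an illustrative sketch only (the actual PR code is not shown in this issue; all config names, model sizes, and grid points below are hypothetical), that kind of generation typically crosses the expert counts with the isoflop grid points:

```python
from itertools import product

# Hypothetical sketch: cross the E sweep from this issue with an
# illustrative isoflop grid. Names and grid values are NOT from PR #4075.

EXPERT_COUNTS = [128, 256, 512]     # the E values swept in this issue
ISOFLOP_POINTS = [                  # illustrative (model size, token budget) pairs
    ("300m", "10b"),
    ("700m", "20b"),
    ("1_4b", "40b"),
]

def make_configs():
    """Build one run config per (E, isoflop point) combination."""
    configs = []
    for E, (size, tokens) in product(EXPERT_COUNTS, ISOFLOP_POINTS):
        configs.append({
            "name": f"moe_great_E{E}_{size}_{tokens}",
            "num_experts": E,
            "model_size": size,
            "token_budget": tokens,
        })
    return configs

configs = make_configs()
print(len(configs))  # 3 expert counts x 3 grid points = 9 configs
```

A config-generation test of the sort the PR describes would then assert properties of this output, e.g. that the expected number of configs is produced and that run names are unique.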

Metadata

    Labels

    agent-generated (Created by automation/agent) · experiment · moe · tldr (Issue has a community-friendly TL;DR summary)
