Due **May 31**. This milestone is the on-ramp to the June 16B-A2B MoE: by month-end the team needs a locked scaling recipe, a real data mixture, and GPU training that actually works. Three workstreams plus eval/infra plumbing:

- **Scaling recipe** (#5358, @ClassicLarry): integrate the April isoFLOP results, retune LR, and possibly extend to long context; the output is a loss forecast for the June run (fit sketched below).
- **Data mix**: @Helw150 launches an active swarm over all sources in datakit/sources.py (#5359, target launch May 15) that must beat proportional baselines on UncheatableEval/HumanEval/MMLU/GPQA plus David's PPL sets (mixture sampling sketched below); @ravwojdyla lands the upstream pipeline (#5360: dedup params + contamination detection at p0, quality scores at p1) in time to feed it (contamination check sketched below); @dlwh + @Helw150 identify the perplexity gaps that drive mixture decisions (#5367).
- **GPU training**: @rjpower gets a June-sized MoE running ~1k steps across 2+ H100 hosts on CoreWeave (#5356), while @dlwh chases Nemotron-parity MFU in the H100 kernels (#5357; MFU accounting sketched below).
- **Eval + infra**: @yonromai stands up a preemption-resilient vLLM eval service on Iris (#5368, P0 = MMLU + HumanEval on the 1e22 MoE; resumable eval loop sketched below); #5369 is the catch-all infra tune-up (unified queries, zero-trust proxy, GH→Iris).

The critical-path dependency is tight: #5360 must deliver by ~May 15 to unblock the #5359 swarm, which in turn feeds the June pre-reg.
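For #5358, the forecast deliverable is essentially a curve fit and an extrapolation. A minimal sketch, assuming a Chinchilla-style compute frontier fitted to isoFLOP minima; every data point and constant below is fabricated for illustration, and the real recipe lives with the April isoFLOP results:

```python
# Hypothetical sketch of the #5358 deliverable: fit a compute-frontier power
# law to isoFLOP minima and forecast loss for the June 16B-A2B budget.
# All numbers are fabricated placeholders.
import numpy as np
from scipy.optimize import curve_fit

C = np.array([1e19, 3e19, 1e20, 3e20, 1e21, 3e21])  # training FLOPs per budget
L = np.array([2.83, 2.66, 2.50, 2.38, 2.27, 2.18])  # best loss at each budget

def frontier(c, E, A, alpha):
    # irreducible loss plus a power-law term in compute
    return E + A * c**-alpha

(E, A, alpha), _ = curve_fit(frontier, C, L, p0=[1.7, 800.0, 0.15])

# June run: ~2B active params (16B-A2B) x ~300B tokens -> C ~= 6*N*D FLOPs
C_june = 6 * 2e9 * 3e11
print(f"forecast loss at C={C_june:.2e}: {frontier(C_june, E, A, alpha):.3f}")
```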
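For the #5359 swarm, one way to generate candidates that are comparable against the proportional baseline is to sample mixture weights from a Dirichlet centered on that baseline, keeping the baseline itself in the sweep. A sketch under that assumption; the source names and token counts are placeholders, not the real datakit/sources.py registry:

```python
# Hypothetical sketch of #5359 candidate generation. Sources and token
# counts are stand-ins for the real datakit/sources.py registry.
import numpy as np

rng = np.random.default_rng(0)

sources = {"web": 5e12, "code": 8e11, "papers": 3e11, "math": 1e11}
names = list(sources)
tokens = np.array([sources[n] for n in names])

proportional = tokens / tokens.sum()  # the baseline every candidate must beat

def sample_mixtures(k, concentration=50.0):
    # Dirichlet centered on the proportional mix; higher concentration
    # keeps candidates closer to the baseline.
    return rng.dirichlet(concentration * proportional, size=k)

candidates = [proportional] + list(sample_mixtures(15))
for i, w in enumerate(candidates):
    tag = "baseline" if i == 0 else f"cand-{i:02d}"
    print(tag, {n: round(float(p), 3) for n, p in zip(names, w)})
```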
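For the contamination-detection piece of #5360's p0 scope, a common baseline technique is flagging training documents that share any long n-gram with an eval item. A sketch assuming a 13-gram window (a conventional choice, not necessarily the pipeline's actual parameter):

```python
# Hypothetical contamination check for #5360 p0: flag training docs that
# share any 13-gram with an eval prompt. The n=13 window and the record
# shapes are illustrative assumptions.
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(doc: str, eval_index: set[tuple[str, ...]]) -> bool:
    return not ngrams(doc).isdisjoint(eval_index)

# Build the eval-side index once (e.g. from MMLU / HumanEval / GPQA items).
eval_index: set[tuple[str, ...]] = set()
for prompt in ["placeholder eval question text goes here"]:
    eval_index |= ngrams(prompt)

docs = ["placeholder training document text goes here"]
clean = [d for d in docs if not contaminated(d, eval_index)]
```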
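"Nemotron-parity MFU" in #5357 is measured against hardware peak. A minimal sketch of the standard accounting, assuming the usual 6·N·tokens approximation for forward+backward FLOPs and H100 dense BF16 peak (~989 TFLOP/s); the batch size, step time, and host count below are placeholders:

```python
# Hypothetical MFU accounting for #5357. Uses the standard 6*N*tokens
# approximation (N = active params, so MoE routing is already reflected).
H100_PEAK_FLOPS = 989e12  # dense BF16 per GPU, no sparsity

def mfu(active_params: float, tokens_per_step: float,
        step_seconds: float, n_gpus: int) -> float:
    model_flops = 6 * active_params * tokens_per_step  # fwd + bwd
    achieved = model_flops / step_seconds
    return achieved / (n_gpus * H100_PEAK_FLOPS)

# e.g. 2B active params (16B-A2B), 4M-token batches, 16 H100s, 8 s steps
print(f"MFU: {mfu(2e9, 4e6, 8.0, 16):.1%}")  # ~38% with these placeholders
```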
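For the preemption resilience in #5368, one simple design is an append-only ledger of completed eval records, so a preempted worker resumes by skipping finished ids on restart. A sketch under that assumption; the `generate()` stub stands in for the real vLLM-backed call, and the file path and record shape are made up:

```python
# Hypothetical resume loop for #5368: persist each completed record to a
# JSONL ledger so a preempted worker picks up where it left off.
import json, os

LEDGER = "eval_progress.jsonl"

def generate(prompt: str) -> str:
    return f"<completion for {prompt[:20]}...>"  # stand-in for the vLLM call

def done_ids() -> set[str]:
    if not os.path.exists(LEDGER):
        return set()
    with open(LEDGER) as f:
        return {json.loads(line)["id"] for line in f}

def run(tasks: list[dict]) -> None:
    finished = done_ids()
    with open(LEDGER, "a") as out:
        for t in tasks:
            if t["id"] in finished:
                continue                     # completed before preemption
            rec = {"id": t["id"], "output": generate(t["prompt"])}
            out.write(json.dumps(rec) + "\n")
            out.flush()                      # survive an abrupt kill
```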