1 parent c333125 commit 0433136
1 file changed
experiments/grug/moe/README.md
@@ -104,6 +104,8 @@ Changes can be promoted to this recipe when they demonstrate:
 2. **Lower projected c4_en BPB at 1e21 and 1e23 FLOPs**, using the scaling-law
    fit above (L∞ pinned at 1.6 for Paloma macro). Re-fit the power law on the
    candidate's ladder and compare projections head-to-head.
+3. **Low curvature around the minimum of each isoflop curve** — stable
+   behavior across under- and over-trained regimes.

 Most promotable changes will land in one of three files:
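Criterion 2 asks for a power-law re-fit with the asymptote pinned, followed by a head-to-head comparison of projections. The fit itself is defined earlier in the README and is not visible in this hunk, so the sketch below assumes the common form `loss = l_inf + a * flops**(-alpha)`. The function names and ladder values are hypothetical and synthetic, not taken from the repository. The 1.6 used here only mirrors the pinned value quoted above; the asymptote that applies to c4_en BPB may differ.

```python
import numpy as np

def fit_pinned_power_law(flops, loss, l_inf):
    """Fit loss = l_inf + a * flops**(-alpha) with l_inf held fixed (assumed form)."""
    flops = np.asarray(flops, dtype=float)
    loss = np.asarray(loss, dtype=float)
    excess = loss - l_inf
    if np.any(excess <= 0):
        raise ValueError("every loss must exceed the pinned asymptote")
    # With l_inf fixed, log(excess) = log(a) - alpha * log(flops) is a straight line.
    slope, intercept = np.polyfit(np.log(flops), np.log(excess), 1)
    return float(np.exp(intercept)), float(-slope)

def project(flops, a, alpha, l_inf):
    """Evaluate the fitted law at the given compute budgets."""
    return l_inf + a * np.asarray(flops, dtype=float) ** (-alpha)

# Synthetic ladders for illustration only (not measured results).
L_INF = 1.6
ladder_flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
baseline_loss = L_INF + 40.0 * ladder_flops ** (-0.10)
candidate_loss = L_INF + 38.0 * ladder_flops ** (-0.10)

base_a, base_alpha = fit_pinned_power_law(ladder_flops, baseline_loss, L_INF)
cand_a, cand_alpha = fit_pinned_power_law(ladder_flops, candidate_loss, L_INF)

targets = np.array([1e21, 1e23])
base_proj = project(targets, base_a, base_alpha, L_INF)
cand_proj = project(targets, cand_a, cand_alpha, L_INF)

# The candidate passes this check only if it projects lower at both budgets.
candidate_wins = bool(np.all(cand_proj < base_proj))
```

Fitting in log space keeps the problem linear once the asymptote is fixed, which is why the sketch rejects any ladder point at or below `l_inf`.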
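The new criterion 3 does not say how curvature is measured. One plausible reading, sketched here as an assumption, is to fit a parabola to loss against log10 of model size at a fixed compute budget and report its second derivative. A smaller value means the loss degrades more slowly when a run is under- or over-trained relative to the optimum. The function name and data points are hypothetical and synthetic.

```python
import numpy as np

def isoflop_curvature(model_sizes, losses):
    """Return (optimal_size, min_loss, curvature) from a parabola fit in log10(size).

    Assumed operationalization: curvature is the second derivative of the
    fitted parabola with respect to log10(model size).
    """
    x = np.log10(np.asarray(model_sizes, dtype=float))
    y = np.asarray(losses, dtype=float)
    center = x.mean()
    # Centering x keeps the quadratic fit well conditioned.
    c2, c1, c0 = np.polyfit(x - center, y, 2)
    if c2 <= 0:
        raise ValueError("fitted parabola has no interior minimum")
    x_min = center - c1 / (2.0 * c2)
    min_loss = c0 - c1 ** 2 / (4.0 * c2)
    return float(10.0 ** x_min), float(min_loss), float(2.0 * c2)

# Synthetic isoflop curves sharing a minimum at 4e8 parameters (illustration only).
sizes = np.array([1e8, 2e8, 4e8, 8e8, 1.6e9])
offset = np.log10(sizes) - np.log10(4e8)
sharp_losses = 3.0 + 0.30 * offset ** 2
flat_losses = 3.0 + 0.10 * offset ** 2

sharp_opt, sharp_min, sharp_curv = isoflop_curvature(sizes, sharp_losses)
flat_opt, flat_min, flat_curv = isoflop_curvature(sizes, flat_losses)

# Under this reading, the flatter curve is the more promotable one.
flatter_is_better = flat_curv < sharp_curv
```

Comparing curvature at matched compute budgets, one isoflop curve at a time, keeps the comparison between baseline and candidate like-for-like.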