Final losses: 0.781 (matches original) and 0.782 (original 0.772; the +0.010 gap is single-step noise). Both W&B runs accepted fresh metrics this time (no step-monotonic rejection). Canonical 1e20 sweep results are now: lr=0.5 → 0.840, lr=0.67-v2 → 0.781, lr=0.83-v2 → 0.782.
`.agents/logbooks/midtraining_delphi.md` (+31 lines)
1. Pull smoothed train-loss + Paloma trajectories from each run's `tracker_metrics.jsonl`; confirm the preliminary 1e20 ranking.
2. Launch 3 × 1e21 sweep points (`lr=0.5 / 0.67 / 0.83`) on `v5p-64`. Same launch recipe as the 1e20 relaunch above. Expected wall-time ~10 h per run (3.4 B params, same BS=512, slightly larger). The pretrain ckpt lives at `gs://marin-us-central1/adamh-scaling-ladder-nemotron-optimal-1e+21-v5-019021/checkpoints/step-21979/` with schedule count ~21979, which is *larger* than `num_train_steps=4768`, so the same flat-min-lr pathology would hit the 1e21 runs when restoring full training state; MODEL_ONLY avoids it. With MODEL_ONLY plumbed, the 1e21 sweep will train at the scheduled warmup→peak→decay.
3. When all 6 land, cross-ranking + winner selection + writeup. Store the winning (base, lr_factor) combination as input to any downstream sweep.
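The flat-min-lr pathology in item 2 can be sketched in a few lines. This is a minimal illustration, not the actual training code: the schedule shape (linear warmup → cosine decay), the `peak_lr`/`min_lr`/`warmup` values, and the function name are all assumptions; only the step counts (restored step ~21979 vs `num_train_steps=4768`) come from the logbook.

```python
# Sketch (assumed schedule, illustrative values): if the step counter
# restored from the pretrain checkpoint (~21979) already exceeds the
# midtrain schedule length (num_train_steps=4768), every LR lookup lands
# past the schedule end and returns min_lr forever -- the flat curve.
import math

def wsd_lr(step, peak_lr=1.0, min_lr=0.1, warmup=100, total=4768):
    """Linear warmup -> peak -> cosine decay to min_lr; flat past `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    if step >= total:
        return min_lr  # past the schedule end: stuck flat at min_lr
    frac = (step - warmup) / (total - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * frac))

# Restoring full training state carries the pretrain step count along,
# so the midtrain run starts (and stays) past the schedule end:
print(wsd_lr(21979))  # min_lr -- the broken flat-min-lr curve
# MODEL_ONLY restores weights but resets the counter, so the schedule
# actually runs warmup -> peak -> decay:
print(wsd_lr(100))    # peak_lr
```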
Context: the original lr=0.67 and lr=0.83 runs succeeded with correct training (final losses 0.781 and 0.772) but their W&B panels still showed the OLD broken flat-min-lr curves. Root cause: the Marin executor hash only tracks `versioned(...)` values + `step.name` + upstream dep paths — plain `SimpleTrainConfig` fields (including our `checkpoint_init_mode`) are invisible. With the same us-central1 tokenize dep as before, both runs landed at the same output hashes as the broken v10-era runs (`e3be0c`, `db9de7`) → same W&B run_ids → W&B's step-monotonic guard rejected the fresh metrics.
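The hash collision above can be mimicked with a toy content hash. This is an assumed sketch of the mechanism, not the Marin executor source: `step_hash`, the payload layout, and the dep path are hypothetical, but it shows why a plain dataclass field can never change the output hash while a name change can.

```python
# Toy model (assumed, not Marin's implementation) of an executor hash that
# covers only versioned values + step name + upstream dep paths. A plain
# config field like checkpoint_init_mode never enters the payload, so
# flipping it cannot produce a new output hash / W&B run id.
import hashlib
from dataclasses import dataclass

@dataclass
class SimpleTrainConfig:
    learning_rate: float
    checkpoint_init_mode: str = "FULL"  # plain field: invisible to the hash

def step_hash(name, versioned_values, dep_paths):
    payload = repr((name, sorted(versioned_values.items()), sorted(dep_paths)))
    return hashlib.sha256(payload.encode()).hexdigest()[:6]

deps = ["gs://bucket/tokenized/placeholder"]  # hypothetical dep path
old = step_hash("delphi-math-10b-lr0.67", {"lr": 0.67}, deps)
fixed = step_hash("delphi-math-10b-lr0.67-v2", {"lr": 0.67}, deps)

# Same name + same versioned values + same deps -> same hash, regardless
# of checkpoint_init_mode; only the `-v2` name suffix changes it.
assert old == step_hash("delphi-math-10b-lr0.67", {"lr": 0.67}, deps)
assert old != fixed
```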
Fix (commit `0a5b1fde3`): append `-v2` to the `step.name` template in `experiments/exp_delphi_math_10b_midtrain.py:221`, so the name-contribution to the hash changes. Relaunched both with coordinators `/ahmed/delphi-math-10b-1e20-lr{0.67,0.83}-v2-20260423`.
Results:
| Run | Output hash | W&B run name | Final single-step loss |
|---|---|---|---:|
|`lr=0.67-v2`|`a176ff`|`delphi-1e20-iso-d2048-L21-math-10b-lr0.67-v2-a176ff`|**0.781** (matches original 0.781 exactly) |
Both coordinators in terminal `succeeded` state. No `Cowardly refusing to log metrics` warnings this time — W&B accepted the fresh metrics, so these two runs now have clean warmup→peak→decay curves on the W&B panel.
The 1e20 sweep now has **one set of canonical, clean-W&B results** for the cross-ranking:
Preliminary 1e20 ranking (unsmoothed): `lr=0.67 (0.781) ≈ lr=0.83 (0.782) < lr=0.5 (0.840)`. The 0.67/0.83 gap is within noise; smoothed curves + Paloma eval should disambiguate. Either factor is a reasonable default for the 1e21 sweep.
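The smoothed cross-ranking step can be sketched as below. Hedged sketch only: the metric key (`"train/loss"`), the EMA smoothing, and the `alpha` value are assumptions; only the `tracker_metrics.jsonl` filename and the per-run layout come from the logbook.

```python
# Sketch of the cross-ranking: stream a run's tracker_metrics.jsonl,
# EMA-smooth the train loss, and rank runs by the smoothed tail value.
# Key name "train/loss" and alpha are assumed, not confirmed.
import io
import json

def smoothed_final_loss(fp, key="train/loss", alpha=0.02):
    """Exponential moving average over a JSONL metrics stream."""
    ema = None
    for line in fp:
        rec = json.loads(line)
        if key in rec:
            v = rec[key]
            ema = v if ema is None else (1 - alpha) * ema + alpha * v
    return ema

# Against a real run this would be:
#   with open(".../tracker_metrics.jsonl") as f: smoothed_final_loss(f)
# Here a tiny synthetic log demonstrates the mechanics:
fake = io.StringIO("\n".join(
    json.dumps({"step": i, "train/loss": 1.0 / (i + 1)}) for i in range(100)))
print(round(smoothed_final_loss(fake), 4))
```

Ranking is then just `sorted(runs, key=...)` over the three sweep points, lowest smoothed loss first.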
**Stale artifacts to eventually garbage-collect** (no longer canonical; W&B + GCS data is superseded by the `-v2` runs):
- `gs://marin-us-central1/checkpoints/delphi-1e20-iso-d2048-L21-math-10b-lr0.67-e3be0c/` (the GCS checkpoint is from the healthy fresh training, but the W&B run is polluted with broken data)