
Commit 6deb3dc (parent 5fed396)

[logbook] Delphi midtrain: add analysis-utils reference section

Document where the scaling-law / plotting infra lives (`lib/marin/src/marin/scaling_laws/` + `experiments/isoflop_sweep.py`, authored by William Held). Note that `tracker_metrics.jsonl` is the only source of truth for the two runs whose W&B is polluted by the step-monotonic rejection bug, and that isoflop/scaling-fit is the wrong abstraction for a 3-point × 2-base-LR sweep.

1 file changed: `.agents/logbooks/midtraining_delphi.md` (51 additions, 0 deletions)
Preliminary 1e20 ranking (unsmoothed): `lr=0.67 (0.781) ≈ lr=0.83 (0.782) < lr

- `gs://marin-us-central1/checkpoints/delphi-1e20-iso-d2048-L21-math-10b-lr0.67-e3be0c/` (the GCS checkpoint is the healthy fresh training, but the W&B run is polluted with broken data)
- `gs://marin-us-central1/checkpoints/delphi-1e20-iso-d2048-L21-math-10b-lr0.83-db9de7/` (ditto)
- Also the corresponding W&B runs at those names — they display misleading flat-min-lr curves; safe to delete once the `-v2` runs are locked in.
---

## Analysis + plotting utilities in this repo (found 2026-04-23)

Will Held owns most of the scaling-law / analysis infra. If you need to produce plots, fits, or sweep-wide comparisons, these are the code paths to study first rather than rolling your own.

### Core library — `lib/marin/src/marin/scaling_laws/` (≈1001 lines, plotly-based)

| File | Key exports | What it does |
|---|---|---|
| `scaling_plots.py` | `create_isoflop_plot`, `create_scaling_plot`, `save_plots`, `upload_plots_to_wandb` | Plotly figure builders for isoflop curves + scaling fits; GCS save + W&B artifact upload |
| `isoflop_analysis.py` | `fit_scaling_laws`, `predict_optimal_config`, `robust_quad_logx` (Huber-loss quadratic fit), `ScalingFit`, `QuadraticFitCoeffs`, `IsoFlopRecord`, `MinimaRecord`, `CandidateConfig` | Scaling-law math and data structures |
| `eval_metrics_reader.py` | `read_eval_records` (+ W&B backfill via `_backfill_metrics_from_wandb`) | Pulls per-step eval metrics from GCS runs and the W&B API, unifying the two sources |
| `tpu_utils.py` | `pick_v5p_type`, `pick_v4_type`, `V5P_SPEC`, `V4_SPEC` | Chooses the smallest TPU slice that fits a given model |
| `__init__.py` | Re-exports the above | Public API entry point |

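For orientation on what a "Huber-loss quadratic fit" means here: a generic numpy sketch of the technique (an illustration of the idea only — not the repo's actual `robust_quad_logx` implementation or signature) is:

```python
import numpy as np


def huber_quad_logx_fit(x, y, delta=1.0, n_iter=50):
    """Fit y ≈ a*log(x)^2 + b*log(x) + c robustly via IRLS with Huber weights.

    Sketch only: NOT the marin.scaling_laws implementation. `delta` is the
    Huber threshold separating the quadratic and linear loss regimes.
    """
    lx = np.log(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    # Design matrix for a quadratic in log(x).
    A = np.stack([lx**2, lx, np.ones_like(lx)], axis=1)
    w = np.ones_like(y)  # start with plain least squares
    coeffs = np.zeros(3)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares step.
        coeffs, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ coeffs
        absr = np.abs(r)
        # Huber weights: 1 inside the delta band, delta/|r| outside,
        # so outliers are progressively downweighted.
        w = np.where(absr <= delta, 1.0, delta / np.maximum(absr, 1e-12))
    return coeffs  # (a, b, c)
```

The point of the robust fit is that a single corrupted loss value (e.g. from a polluted run) barely moves the curve, whereas ordinary least squares would chase it.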
### Callers / end-to-end wiring

| File | Purpose |
|---|---|
| `experiments/isoflop_sweep.py` | **The canonical ExecutorStep wiring** — reads eval metrics, fits scaling laws, emits plots, uploads to W&B. Pattern-match against this when building a new analysis step. |
| `experiments/exp1337_delphi_suite.py` | Delphi-specific sweep runner using `predict_optimal_config`. Source of the `(H, L, B)`-heuristic pipeline. |
| `experiments/exp2166_scaling_ladder_analysis.py` | Most recent ladder analysis (~2026-02). |
| `experiments/scaling_law_sweeps/completed_adamh.py` | The AdamH heuristic that drove our 1e20 base-model choice — `completed_adamh_heuristic._build_model_config(hidden_size, seq_len)` is the source of truth for the Delphi architecture. |
| `experiments/scaling_law_sweeps/c_adamc.py` | The AdamC-variant counterpart. |

### Per-run training-loss (no dedicated Marin tool)

For single-run or small-sweep train-loss plots (what this midtraining sweep wants), the options are:

- `lib/levanter/scripts/loss_history.py`: a ~30-line example that hits `wandb.Api().runs().scan_history()` for `train/loss` by git sha. Good template.
- Read `tracker_metrics.jsonl` at each run's GCS output path directly (Levanter writes it independently of W&B; **this is our only source of truth for the `e3be0c` / `db9de7` runs whose W&B is polluted**). One JSON object per step; columns include `train/loss`, `optim/learning_rate`, `optim/adam_lr`, and all `eval/paloma/*/loss` + `eval/uncheatable_eval/*/loss` series. Just `pd.read_json(..., lines=True)`.
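A minimal sketch of that second option, assuming only pandas and the column names listed above (reading `gs://` paths directly additionally requires gcsfs to be installed; the helper names here are illustrative, not existing repo code):

```python
import pandas as pd


def load_run(path: str) -> pd.DataFrame:
    """Load Levanter's tracker_metrics.jsonl: one JSON object per step
    becomes one DataFrame row per step. `path` may be a local file or a
    gs:// URL (the latter needs gcsfs installed)."""
    return pd.read_json(path, lines=True)


def final_eval_loss(df: pd.DataFrame, col: str = "eval/paloma/c4_en/loss") -> float:
    """Last recorded value of an eval-loss column. Evals are logged on a
    sparser cadence than train steps, so rows without that key come back
    as NaN and must be dropped first."""
    return float(df[col].dropna().iloc[-1])
```

Usage would be `final_eval_loss(load_run("gs://…/tracker_metrics.jsonl"))` per sweep point; the NaN-dropping is the one gotcha worth remembering.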
### For this midtraining sweep specifically

The scaling-laws infra is overkill for a 3-point × 2-base LR sweep (no scaling fit is meaningful with one token budget and one parameter count per base). Appropriate plots:

- Train loss vs. step, EMA-smoothed, one line per `(base, lr_factor)`.
- Paloma validation loss vs. step (`eval/paloma/c4_en/loss`, `eval/paloma/dolma-v1_5/loss`, etc.), same overlay.
- Final-loss bar chart per sweep point to pick the winner.

A ~50-line script that loads the six `tracker_metrics.jsonl` files (3 × 1e20, plus 3 × 1e21 once they land) and renders these with matplotlib or plotly is enough. **Do not** build it on top of `isoflop_analysis.py` — wrong abstraction for this sweep. **Do** reuse `eval_metrics_reader.read_eval_records` for the GCS + W&B unification logic if the runs have the right shape (check its filters first).

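A sketch of the smoothing and winner-pick pieces of such a script, assuming pandas (the halflife value is a placeholder to tune; function names are hypothetical):

```python
import pandas as pd


def ema_smooth(loss: pd.Series, halflife: float = 100.0) -> pd.Series:
    """EMA-smooth a per-step loss series.

    halflife is in steps and is a placeholder; tune it to the sweep's
    logging cadence so the curves are readable without hiding trends.
    """
    return loss.ewm(halflife=halflife).mean()


def pick_winner(final_losses: dict) -> tuple:
    """final_losses maps (base, lr_factor) -> final smoothed loss;
    returns the sweep point with the lowest loss."""
    return min(final_losses, key=final_losses.get)
```

Feed `ema_smooth` one series per `(base, lr_factor)` for the overlay plot, and the last smoothed values into `pick_winner` for the bar chart / winner.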
### Authorship / blame-walk

- `scaling_plots.py` / `isoflop_analysis.py` / `eval_metrics_reader.py` / `isoflop_sweep.py` — William Held, PR #2243 "Scaling Plots & Analysis as an Executor Step".
- Delphi pipeline (`exp1337_delphi_suite.py`) — William Held, PR #3292 "Delphi Scaling Setup", plus PR #4591 "exp1337: add seed sweep".
- AdamH heuristic — William Held, PR #2447 "Beta2 gets a bit wacky with very large batch sizes...".

When in doubt on scaling/analysis decisions, `git log --format='%an %s' -- <file>` → look for Will.

0 commit comments

Comments
 (0)