Commit d04dea5

[logbook] Delphi midtrain: clarify project goal hierarchy
Two overarching project goals: (1) predict midtraining loss trajectory from smaller runs, (2) pick a good midtraining dataset. The LR sweep tracked here is a subgoal feeding both tracks — LR has to be solved before dataset-quality can be cleanly isolated.
1 parent 6deb3dc commit d04dea5

1 file changed

Lines changed: 9 additions & 0 deletions

.agents/logbooks/midtraining_delphi.md

@@ -32,8 +32,17 @@
## Goal

### Overarching project goals (two tracks)
1. **Predict loss trajectory for midtraining.** Build a calibrated expectation for how math-eval loss evolves when you continue-train an existing pretrain checkpoint on a math-heavy dataset, so that larger, more expensive midtrain runs can be scoped (token budget, LR schedule, decay shape) from smaller runs rather than guessed. This Delphi × Nemotron-CC-Math sweep is a primary data point for that predictor.
2. **Pick a good midtraining dataset.** Compare candidate math-heavy datasets (the Nemotron-CC-Math quality tiers being our current anchor) on their effect on post-midtrain downstream evals, so that future runs don't waste compute on a poorly-curated corpus.
### This logbook's subgoal (the concrete sweep tracked here)
Run a small LR sweep that continue-trains the two smallest existing AdamH-trained Marin checkpoints on **10 B tokens of `nemotron_cc_math_v1/4plus`**, to de-risk a Mantis-style math-midtraining recipe before spending v4-512 / v4-1024 time on Delphi 1e22 / 1e23. We're looking for the highest peak LR that still drops the math-eval loss monotonically.

This subgoal feeds both tracks: the LR-factor × final-loss pairs inform the loss-trajectory predictor (track 1), and the dataset is held fixed at `nemotron_cc_math_v1/4plus` so the signal is attributable to LR choice — a prerequisite for the dataset-selection track (2), which requires LR to be a solved variable before dataset-quality can be cleanly isolated.
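
As a rough illustration of the selection rule in this subgoal, the sketch below picks the highest peak LR whose math-eval loss falls at every evaluation step. The function names, the `curves` layout (peak LR mapped to a list of eval losses in step order), the `tol` parameter, and all numbers are assumptions made for the example; none of them come from the sweep code or its results.

```python
def is_monotone_decreasing(losses, tol=0.0):
    """True if every eval loss is below the previous one (allowing `tol` of noise)."""
    return all(b < a + tol for a, b in zip(losses, losses[1:]))


def highest_monotone_lr(curves, tol=0.0):
    """Return the largest peak LR whose loss curve never rises, or None.

    curves: dict mapping peak LR -> list of math-eval losses in step order
            (a hypothetical layout chosen for this sketch).
    """
    ok = [lr for lr, losses in curves.items() if is_monotone_decreasing(losses, tol)]
    return max(ok) if ok else None


# Illustrative numbers only, not results from the sweep.
curves = {
    1e-3: [2.10, 1.95, 1.90],
    2e-3: [2.10, 1.90, 1.84],
    4e-3: [2.10, 2.25, 1.88],  # early loss spike, so this LR is rejected
}
best_lr = highest_monotone_lr(curves)
```

A nonzero `tol` would let small eval noise pass as "monotone"; whether that is appropriate depends on how noisy the math-eval loss is at the chosen eval interval.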
## User-specified constraints

- Start the midtraining peak LR at **2/3 of each base's own pretrain peak**, sweep around there.
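
A minimal sketch of that starting point, assuming a multiplicative sweep around it. The `factors` tuple and the example pretrain peak of `3e-3` are hypothetical; the logbook only fixes the 2/3 starting ratio.

```python
def midtrain_lr_grid(pretrain_peak_lr, factors=(0.5, 1.0, 2.0), start_ratio=2 / 3):
    """Return candidate midtraining peak LRs centered on start_ratio * pretrain peak.

    `factors` is a hypothetical sweep shape, not taken from the logbook.
    """
    start = start_ratio * pretrain_peak_lr
    return [start * f for f in factors]


# Hypothetical pretrain peak LR, used only to show the arithmetic.
grid = midtrain_lr_grid(3e-3)
```

Each base checkpoint would get its own grid, since the constraint is relative to that base's own pretrain peak.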
