Domain-Phase Data Mixture Swarm Experiments #2393
Draft
…calvin/swarm-olmo3-regmix-test
…gmix-test # Conflicts: # lib/levanter/src/levanter/tracker/tracker_fns.py
Wire the qsplit240 replay onto the Iris east5a path and record the failure analysis that shaped the launcher settings. This also pins the executor to a shared Fray client so large fan-out submissions stop crashing in thread-local client setup.
Check in the GRP ablations, observed-only deployment variants, and subset-validation launchers so they can be reproduced from the repo. The determinism coverage now exercises these launchers and the evaluation table builder reports the new no-penalty ablation alongside the existing rows.
Record the convergence, deployment-variant, and signal-to-noise analyses alongside the self-contained packet that was sent out for procedure review. These artifacts capture the current GRP calibration and regularization work so the plots and packet can be regenerated from committed code.
Sharding the 300M qsplit replay avoids the repeated parent crashes while preserving the original training output roots for checkpoint reuse. The overlap rerun now caches eval datasets and dispatches each Levanter lm-eval step as its own remote TPU job so JAX distributed init does not collide inside the parent process.
Checkpoint the new GRP ablations and local benchmark work while TPU jobs are blocked on capacity. This adds power-law and per-domain baselines plus intrinsic-domain and proposal-bank studies for comparing less extrapolative deployment rules.
Shard the qsplit overlap rerun, add a run_00097 overlap launcher, and backfill missing eval metrics from checkpoint artifacts when historical W&B collection is incomplete. Keep the corrected run_00097 noise baseline and ranked SNR table in-tree so the overlap discussion remains reproducible.
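The backfill described above can be pictured as a merge in which tracker values take precedence and checkpoint-derived values only fill gaps. This is a minimal sketch under that assumption; `backfill_metrics` and the metric keys are hypothetical names and are not taken from the branch.

```python
def backfill_metrics(tracked, from_checkpoint):
    """Fill metrics missing from the tracker with checkpoint-derived values.

    Hypothetical helper: values already collected from the tracker win, and
    checkpoint values are used only where the tracker entry is absent or None.
    """
    merged = dict(tracked)  # copy so the caller's dict is left untouched
    for key, value in from_checkpoint.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged


# Usage with made-up metric names and values.
tracked = {"eval/loss": 3.21, "eval/bpb": None}
from_checkpoint = {"eval/loss": 9.99, "eval/bpb": 0.87, "eval/acc": 0.41}
merged = backfill_metrics(tracked, from_checkpoint)
```

Keeping tracker values authoritative means a backfill can be rerun without overwriting anything that was collected normally.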
Allow the top-level cache prep and scaling launcher to target regions other than east5 while still reusing prebuilt merged caches where they already exist. This unblocks building the merged runtime layer in us-central1 before moving the scaling runs there.
Record the finished power-law trustblend validation in the GRP table and update the trustblend convergence artifacts with the realized subset and full-data outcomes.
Coscheduled retries could briefly leave old coordinator endpoints visible, letting tasks bootstrap against different JAX coordinators. Prefer the newest endpoint and clear a task's stale endpoints before assigning a retry so distributed init converges on a single coordinator.
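The two rules in this commit (resolve to the newest endpoint, and drop a task's endpoints before it is retried) can be sketched with a toy in-memory registry. The `Endpoint` and `EndpointRegistry` types below are hypothetical illustrations, not the scheduler's actual classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str             # logical name, e.g. the coordinator service
    task_id: str          # task that registered the endpoint
    address: str          # host:port that peers connect to
    registered_at: float  # registration timestamp


class EndpointRegistry:
    """Toy registry (hypothetical) illustrating the retry-safe rules."""

    def __init__(self) -> None:
        self._endpoints: List[Endpoint] = []

    def register(self, endpoint: Endpoint) -> None:
        self._endpoints.append(endpoint)

    def clear_task(self, task_id: str) -> None:
        # Called before a retry is assigned, so stale addresses disappear.
        self._endpoints = [e for e in self._endpoints if e.task_id != task_id]

    def resolve(self, name: str) -> Optional[Endpoint]:
        # If several endpoints share a name, prefer the newest registration.
        matches = [e for e in self._endpoints if e.name == name]
        return max(matches, key=lambda e: e.registered_at, default=None)


registry = EndpointRegistry()
registry.register(Endpoint("coordinator", "task-0", "10.0.0.1:8476", 1.0))
registry.register(Endpoint("coordinator", "task-0", "10.0.0.2:8476", 2.0))
newest = registry.resolve("coordinator")
```

Either rule alone narrows the window in which peers can disagree; together they should make every task in a retried group resolve the same coordinator address.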
Retrying lm-eval task loading now deep-copies dict task specs before passing them to lm-eval. This prevents transient load failures from mutating the caller-owned task config and breaking the retry path.
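The failure mode and the fix can be illustrated with a small retry wrapper. This is a sketch only: `load_tasks_with_retry` and the `loader` callable are hypothetical stand-ins, and the real code path calls into lm-eval rather than a local function.

```python
import copy


def load_tasks_with_retry(task_specs, loader, max_attempts=3):
    """Call `loader` on a fresh deep copy of `task_specs` for each attempt.

    Hypothetical helper: `loader` stands in for whatever function consumes
    the task specs and may mutate them before failing.
    """
    last_error = None
    for _ in range(max_attempts):
        # A deep copy also protects nested dicts and lists, which a shallow
        # copy would still share with the caller.
        fresh_specs = copy.deepcopy(task_specs)
        try:
            return loader(fresh_specs)
        except Exception as exc:  # a real implementation would narrow this
            last_error = exc
    raise RuntimeError(
        f"task loading failed after {max_attempts} attempts"
    ) from last_error
```

Because every attempt starts from an untouched copy, a loader that pops or rewrites keys before raising cannot leave the second attempt with a half-consumed spec.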
This batches the new power-family-penalty subset validation launchers with the raw-optimum convergence, comparison, and reporting scripts they feed. It also refreshes the two-phase-many summaries and plots so the validated raw-optimum results and scale-comparison outputs are checked in together.
These tracked artifacts and launch plans were regenerated while updating the exploratory domain-mix workflows. Keeping them in one artifact-only commit makes the branch clean again without mixing them into the code commits above.
Let the 520M and 1.2B swarm launchers schedule across east5 and central1 without a zone pin, and resolve region-sensitive caches through mirror-backed paths. Resume the latest checkpoint roots so restarted baselines and stratified runs keep their existing progress instead of starting from scratch.
Capture the exploratory per-domain exponent probe on top of the power-family penalty surrogate and make the benchmark driver write isolated output stems for follow-up variants. This keeps the local ablation code and probe artifacts together so the negative result is reproducible.
This checkpoints the current data-mixing analysis batch, including run and metric registries, parity rerun launchers, and new GRP, Olmix, and RegMix follow-up artifacts. It also carries the mirror and evaluation fixes needed to make those runs and backfills reproducible from the current workspace.
Add the strong-tier scaling-study launch and tracking plumbing, plus registry and metric-provenance updates for recent runs. Also document the Olmix investigation, add the exact two-phase proposer path, and refresh the key convergence and comparison plots.
Force HF Hub and datasets into offline mode after syncing mirrored eval datasets so LM Eval task loading stays cache-only. Add focused tests covering the offline-mode toggle and sync path.
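One common way to get cache-only behavior is through the `HF_HUB_OFFLINE` and `HF_DATASETS_OFFLINE` environment variables, which `huggingface_hub` and `datasets` recognize. The sketch below assumes that mechanism; the branch may toggle offline mode differently, and `force_hf_offline` is a hypothetical helper name.

```python
import os

# Environment variables recognized by huggingface_hub and datasets.
OFFLINE_ENV_VARS = ("HF_HUB_OFFLINE", "HF_DATASETS_OFFLINE")


def force_hf_offline(env=None):
    """Set the offline flags in `env` (defaults to os.environ).

    Hypothetical helper. The libraries generally read these variables when
    they are imported, so the flags should be set after the mirror sync has
    finished and before the libraries are first imported in the process.
    """
    env = os.environ if env is None else env
    for name in OFFLINE_ENV_VARS:
        env[name] = "1"
    return env


# Usage on a plain dict, which keeps the example free of side effects.
flags = force_hf_offline({})
```

Accepting an explicit mapping makes the toggle easy to cover in a focused test without mutating the real process environment.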
Description
Initial code supporting RegMix-like data mixture experiments with discrete training phases and epoching.
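As a rough illustration of a phased mixture: RegMix-style studies train many small runs on randomly sampled domain weights, and with discrete phases each run carries one weight vector per phase. The sketch below assumes Dirichlet-distributed weights and made-up domain names and token counts. All function names are hypothetical, and it does not reproduce the sampling code in this branch.

```python
import random


def sample_simplex(domains, rng, alpha=1.0):
    """Draw Dirichlet(alpha) weights over `domains` via normalized gammas."""
    draws = {d: rng.gammavariate(alpha, 1.0) for d in domains}
    total = sum(draws.values())
    return {d: v / total for d, v in draws.items()}


def sample_phased_mixture(domains, num_phases, seed=0):
    """Return one independent weight vector per training phase."""
    rng = random.Random(seed)
    return [sample_simplex(domains, rng) for _ in range(num_phases)]


def epochs_per_domain(weights, phase_tokens, domain_tokens):
    """Epochs of each domain consumed in a phase: weight * budget / size."""
    return {d: weights[d] * phase_tokens / domain_tokens[d] for d in weights}


# Usage with hypothetical domains: a small domain is repeated sooner.
mixture = sample_phased_mixture(["web", "code"], num_phases=2, seed=0)
epochs = epochs_per_domain(
    {"web": 0.5, "code": 0.5},
    phase_tokens=100,
    domain_tokens={"web": 100, "code": 25},
)
```

In this example the phase spends half of a 100-token budget on each domain, so the 25-token `code` domain is seen for 2 epochs while `web` is seen for half of one. This is why epoching has to be tracked per domain and per phase.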