Unify regular DPO and LoRA-DPO under the same train_dpo.py entrypoint with config-driven adapter/reference modes. LoRA-DPO uses the adapter's base model as the implicit frozen reference, halving model memory. Key changes:
- Shared adaptation layer (levanter/adaptation.py) for LoRA application, trainable-param filtering, and HF export hooks
- Shared model/bootstrap helper (levanter/main/model_init.py)
- Durable reference eval cache (2x eval speedup by caching frozen reference logprobs to a GCS sidecar)
- HF export fixes: LoRA merge axis-order bug, chat_template embedding
- Legacy lora_dpo.py compatibility shim for older configs
- DPO training doc updated with LoRA-DPO sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
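Why LoRA-DPO can drop the separate reference model follows from the standard DPO objective: the reference enters only through its logprobs on the chosen and rejected responses, and with LoRA those can come from the same frozen base weights with the adapter switched off. The following is a minimal pure-Python sketch, assuming a toy `policy_fn(adapter)` that returns summed chosen/rejected logprobs and treats `adapter=None` as the bare base model. `PairLogprobs`, `policy_fn`, `toy_policy`, and `lora_dpo_loss` are hypothetical names, not the Levanter API.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PairLogprobs:
    """Summed token logprobs of the chosen and rejected responses under one model."""
    chosen: float
    rejected: float


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy: PairLogprobs, reference: PairLogprobs, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    margin = (policy.chosen - reference.chosen) - (policy.rejected - reference.rejected)
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-beta * margin)


# Hypothetical signature: adapter=None means "run the frozen base weights only".
PolicyFn = Callable[[Optional[dict]], PairLogprobs]


def lora_dpo_loss(policy_fn: PolicyFn, adapter: dict, beta: float = 0.1) -> float:
    policy = policy_fn(adapter)
    # Implicit reference: the same base weights with the adapter disabled,
    # so no second copy of the model has to be kept in memory.
    reference = policy_fn(None)
    return dpo_loss(policy, reference, beta)


def toy_policy(adapter: Optional[dict]) -> PairLogprobs:
    """Toy stand-in for a forward pass; the adapter shifts preference toward 'chosen'."""
    shift = 0.0 if adapter is None else adapter["shift"]
    return PairLogprobs(chosen=-10.0 + shift, rejected=-12.0 - shift)


if __name__ == "__main__":
    print(lora_dpo_loss(toy_policy, {"shift": 0.0}))  # policy == reference, so log 2
    print(lora_dpo_loss(toy_policy, {"shift": 1.0}))  # smaller loss, ~0.598
```

A zero-effect adapter (as with the usual LoRA initialization) makes the policy identical to its implicit reference, so training starts from a loss of log 2 just as standard DPO does with a reference copy of the initial policy.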
Reverts pointless import shuffling in exp1600_uncheatable_evals.py and fixes wrong import path in models.py (marin.download → marin.datakit.download). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd51d5e2f7
experiments/models.py (outdated)
     from dataclasses import dataclass

    -from marin.datakit.download.huggingface import DownloadConfig, download_hf
    +from marin.download.huggingface.download_hf import DownloadConfig, download_hf
Use existing Hugging Face downloader module path
This import points to a module path that does not exist in this repository (marin.download...), so importing experiments.models now raises ModuleNotFoundError and breaks any workflow that imports these defaults (for example via experiments.defaults). The previous marin.datakit.download.huggingface path is present, so this change introduces an immediate runtime/import failure.
    converter.save_pretrained(
        _policy_model_for_hf_save(step.eval_model),
        os.path.join(full_save_path, f"step-{step.step}"),
        upload_to_hf=upload_to_hf,
        dtype=save_dtype,
        **hf_upload_kwargs,
Forward generation config in separate-reference HF export
The separate-reference export path no longer passes export.generation_config into converter.save_pretrained, so hf_generation_eos_token_ids is silently ignored for standard DPO exports (reference.type=separate, adapter.type=none). That produces checkpoints without the intended generation settings and can change downstream generation behavior.
    dataset_size = _dpo_training_dataset_size(train_config)
    logger.info("Resolved DPO train set size from tokenizer stats: %d examples", dataset_size)
Skip dataset-size resolution when only eval schedule is auto
This function always resolves dataset size from cache stats whenever either auto knob is set, even if only auto_validation_runs is requested. In the common case of explicit num_train_steps with auto validation scheduling, this introduces an unnecessary hard dependency on .stats.json/concrete cache paths and can fail before launch despite not needing dataset size to compute eval steps.
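The reviewer's suggestion amounts to reading cache stats only when the step count itself is auto-derived, since an auto eval schedule needs only the (possibly explicit) step count. The following is a minimal sketch, assuming hypothetical `ScheduleConfig` fields (`num_train_steps=None` meaning "auto") and a `dataset_size_fn` callback standing in for the `.stats.json` lookup. It is not the PR's `_dpo_training_dataset_size` implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScheduleConfig:
    # Hypothetical knobs: None means "derive automatically".
    num_train_steps: Optional[int]
    train_batch_size: int
    num_epochs: int = 1
    auto_validation_runs: Optional[int] = None


@dataclass
class ResolvedSchedule:
    num_train_steps: int
    steps_per_eval: Optional[int]


def resolve_schedule(cfg: ScheduleConfig, dataset_size_fn: Callable[[], int]) -> ResolvedSchedule:
    num_steps = cfg.num_train_steps
    if num_steps is None:
        # Only the auto step count needs the dataset size, so only this
        # branch touches cache stats.
        dataset_size = dataset_size_fn()
        num_steps = max(1, (dataset_size * cfg.num_epochs) // cfg.train_batch_size)

    steps_per_eval = None
    if cfg.auto_validation_runs:
        # Spread the requested number of eval runs evenly over training.
        steps_per_eval = max(1, num_steps // cfg.auto_validation_runs)
    return ResolvedSchedule(num_steps, steps_per_eval)
```

With an explicit `num_train_steps`, a `dataset_size_fn` that would fail is never called, which avoids the pre-launch failure the review describes.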
Summary
- Unify regular DPO and LoRA-DPO under a single train_dpo.py entrypoint via config-driven adapter.type + reference.type
- Shared adaptation layer (adaptation.py), model init helper (model_init.py), and durable reference eval cache (2x eval speedup)

Cleaned-up version of #4634: removed logbooks, experiment sweep scripts, one-off configs, debug logs, and plots (125 files → 28 files).
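Because the reference model is frozen, its logprobs on a fixed eval set never change. They can therefore be computed once and reused on every later eval. When policy and reference forward passes would otherwise both run at each eval, skipping the reference pass is consistent with the reported ~2x eval speedup. The following is a minimal sketch, assuming a local directory in place of the GCS sidecar, JSON storage, and a cache key built from a hypothetical `reference_id` string plus the eval example IDs. None of these names come from the PR.

```python
import hashlib
import json
import os
from typing import Callable, Dict, List


def cache_key(reference_id: str, eval_example_ids: List[str]) -> str:
    """Key the cache on the frozen reference identity and the exact eval set."""
    h = hashlib.sha256()
    h.update(reference_id.encode())
    for ex in eval_example_ids:
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
        h.update(ex.encode())
    return h.hexdigest()


def reference_eval_logprobs(
    cache_dir: str,
    reference_id: str,
    eval_example_ids: List[str],
    compute_fn: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, float]:
    key = cache_key(reference_id, eval_example_ids)
    path = os.path.join(cache_dir, f"ref-logprobs-{key}.json")
    if os.path.exists(path):
        # Cache hit: no reference forward pass needed.
        with open(path) as f:
            return json.load(f)
    values = compute_fn(eval_example_ids)  # one pass of the frozen reference model
    os.makedirs(cache_dir, exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(values, f)
    os.replace(tmp, path)  # publish atomically so readers never see a partial file
    return values
```

Keying on the reference identity means that changing the base model or the eval set produces a fresh cache entry instead of silently reusing stale logprobs.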
Test plan
- lib/levanter/tests/test_dpo.py: 37 passed
- lib/levanter/tests/test_lora_dpo.py: 12 passed
- lib/levanter/tests/test_lora.py: 7 passed
- lib/levanter/tests/test_hf_export.py: 8 passed
- tests/test_training.py: 9 passed
- tests/test_experiment_defaults.py: 3 passed
- ./infra/pre-commit.py --all-files --fix: all checks pass

🤖 Generated with Claude Code