[dpo] Add LoRA-DPO support to Levanter by ahmeda14960 · Pull Request #4637 · marin-community/marin

ahmeda14960 · 2026-04-10T20:54:47Z

Summary

Unifies regular DPO and LoRA-DPO under the same train_dpo.py entrypoint via config-driven adapter.type + reference.type
Adds shared LoRA adaptation layer (adaptation.py), model init helper (model_init.py), and durable reference eval cache (2x eval speedup)
Fixes HF export: LoRA merge axis-order bug + chat_template embedding in tokenizer_config.json
Updates DPO training doc with LoRA-DPO sections (B-matrix init, LR guidance, checkpoint saving)

Cleaned-up version of #4634 — removed logbooks, experiment sweep scripts, one-off configs, debug logs, and plots (125 files → 28 files).

Test plan

lib/levanter/tests/test_dpo.py — 37 passed
lib/levanter/tests/test_lora_dpo.py — 12 passed
lib/levanter/tests/test_lora.py — 7 passed
lib/levanter/tests/test_hf_export.py — 8 passed
tests/test_training.py — 9 passed
tests/test_experiment_defaults.py — 3 passed
./infra/pre-commit.py --all-files --fix — all checks pass

🤖 Generated with Claude Code

Unify regular DPO and LoRA-DPO under the same train_dpo.py entrypoint with config-driven adapter/reference modes. LoRA-DPO uses the adapter base model as the implicit frozen reference, halving model memory. Key changes: - Shared adaptation layer (levanter/adaptation.py) for LoRA application, trainable-param filtering, and HF export hooks - Shared model/bootstrap helper (levanter/main/model_init.py) - Durable reference eval cache (2x eval speedup by caching frozen reference logprobs to GCS sidecar) - HF export fixes: LoRA merge axis-order bug, chat_template embedding - Legacy lora_dpo.py compatibility shim for older configs - DPO training doc updated with LoRA-DPO sections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reverts pointless import shuffling in exp1600_uncheatable_evals.py and fixes wrong import path in models.py (marin.download → marin.datakit.download). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd51d5e2f7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-10T21:01:22Z

experiments/models.py

 from dataclasses import dataclass

-from marin.datakit.download.huggingface import DownloadConfig, download_hf
+from marin.download.huggingface.download_hf import DownloadConfig, download_hf


Use existing Hugging Face downloader module path

This import points to a module path that does not exist in this repository (marin.download...), so importing experiments.models now raises ModuleNotFoundError and breaks any workflow that imports these defaults (for example via experiments.defaults). The previous marin.datakit.download.huggingface path is present, so this change introduces an immediate runtime/import failure.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-10T21:01:22Z

lib/levanter/src/levanter/main/train_dpo.py

+        converter.save_pretrained(
+            _policy_model_for_hf_save(step.eval_model),
+            os.path.join(full_save_path, f"step-{step.step}"),
+            upload_to_hf=upload_to_hf,
+            dtype=save_dtype,
+            **hf_upload_kwargs,


Forward generation config in separate-reference HF export

The separate-reference export path no longer passes export.generation_config into converter.save_pretrained, so hf_generation_eos_token_ids is silently ignored for standard DPO exports (reference.type=separate, adapter.type=none). That produces checkpoints without the intended generation settings and can change downstream generation behavior.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-10T21:01:22Z

lib/marin/src/marin/training/training.py

+    dataset_size = _dpo_training_dataset_size(train_config)
+    logger.info("Resolved DPO train set size from tokenizer stats: %d examples", dataset_size)


Skip dataset-size resolution when only eval schedule is auto

This function always resolves dataset size from cache stats whenever either auto knob is set, even if only auto_validation_runs is requested. In the common case of explicit num_train_steps with auto validation scheduling, this introduces an unnecessary hard dependency on .stats.json/concrete cache paths and can fail before launch despite not needing dataset size to compute eval steps.

Useful? React with 👍 / 👎.

ahmeda14960 added the agent-generated Created by automation/agent label Apr 10, 2026

ahmeda14960 and others added 2 commits April 10, 2026 13:57

[dpo] Revert unrelated changes to uncheatable evals and models

541bd12

Reverts pointless import shuffling in exp1600_uncheatable_evals.py and fixes wrong import path in models.py (marin.download → marin.datakit.download). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

[dpo] Revert unrelated change to launch.py

bc41ad9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ahmeda14960 requested a review from dlwh April 10, 2026 20:59

chatgpt-codex-connector bot reviewed Apr 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[dpo] Add LoRA-DPO support to Levanter#4637

[dpo] Add LoRA-DPO support to Levanter#4637
ahmeda14960 wants to merge 3 commits intomainfrom
dpo-lora-clean

ahmeda14960 commented Apr 10, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 10, 2026

Uh oh!

chatgpt-codex-connector bot Apr 10, 2026

Uh oh!

chatgpt-codex-connector bot Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		dataset_size = _dpo_training_dataset_size(train_config)
		logger.info("Resolved DPO train set size from tokenizer stats: %d examples", dataset_size)

Conversation

ahmeda14960 commented Apr 10, 2026

Summary

Test plan

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant