Skip to content

[dpo] Add LoRA-DPO support to Levanter#4637

Open
ahmeda14960 wants to merge 3 commits intomainfrom
dpo-lora-clean
Open

[dpo] Add LoRA-DPO support to Levanter#4637
ahmeda14960 wants to merge 3 commits intomainfrom
dpo-lora-clean

Conversation

@ahmeda14960
Copy link
Copy Markdown
Contributor

Summary

  • Unifies regular DPO and LoRA-DPO under the same train_dpo.py entrypoint via config-driven adapter.type + reference.type
  • Adds shared LoRA adaptation layer (adaptation.py), model init helper (model_init.py), and durable reference eval cache (2x eval speedup)
  • Fixes HF export: LoRA merge axis-order bug + chat_template embedding in tokenizer_config.json
  • Updates DPO training doc with LoRA-DPO sections (B-matrix init, LR guidance, checkpoint saving)

Cleaned-up version of #4634 — removed logbooks, experiment sweep scripts, one-off configs, debug logs, and plots (125 files → 28 files).

Test plan

  • lib/levanter/tests/test_dpo.py — 37 passed
  • lib/levanter/tests/test_lora_dpo.py — 12 passed
  • lib/levanter/tests/test_lora.py — 7 passed
  • lib/levanter/tests/test_hf_export.py — 8 passed
  • tests/test_training.py — 9 passed
  • tests/test_experiment_defaults.py — 3 passed
  • ./infra/pre-commit.py --all-files --fix — all checks pass

🤖 Generated with Claude Code

Unify regular DPO and LoRA-DPO under the same train_dpo.py entrypoint
with config-driven adapter/reference modes. LoRA-DPO uses the adapter
base model as the implicit frozen reference, halving model memory.

Key changes:
- Shared adaptation layer (levanter/adaptation.py) for LoRA application,
  trainable-param filtering, and HF export hooks
- Shared model/bootstrap helper (levanter/main/model_init.py)
- Durable reference eval cache (2x eval speedup by caching frozen
  reference logprobs to GCS sidecar)
- HF export fixes: LoRA merge axis-order bug, chat_template embedding
- Legacy lora_dpo.py compatibility shim for older configs
- DPO training doc updated with LoRA-DPO sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ahmeda14960 ahmeda14960 added the agent-generated Created by automation/agent label Apr 10, 2026
ahmeda14960 and others added 2 commits April 10, 2026 13:57
Reverts pointless import shuffling in exp1600_uncheatable_evals.py and
fixes wrong import path in models.py (marin.download → marin.datakit.download).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ahmeda14960 ahmeda14960 requested a review from dlwh April 10, 2026 20:59
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd51d5e2f7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

from dataclasses import dataclass

from marin.datakit.download.huggingface import DownloadConfig, download_hf
from marin.download.huggingface.download_hf import DownloadConfig, download_hf
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use existing Hugging Face downloader module path

This import points to a module path that does not exist in this repository (marin.download...), so importing experiments.models now raises ModuleNotFoundError and breaks any workflow that imports these defaults (for example via experiments.defaults). The previous marin.datakit.download.huggingface path is present, so this change introduces an immediate runtime/import failure.

Useful? React with 👍 / 👎.

Comment on lines +536 to +541
converter.save_pretrained(
_policy_model_for_hf_save(step.eval_model),
os.path.join(full_save_path, f"step-{step.step}"),
upload_to_hf=upload_to_hf,
dtype=save_dtype,
**hf_upload_kwargs,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Forward generation config in separate-reference HF export

The separate-reference export path no longer passes export.generation_config into converter.save_pretrained, so hf_generation_eos_token_ids is silently ignored for standard DPO exports (reference.type=separate, adapter.type=none). That produces checkpoints without the intended generation settings and can change downstream generation behavior.

Useful? React with 👍 / 👎.

Comment on lines +231 to +232
dataset_size = _dpo_training_dataset_size(train_config)
logger.info("Resolved DPO train set size from tokenizer stats: %d examples", dataset_size)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip dataset-size resolution when only eval schedule is auto

This function always resolves dataset size from cache stats whenever either auto knob is set, even if only auto_validation_runs is requested. In the common case of explicit num_train_steps with auto validation scheduling, this introduces an unnecessary hard dependency on .stats.json/concrete cache paths and can fail before launch despite not needing dataset size to compute eval steps.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-generated Created by automation/agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant