DPO after / on top of LoRA tuning #2272

Open
@albertbn

Description

I am trying to figure out how to run DPO on Llama 3.1 405B that has already been tuned with LoRA using `save_adapter_weights_only: True`. LoRA tuning of the 405B works great. It saves only the adapter, which I then load in vLLM on top of the original Meta weights, without merging the adapter into the base weights at all.
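To make "on top of the original weights without merging" concrete: under the standard LoRA formulation, an adapted layer computes `W x + (alpha / r) * B (A x)`, where the base weight `W` stays frozen and only the low-rank factors `A` and `B` are saved as the adapter. The minimal pure-Python sketch below assumes that formulation. The helper names (`lora_forward`, `merge`) and the toy numbers are hypothetical, and this is not vLLM's or any training library's actual code. It checks that applying the adapter separately gives the same output as merging it into `W`.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(w, a, b, x, alpha, rank):
    """Unmerged LoRA: frozen base output plus the scaled low-rank update."""
    scale = alpha / rank
    base = matvec(w, x)              # W x, base weights are never modified
    delta = matvec(b, matvec(a, x))  # B (A x), the adapter's contribution
    return [y + scale * d for y, d in zip(base, delta)]


def merge(w, a, b, alpha, rank):
    """Merged LoRA: fold (alpha / r) * B A into a copy of W."""
    scale = alpha / rank
    ba = matmul(b, a)
    return [[w[i][j] + scale * ba[i][j] for j in range(len(w[0]))] for i in range(len(w))]


# Toy example (hypothetical numbers): 2x3 base weight, rank-1 adapter.
W = [[1, 0, 2], [0, 1, 1]]
A = [[1, 1, 0]]        # r x in_features
B = [[0.5], [-1]]      # out_features x r
x = [1, 2, 3]

unmerged = lora_forward(W, A, B, x, alpha=2, rank=1)
merged = matvec(merge(W, A, B, alpha=2, rank=1), x)
print(unmerged, merged)  # both [10.0, -1.0]
```

Because the two paths agree, the adapter alone plus the untouched base checkpoint fully describes the tuned model, which is why a DPO stage can in principle start from "base weights + this adapter".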

So now I just need a little preference alignment. I tried several options, including resuming from a checkpoint. The problem is that, because I saved only the adapter weights, I don't have a recipe checkpoint to resume from. Resuming is probably the wrong approach anyway, since DPO would be continuing from a different recipe?
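For reference, the per-pair DPO objective compares how much the policy and a frozen reference model prefer the chosen response over the rejected one: `loss = -log sigmoid(beta * ((log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))))`. In the setup above, one reasonable assumption (following the original DPO setup, where the reference is the supervised fine-tuned model) is that the reference is the LoRA-tuned model, i.e. base weights plus the existing adapter, and that the policy starts from those same weights. The pure-Python sketch below takes summed sequence log-probabilities as plain floats. The function names and the `beta=0.1` default are illustrative assumptions, not any recipe's actual API.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probs of each response."""
    chosen_ratio = policy_chosen - ref_chosen        # log pi(y_w) - log ref(y_w)
    rejected_ratio = policy_rejected - ref_rejected  # log pi(y_l) - log ref(y_l)
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# At the start of DPO the policy equals the reference, so both ratios are 0
# and the loss is log(2), regardless of the actual log-probs.
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# After training shifts probability toward the chosen response, the loss drops.
later = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(start, later)
```

The `start` case shows why the policy and reference should begin from the same weights: any mismatch between them would show up as a nonzero ratio before a single DPO step has been taken.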

I looked a little into the recipe code, but before I start digging in, I decided to post my question here.

Thanks in advance!


Labels: discussion, triaged
