DPO after / on top of LoRA tuning

I am trying to figure out how to **run DPO on Llama 3.1 405B** that has already been **tuned with LoRA** using `save_adapter_weights_only: True`. LoRA tuning of the 405B works great and saves only the adapter, which I then use in vLLM on top of the original Meta weights (without merging the adapter into the base weights at all).

So now I just need a little preference alignment. I tried several options, including resuming from a checkpoint. The problem is that with adapter-only saving I don't have a recipe checkpoint to resume from, and resuming is most probably the wrong approach anyway, since DPO would be continuing from a checkpoint written by another recipe?
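To make the checkpoint problem concrete, here is a rough sketch of what a resume-style config would need. It is only an illustration: the paths and file names are placeholders, and the field names (`adapter_checkpoint`, `recipe_checkpoint`, `resume_from_checkpoint`) are written as I understand them from the torchtune checkpointing docs, so they may differ between versions.

```yaml
# Sketch only, not a working config: every path and file name is a placeholder.
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /path/to/base-model      # original Meta weights (placeholder path)
  # checkpoint_files, output_dir and model_type omitted for brevity
  adapter_checkpoint: adapter_0.pt         # adapter saved by the LoRA run (placeholder name)
  recipe_checkpoint: recipe_state.pt       # missing in the adapter-only case
resume_from_checkpoint: True
save_adapter_weights_only: True
```

The open question is whether the DPO recipe can pick up the adapter this way without the recipe state, or whether it needs a different entry point altogether.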

I looked a little into the recipes code, but before I start digging in, I decided to post my question here.

Thanks in advance.

DPO after / on top of LoRA tuning #2272
