Description
I am trying to figure out how to run DPO on Llama 3.1 405B that has already been fine-tuned with LoRA using save_adapter_weights_only: True. Tuning the 405B with LoRA works great: it saves only the adapter, which I then load in vLLM on top of the original Meta weights (without merging the adapter into the weights at all).
So now I just need a little preference alignment. I tried several options, including resuming from a checkpoint. The problem is that, because I saved only the adapter, I don't have a recipe checkpoint to resume from. Resuming is most probably the wrong approach anyway, since the DPO run would be continuing from a different recipe?
I looked briefly at the recipes code, but before digging in further I decided to post my question here.
Thanks in advance.