Description
Context:
Currently, merging the ckpt model weights with the LoRA weights is the default in our recipes. We say so in our docs and assume it for generation. Our core users are used to it.
Problem:
1. IMO, this is a bad default. It is hidden from the user and can cause issues.
2. These issues are:
   - 2.1: The user doesn't know that the weights are merged and deploys both the merged trained model AND the adapter, so the LoRA delta is applied twice. This will silently produce bad results;
   - 2.2: The user trains a model that combines LoRA + full finetuning (e.g. embeddings, Llama vision), saves the merged weights, and now can NOT resume training because, for example, there isn't a ckpt that has untrained transformer layers and trained embeddings;
   - 2.3: The fact that we save both the adapter + model gives the false impression that they should be combined;
Solution 1:
Remove the weight-merging default from the recipes. If users want merged weights, they can explicitly set a flag or call a utility (see the sketch after the pros/cons below).
Pros:
- No risk of deploying the merged model + the LoRA adapter together
- No hidden behaviors
- Resuming from ckpt is straightforward
Cons:
- It changes our default behavior
- Generation and docs have to be updated
- The user needs to run an extra step if they want the weights merged (unless we add a flag)
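
For concreteness, here is a rough sketch of what such an explicit merge utility could look like. The function name, the `lora_a`/`lora_b` key naming, and the scaling are assumptions for illustration, not the actual torchtune API; the merge itself is the standard W' = W + (alpha / rank) * B @ A.

```python
# Hypothetical standalone merge utility (names and key conventions are
# assumptions for this sketch, not torchtune's real API).
from typing import Dict

import torch


def merge_lora_weights(
    base_sd: Dict[str, torch.Tensor],
    adapter_sd: Dict[str, torch.Tensor],
    rank: int,
    alpha: float,
) -> Dict[str, torch.Tensor]:
    """Return a new state dict with the LoRA deltas folded into the base weights."""
    merged = dict(base_sd)
    scaling = alpha / rank
    for key, lora_a in adapter_sd.items():
        # Assumes adapter keys look like "<module>.lora_a.weight" / "<module>.lora_b.weight"
        if not key.endswith("lora_a.weight"):
            continue
        lora_b = adapter_sd[key.replace("lora_a", "lora_b")]
        base_key = key.replace(".lora_a.weight", ".weight")
        # Standard LoRA merge: W' = W + (alpha / rank) * B @ A
        merged[base_key] = base_sd[base_key] + scaling * (lora_b @ lora_a)
    return merged
```

If we add an opt-in flag, the recipe would just call something like this at checkpoint-save time; otherwise only the base ckpt and the adapter are written, and the user runs the utility themselves.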
Solution 2:
Check whether any layer other than the LoRA layers requires grad. If any do, set merge = False (see the sketch after the pros/cons below).
Pros:
- Doesn't change default behavior for regular LoRA
- Can resume from ckpt if we add extra logic/safeguards
Cons:
- Introduces another hidden behavior: sometimes the weights are merged, sometimes they aren't
- The user can't know or control whether the model weights are merged
- Bloated if/else code
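
For reference, the check would look roughly like this (identifying LoRA params by the `lora_a`/`lora_b` substrings in their names is an assumption for the sketch):

```python
import torch.nn as nn


def should_merge_weights(model: nn.Module) -> bool:
    """Merge only if the LoRA params are the sole trainable params."""
    for name, param in model.named_parameters():
        is_lora = "lora_a" in name or "lora_b" in name
        if param.requires_grad and not is_lora:
            # e.g. trainable embeddings or fully finetuned layers: skip merging
            return False
    return True
```

The recipe would then gate merging on `should_merge_weights(model)`, which is exactly the kind of data-dependent hidden branching called out in the cons above.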