Describe the bug
I would like to train a LLaVA model using RL.
After loading the model with:
model = LlavaLlamaForCausalLM.from_pretrained(...)
I also need a second, reference model, ref0_model, created as a copy of it:
ref0_model = copy.deepcopy(model)
Then I pass both models to the trainer:
trainer = LLaVATrainer(model=model,
                       ref_model=ref0_model,
                       rl_mode=True,
                       tokenizer=tokenizer,
                       args=training_args,
                       **data_module)
and call trainer.train().
Inside trainer.train():
I need the output of self.model(**batch); this works.
I also need output_ref from self.ref0_model(**batch); this fails.
The error reports incorrect tensor dimensions during forward()...
This is unexpected, because ref0_model is a deepcopy of the very same model.
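A plausible cause (an assumption, not confirmed in this issue) is that under DeepSpeed ZeRO-3 each rank holds only a shard of every weight, so copy.deepcopy copies zero-element placeholders rather than full tensors, and the copy is not managed by the DeepSpeed engine that would gather them before forward. The minimal diagnostic sketch below lists parameters whose local tensor is empty; the helper name `find_partitioned_params` is hypothetical. DeepSpeed normally records the full shape in a `ds_shape` attribute, but whether that attribute survives a deepcopy may depend on versions, so the helper tolerates its absence.

```python
from typing import Any, Iterable, List, Optional, Tuple


def find_partitioned_params(
    named_params: Iterable[Tuple[str, Any]],
) -> List[Tuple[str, Optional[tuple]]]:
    """Return (name, full_shape) for every parameter with zero local elements.

    Intended to be called as find_partitioned_params(ref0_model.named_parameters()).
    Assumption: a ZeRO-3 partitioned parameter has numel() == 0 on each rank,
    and DeepSpeed may store its real shape in `ds_shape`.
    """
    suspicious = []
    for name, param in named_params:
        if param.numel() == 0:
            # `ds_shape` is missing on plain tensors (e.g. after a deepcopy), so default to None.
            full_shape = getattr(param, "ds_shape", None)
            suspicious.append((name, tuple(full_shape) if full_shape is not None else None))
    return suspicious
```

If this returns a non-empty list for ref0_model on every rank, the reference model is running forward on empty weight shards, which would explain the dimension error.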
How do I solve this under DeepSpeed ZeRO stage 3?
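One commonly used approach for frozen reference models (an assumption here, not something confirmed in this issue) is to load the reference model with its own LlavaLlamaForCausalLM.from_pretrained(...) call instead of copy.deepcopy, wrap it with `deepspeed.initialize(model=..., config=...)`, and put the returned engine in eval mode, so DeepSpeed gathers the partitioned weights during forward. The hypothetical helper `make_ref_model_ds_config` below only derives the config for that wrapper: it assumes the training config has already had any "auto" placeholders resolved, and it drops the optimizer and scheduler sections on the assumption that a frozen model needs neither.

```python
import copy
from typing import Any, Dict


def make_ref_model_ds_config(train_config: Dict[str, Any]) -> Dict[str, Any]:
    """Derive an inference-only DeepSpeed config for a frozen reference model.

    Assumption: the reference model is never updated, so it needs no
    optimizer or LR scheduler; ZeRO-3 settings are kept so its weights are
    partitioned and gathered the same way as the policy model's.
    """
    config = copy.deepcopy(train_config)  # do not mutate the trainer's own config
    for key in ("optimizer", "scheduler"):
        config.pop(key, None)
    zero = config.setdefault("zero_optimization", {})
    if zero.get("stage", 0) != 3:
        # Below stage 3 the weights are not partitioned, so plain replication suffices.
        zero["stage"] = 0
    return config


# Hypothetical usage inside the training script (requires deepspeed and a resolved config):
#   ref_model = LlavaLlamaForCausalLM.from_pretrained(...)
#   ref_engine, *_ = deepspeed.initialize(model=ref_model,
#                                         config=make_ref_model_ds_config(resolved_cfg))
#   ref_engine.eval()
```

Loading a second copy costs an extra from_pretrained call, but under ZeRO-3 the reference weights are then partitioned across ranks like the policy weights instead of being silently copied as empty shards.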