Description
In step 3, I changed the ZeRO stage of the critic model and reward model from 0 to 3. With this change, the reward model that was trained with ZeRO stage 3 in step 2 cannot be loaded. However, the reward model can be loaded with ZeRO stage 0 in rw_eval.py.
Error: size mismatch for rwtranrsformer.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
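For reference, the same kind of message can be reproduced in plain PyTorch without DeepSpeed. This is only a minimal sketch, under the assumption that the torch.Size([0]) in the error comes from the parameter being an empty placeholder on the current rank (which is my understanding of how ZeRO stage 3 partitions parameters). TinyModel, try_load, and the 8x8 layer size are hypothetical names and values made up for the illustration; they are not part of DeepSpeed or the training scripts.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for one attention projection of the reward model."""

    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8, bias=False)


def try_load(model, state_dict):
    """Return None if loading succeeds, otherwise the error text."""
    try:
        model.load_state_dict(state_dict)
        return None
    except RuntimeError as exc:
        return str(exc)


# A "checkpoint" holding the full, unpartitioned weight.
checkpoint = TinyModel().state_dict()

# Assumption: mimic a partitioned parameter by swapping in an empty
# placeholder tensor, so the local shape becomes torch.Size([0]).
partitioned = TinyModel()
partitioned.q_proj.weight = nn.Parameter(torch.empty(0))

error_message = try_load(partitioned, checkpoint)
print(error_message)
```

Loading the same checkpoint into an unmodified TinyModel() succeeds, which mirrors the fact that loading works with ZeRO stage 0 in rw_eval.py.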
The change to the ZeRO stage of the critic model and reward model, from 0 to 3: