
In Step 3, RuntimeError: RewardModel: size mismatch for rwtranrsformer.decoder.embed_tokens.weight #512

Open

Description

@KyrieXu11
  1. In the SFT step, the model I used is llama-7b-hf downloaded from Hugging Face, and all datasets are the defaults. Here is my launch script, followed by a quick sanity check of its output:
deepspeed  --hostfile=$hosts_file train_sft.py \
   --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
   --data_split 2,4,4 \
   --model_name_or_path $path_to'llama-7b-hf/' \
   --per_device_train_batch_size 4 \
   --per_device_eval_batch_size 4 \
   --max_seq_len 512 \
   --learning_rate 1e-4 \
   --weight_decay 0.1 \
   --num_train_epochs 2  \
   --gradient_accumulation_steps 1 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --gradient_checkpointing \
   --zero_stage $ZERO_STAGE \
   --lora_dim 128 \
   --lora_module_name layers. \
   --deepspeed \
   --output_dir $OUTPUT \
   &> $OUTPUT/training.log
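
For reference, this is the sanity check I would run on the step 1 output (a minimal sketch, assuming the output directory contains a standard Hugging Face config.json; ./output stands in for my $OUTPUT above):

# minimal sketch: read vocab_size from the saved SFT config (path is a placeholder for my $OUTPUT)
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("./output")
print(cfg.vocab_size)  # I expect 32000 for llama-7b-hf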
  2. In the reward model step, I use OPT-350m and my launch script is the default, again followed by a quick check of the saved checkpoint:
deepspeed --hostfile=$hostfile main.py \
   --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
   --data_split 2,4,4 \
   --model_name_or_path facebook/opt-350m \
   --num_padding_at_beginning 1 \
   --per_device_train_batch_size 2 \
   --per_device_eval_batch_size 2 \
   --max_seq_len 512 \
   --learning_rate 5e-5 \
   --weight_decay 0.1 \
   --disable_dropout \
   --num_train_epochs 1 \
   --gradient_accumulation_steps 1 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --zero_stage $ZERO_STAGE \
   --deepspeed \
   --output_dir $OUTPUT \
   &> $OUTPUT/training.log
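
Similarly, the embedding shape actually stored in the step 2 output can be read directly (a minimal sketch, assuming step 2 saved a pytorch_model.bin into its output directory and that it uses the same key that later shows up in the error):

# minimal sketch: inspect the embedding weight shape in the saved reward model checkpoint
import torch

sd = torch.load("./output/pytorch_model.bin", map_location="cpu")
print(sd["rwtranrsformer.decoder.embed_tokens.weight"].shape)
# I expect torch.Size([50272, 512]) for OPT-350m, i.e. the "checkpoint" shape in the error below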
  3. In the final PPO step, I set the SFT output model as ACTOR_MODEL_PATH and the reward model as CRITIC_MODEL_PATH. In the launch script I disabled hybrid_engine. Here is my launch script:
ACTOR_MODEL_PATH=$path_to'/step1_supervised_finetuning/output/'
CRITIC_MODEL_PATH=$path_to'/step2_reward_model_finetuning/output/'
ACTOR_ZERO_STAGE=$3
CRITIC_ZERO_STAGE=$4
OUTPUT=$5
if [ "$OUTPUT" == "" ]; then
    OUTPUT=./output
fi
if [ "$ACTOR_ZERO_STAGE" == "" ]; then
    ACTOR_ZERO_STAGE=3
fi
if [ "$CRITIC_ZERO_STAGE" == "" ]; then
    CRITIC_ZERO_STAGE=3
fi
mkdir -p $OUTPUT

Num_Padding_at_Beginning=1 # this is model related

Actor_Lr=5e-4
Critic_Lr=5e-6

hostfile='hostfile'

deepspeed --hostfile=$hostfile main.py \
   --data_path Dahoas/rm-static \
   --data_split 2,4,4 \
   --actor_model_name_or_path $ACTOR_MODEL_PATH \
   --critic_model_name_or_path $CRITIC_MODEL_PATH \
   --num_padding_at_beginning 1 \
   --per_device_train_batch_size 4 \
   --per_device_mini_train_batch_size 4 \
   --generation_batch_numbers 1 \
   --ppo_epochs 1 \
   --max_answer_seq_len 256 \
   --max_prompt_seq_len 256 \
   --actor_learning_rate ${Actor_Lr} \
   --critic_learning_rate ${Critic_Lr} \
   --actor_weight_decay 0.1 \
   --critic_weight_decay 0.1 \
   --num_train_epochs 1 \
   --lr_scheduler_type cosine \
   --gradient_accumulation_steps 1 \
   --num_warmup_steps 100 \
   --deepspeed --seed 1234 \
   --inference_tp_size 8 \
   --tp_gather_partition_size 4 \
   --actor_zero_stage $ACTOR_ZERO_STAGE \
   --critic_zero_stage $CRITIC_ZERO_STAGE \
   --actor_gradient_checkpointing \
   --disable_actor_dropout \
   --actor_lora_dim 128 \
   --actor_lora_module_name layers. \
   --output_dir $OUTPUT \
    &> $OUTPUT/training.log

In Steps 1 and 2 everything works well, but in Step 3 there is an error:

RuntimeError: Error(s) in loading state_dict for RewardModel:
    size mismatch for rwtranrsformer.decoder.embed_tokens.weight: copying a param with shape torch.Size([50272, 512]) from checkpoint, the shape in current model is torch.Size([32000, 512]).
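
My guess (not confirmed in the code) is that 50272 is the OPT-350m vocabulary size from the step 2 checkpoint, while 32000 is the llama vocabulary size, so the critic seems to be rebuilt around the actor's llama tokenizer before the OPT reward checkpoint is loaded on top of it. The size difference alone can be reproduced from the two configs (a minimal sketch, assuming the standard transformers API; <path_to>/llama-7b-hf/ is my local llama path from step 1):

# minimal sketch: compare the two vocabulary sizes that show up in the error
from transformers import AutoConfig

llama_cfg = AutoConfig.from_pretrained("<path_to>/llama-7b-hf/")  # local path, placeholder
opt_cfg = AutoConfig.from_pretrained("facebook/opt-350m")

print(llama_cfg.vocab_size)  # 32000 -> "the shape in current model"
print(opt_cfg.vocab_size)    # 50272 -> "copying a param ... from checkpoint"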

I found issue #358 about this, but even after pulling the latest master branch it still doesn't work.

Has anyone met this before?

Metadata

Labels

deespeed chat (DeepSpeed Chat), llama (Questions related to llama model)