Description
- In the SFT step, the model I used is llama-7b-hf downloaded from Hugging Face, and all datasets are the defaults. Here is my launch shell (a small check of the saved output follows the script):
deepspeed --hostfile=$hosts_file train_sft.py \
--data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
--data_split 2,4,4 \
--model_name_or_path $path_to'llama-7b-hf/' \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--max_seq_len 512 \
--learning_rate 1e-4 \
--weight_decay 0.1 \
--num_train_epochs 2 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--gradient_checkpointing \
--zero_stage $ZERO_STAGE \
--lora_dim 128 \
--lora_module_name layers. \
--deepspeed \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
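For reference, a minimal sketch (assuming step 1 wrote a Hugging Face-style config.json into the output directory) to confirm the saved actor keeps the LLaMA vocabulary size:

```python
# Sketch: confirm the saved SFT (actor) model keeps the LLaMA vocabulary size.
# Assumes step 1 wrote a Hugging Face-style config.json into its output dir.
from transformers import AutoConfig

sft_output = "./step1_supervised_finetuning/output/"  # used as ACTOR_MODEL_PATH later
print(AutoConfig.from_pretrained(sft_output).vocab_size)  # expect 32000 for llama-7b-hf
```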
- In the reward model step, I use OPT-350m and my launch shell is the default (a checkpoint-inspection sketch follows the script):
deepspeed --hostfile=$hostfile main.py \
--data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
--data_split 2,4,4 \
--model_name_or_path facebook/opt-350m \
--num_padding_at_beginning 1 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--max_seq_len 512 \
--learning_rate 5e-5 \
--weight_decay 0.1 \
--disable_dropout \
--num_train_epochs 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--zero_stage $ZERO_STAGE \
--deepspeed \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
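A similar sketch (assuming step 2 saved a pytorch_model.bin in its output directory) to see which embedding shape the reward checkpoint actually stores; for facebook/opt-350m the embedding is 512-dimensional, so I would expect (50272, 512):

```python
# Sketch: inspect the embedding weight stored in the step-2 reward checkpoint.
# Assumes step 2 saved a pytorch_model.bin in its output directory.
import torch

ckpt_path = "./step2_reward_model_finetuning/output/pytorch_model.bin"
state_dict = torch.load(ckpt_path, map_location="cpu")
for name, tensor in state_dict.items():
    if "embed_tokens.weight" in name:
        print(name, tuple(tensor.shape))  # expect (50272, 512) for an OPT-350m reward model
```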
- In the last PPO step, I set the SFT result model as ACTOR_MODEL_PATH and the reward model as CRITIC_MODEL_PATH. In the launch shell, I disabled hybrid_engine. Here is my launch shell (a quick check of the critic path follows the script):
ACTOR_MODEL_PATH=$path_to'/step1_supervised_finetuning/output/'
CRITIC_MODEL_PATH=$path_to'/step2_reward_model_finetuning/output/'
ACTOR_ZERO_STAGE=$3
CRITIC_ZERO_STAGE=$4
OUTPUT=$5
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output
fi
if [ "$ACTOR_ZERO_STAGE" == "" ]; then
ACTOR_ZERO_STAGE=3
fi
if [ "$CRITIC_ZERO_STAGE" == "" ]; then
CRITIC_ZERO_STAGE=3
fi
mkdir -p $OUTPUT
Num_Padding_at_Beginning=1 # this is model related
Actor_Lr=5e-4
Critic_Lr=5e-6
hostfile='hostfile'
deepspeed --hostfile=$hostfile main.py \
--data_path Dahoas/rm-static \
--data_split 2,4,4 \
--actor_model_name_or_path $ACTOR_MODEL_PATH \
--critic_model_name_or_path $CRITIC_MODEL_PATH \
--num_padding_at_beginning 1 \
--per_device_train_batch_size 4 \
--per_device_mini_train_batch_size 4 \
--generation_batch_numbers 1 \
--ppo_epochs 1 \
--max_answer_seq_len 256 \
--max_prompt_seq_len 256 \
--actor_learning_rate ${Actor_Lr} \
--critic_learning_rate ${Critic_Lr} \
--actor_weight_decay 0.1 \
--critic_weight_decay 0.1 \
--num_train_epochs 1 \
--lr_scheduler_type cosine \
--gradient_accumulation_steps 1 \
--num_warmup_steps 100 \
--deepspeed --seed 1234 \
--inference_tp_size 8 \
--tp_gather_partition_size 4 \
--actor_zero_stage $ACTOR_ZERO_STAGE \
--critic_zero_stage $CRITIC_ZERO_STAGE \
--actor_gradient_checkpointing \
--disable_actor_dropout \
--actor_lora_dim 128 \
--actor_lora_module_name layers. \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
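Before launching step 3, a quick sketch (just a config read; the path is the one from my script) to double-check what CRITIC_MODEL_PATH actually contains:

```python
# Sketch: double-check what the critic path actually contains before launching step 3.
import json
import os

critic_path = "./step2_reward_model_finetuning/output/"  # CRITIC_MODEL_PATH in my script
with open(os.path.join(critic_path, "config.json")) as f:
    cfg = json.load(f)
print(cfg.get("model_type"), cfg.get("vocab_size"))  # "opt", 50272 in my case
```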
In steps 1 & 2 everything works well, but in step 3 there is an error:
RuntimeError: Error(s) in loading state_dict for RewardModel:
size mismatch for rwtranrsformer.decoder.embed_tokens.weight: copying a param with shape torch.Size([50272, 512]) from checkpoint, the shape in current model is torch.Size([32000, 512]).
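My understanding (hedged; this is only my reading of the error, not the repo's actual code path) is that step 3 rebuilds the critic around the actor's LLaMA tokenizer (32000 tokens) and then tries to load the OPT-350m step-2 weights (50272-token embedding) into it, so the shapes can never match. The same error can be reproduced outside DeepSpeed-Chat like this:

```python
# Sketch: reproduce the size mismatch by resizing an OPT-350m model to a
# LLaMA-sized vocabulary and then loading the original OPT weights back in.
# Illustrative only; this is not DeepSpeed-Chat's actual code path.
from transformers import AutoModel

model = AutoModel.from_pretrained("facebook/opt-350m")
original_state = {k: v.clone() for k, v in model.state_dict().items()}

model.resize_token_embeddings(32000)   # actor (llama-7b-hf) vocabulary size
model.load_state_dict(original_state)  # RuntimeError: size mismatch for decoder.embed_tokens.weight
```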
I found issue #358 about this, but even after pulling the latest master branch it still doesn't work.
Has anyone met this before?