Step 2 training gets a negative score and accuracy is below 60% #322

Open
@dlnlpchenliyu

Hi,
While running step 2 (reward model training), I got a strange result after training for one epoch:
***** Evaluating reward, Epoch 1/1 *****
chosen_last_scores (higher is better) : -9.388486862182617, acc (higher is better) : 0.5991161465644836.

Is something wrong with my training script?

My training script is below:

OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
    OUTPUT=./output
fi
if [ "$ZERO_STAGE" == "" ]; then
    ZERO_STAGE=0
fi
mkdir -p $OUTPUT

export CUDA_VISIBLE_DEVICES=1
deepspeed --master_port 29501 --include localhost:1 main.py --model_name_or_path facebook/opt-350m \
    --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets openai/webgpt_comparisons stanfordnlp/SHP \
    --num_padding_at_beginning 1 --gradient_accumulation_steps 2 --zero_stage $ZERO_STAGE \
    --per_device_train_batch_size 8 --per_device_eval_batch_size 16 --num_train_epochs 1 \
    --deepspeed --output_dir $OUTPUT &> $OUTPUT/training.log
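
Since the script redirects all output to $OUTPUT/training.log, the evaluation numbers can be pulled back out of that file. The following is a minimal sketch, not part of DeepSpeed-Chat: `extract_acc` is a hypothetical helper name, and it assumes every evaluation line has exactly the format quoted in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepSpeed-Chat): print the accuracy value
# from each evaluation line of a training log, one value per line.
# Assumed line format, copied from the log excerpt in this issue:
#   chosen_last_scores (higher is better) : <float>, acc (higher is better) : <float>.
extract_acc() {
    # -n suppresses default printing; the trailing "p" prints only lines
    # where the substitution matched, i.e. lines that contain an acc value.
    sed -nE 's/.*acc \(higher is better\) : ([0-9]+\.[0-9]+).*/\1/p' "$1"
}

# Usage (path assumed from the script above):
#   extract_acc "$OUTPUT/training.log"
```

With more than one epoch, this prints one accuracy per evaluation, which makes it easy to see whether the value is still improving.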

The training is running on a 32 GB Tesla V100 GPU.
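
For reference when reading the two numbers in the evaluation line, here is a minimal sketch of how they relate. It assumes that acc is the fraction of pairs in which the chosen response scores higher than the rejected one, and that the reported score is the mean score of the chosen responses; both are assumptions about the evaluation code, not confirmed here. `pairwise_stats` is a hypothetical helper and the scores are made-up toy values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "chosen_score rejected_score" pairs, one per
# line, and prints the mean chosen score and the pairwise accuracy.
# Assumption: acc = share of pairs with chosen_score > rejected_score.
pairwise_stats() {
    awk '{ n++; sum += $1; if ($1 > $2) correct++ }
         END { if (n > 0) printf "mean_chosen=%.2f acc=%.2f\n", sum / n, correct / n }' "$@"
}

# Toy data (made up): every score is negative, yet 3 of 4 pairs are ranked
# correctly, because the accuracy depends only on which score is larger.
printf '%s\n' '-9.0 -10.0' '-8.0 -9.5' '-10.0 -9.0' '-9.0 -11.0' | pairwise_stats
# prints: mean_chosen=-9.00 acc=0.75
```

Under these assumptions a negative mean score does not by itself indicate a problem, since shifting every score by the same constant leaves the accuracy unchanged. The accuracy is the number that reflects how well pairs are ranked.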

Labels: deespeed chat (DeepSpeed Chat), modeling (Related to modeling questions.)
