The reward in step3 seems to be completely random without any noticeable increase. #489

Open
@laoda513


I am testing the 1.3B training pipeline. Steps 1 and 2 completed without problems, but the reward shows no change after completing step 3.

I used LoRA and trained for one iteration. The results of each step are as follows:
Step 1:
ppl: 2.18959641456604

Step 2:
image

Step 3:
image

I had ChatGPT extract the logs for step 3 and compare them with the demo logs provided in the project. The absolute value of my loss is significantly smaller than in the demo logs, and the reward appears to fluctuate randomly without any noticeable increase.

image

image
image
image
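
For anyone who wants to repeat the log comparison without ChatGPT, here is a minimal sketch in Python. It assumes each reward is printed on a line like `average reward score: 0.123`; that pattern is an assumption, so adjust the regex to whatever your step 3 log actually prints. The helper names (`extract_rewards`, `moving_average`, `linear_slope`) are made up for illustration and are not part of DeepSpeed-Chat.

```python
import re
from statistics import mean

# ASSUMPTION: rewards appear as "average reward score: <float>" in the log.
# Change this pattern if your step 3 output is formatted differently.
REWARD_RE = re.compile(
    r"average reward score:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def extract_rewards(log_text):
    """Return every reward value found in the log text, in order."""
    return [float(m.group(1)) for m in REWARD_RE.finditer(log_text)]


def moving_average(values, window):
    """Smooth a noisy series with a simple sliding-window mean."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def linear_slope(values):
    """Least-squares slope of values against their step index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    # Hypothetical file name; point this at your own step 3 log.
    with open("training.log", encoding="utf-8") as f:
        rewards = extract_rewards(f.read())
    print("steps with a reward:", len(rewards))
    print("slope per step:", linear_slope(rewards))
    print("smoothed tail:", moving_average(rewards, 10)[-5:])
```

Running the same script on my log and on the demo log gives two slopes that can be compared directly. A slope close to zero together with a smoothed curve that keeps wandering up and down would match what I described above.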

Labels: deespeed chat (DeepSpeed Chat), modeling (Related to modeling questions.)
