The reward in step 3 seems to be completely random, with no noticeable increase. #489

I am testing training with the 1.3B model. Steps 1 and 2 completed successfully, but the reward does not change over the course of step 3.

I used LoRA to train for one iteration. The results of steps 1 and 2 are as follows:

Step 1:
ppl: 2.18959641456604
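For reference, perplexity is conventionally the exponential of the mean token-level cross-entropy loss, so the step 1 value can be converted back into an approximate evaluation loss. This sketch assumes the reported ppl is computed that way; the helper names are hypothetical and not part of the project.

```python
import math


def loss_from_perplexity(ppl: float) -> float:
    """Mean cross-entropy loss (in nats) implied by a perplexity value.

    Assumes ppl = exp(mean loss), the usual definition of perplexity.
    """
    return math.log(ppl)


def perplexity_from_loss(loss: float) -> float:
    """Inverse of loss_from_perplexity: ppl = exp(mean loss)."""
    return math.exp(loss)


step1_ppl = 2.18959641456604  # value reported for step 1 above
print(f"implied step 1 eval loss: {loss_from_perplexity(step1_ppl):.4f}")
```

Under that assumption, the step 1 ppl corresponds to a mean loss of roughly 0.78 nats per token.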

Step 2:
![image](https://user-images.githubusercontent.com/128342390/236686244-ec2abeec-0ce5-4953-b97d-8184750b78d9.png)

Step 3:
![image](https://user-images.githubusercontent.com/128342390/236686337-a5c98898-d91d-4d16-aea9-9cad20405b69.png)

I had ChatGPT extract the step 3 logs and compare them with the demo logs provided in the project. Compared with the demo, the absolute value of my loss is significantly smaller, and the reward fluctuates randomly without any noticeable increase. (stand)
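To check whether the step 3 reward is truly flat rather than slowly trending upward beneath the noise, the per-step average rewards can be smoothed and fitted with a straight line. This is a minimal sketch under stated assumptions: `moving_average` and `trend_slope` are hypothetical helpers, and the reward values are assumed to have already been copied out of the step 3 log into a Python list (the log format itself is not parsed here).

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be in 1..len(values)")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def trend_slope(values):
    """Ordinary least-squares slope of the values against their step index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = (n - 1) / 2  # mean of the indices 0..n-1
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    # Synthetic values for illustration only, NOT taken from the logs above.
    rewards = [0.0, 0.5, 1.0, 1.5]
    print("smoothed:", moving_average(rewards, 2))
    print("slope per step:", trend_slope(rewards))
```

A slope close to zero relative to the spread of the rewards, together with a moving average that does not rise, would support the observation that the reward is not improving. Running the same helpers on the reward values from the demo logs gives a baseline for comparison.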

![image](https://user-images.githubusercontent.com/128342390/236686869-d077d29f-3c5c-42b7-855d-ae8cea6f6b34.png)



![image](https://user-images.githubusercontent.com/128342390/236686727-b055608d-fd04-4e76-959f-fa5076238eae.png)
![image](https://user-images.githubusercontent.com/128342390/236686735-8cff0955-8ca3-480e-b7d1-b59c5d69aca0.png)
![image](https://user-images.githubusercontent.com/128342390/236686738-b7899786-3699-4f67-8d46-51dbbccb0e7f.png)


