Open
Description
Here is my situation:
- I finished step 2 with the cohere/zhihu_query dataset. The final reward score is 5.07, the rejected score is 0.8, and the accuracy is 0.79, so step 2 seems successful.
- When I attempted step 3, I ran into a loss-scale maximum problem, which I solved by changing the learning rates (actor & critic). Then I met another problem: the critic loss does not decrease. Across many experiments it either rises from 4 to 7 or stays around 5 (a sketch of the critic loss I have in mind is below).
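For reference, here is a minimal sketch of the clipped value-function (critic) loss commonly used in PPO-style step-3 training, which is my understanding of what the logged critic loss measures. The tensor names and clip range are illustrative assumptions, not taken from the actual training code.

```python
import torch

def critic_loss(values, old_values, returns, cliprange_value=0.2):
    # Clip the new value predictions to stay near the old (rollout-time) values.
    values_clipped = torch.min(
        torch.max(values, old_values - cliprange_value),
        old_values + cliprange_value,
    )
    loss_unclipped = (values - returns) ** 2
    loss_clipped = (values_clipped - returns) ** 2
    # Take the pessimistic (larger) squared error per token, then average,
    # so the reported number is essentially a mean squared error between
    # the critic's predictions and the returns.
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```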
Here are my questions:
- I tested the model (the actor) and found that its performance is better than the SFT model's. Is that normal?
- The actor loss is roughly -advantage * clip(ratio) (see the sketch after this list). In my logs the actor loss moved from -0.1 to -2. Since clip(ratio) stays around 0.8-1.2, this means the advantage is greater than 0 and increased during training. The advantage measures whether the action taken by the actor model is better or worse than average (the baseline). So is a bigger advantage better, and is a smaller (more negative) actor loss better, since a bigger advantage makes the actor loss smaller?
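For clarity, this is the clipped actor (policy) loss I am reasoning about. The full PPO form takes the pessimistic maximum over the unclipped and clipped terms; the tensor names and clip range here are illustrative assumptions rather than the exact training code.

```python
import torch

def actor_loss(logprobs, old_logprobs, advantages, cliprange=0.2):
    # Probability ratio between the current policy and the rollout policy.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = -advantages * ratio
    clipped = -advantages * torch.clamp(ratio, 1.0 - cliprange, 1.0 + cliprange)
    # PPO minimizes the element-wise maximum of the two (a pessimistic bound).
    # With advantage > 0 and ratio near 1, the loss is negative, and it becomes
    # more negative as the advantage grows.
    return torch.max(unclipped, clipped).mean()
```

With this form, a growing positive advantage drives the loss further below zero, which is how I read the drop from -0.1 to -2 in my logs.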
Looking forward to your reply.
Thanks.