
[Problem discussion] Critic loss cannot decrease #556

Open
@watermelon-lee


Here is my situation:

  1. I finished step 2 with the cohere/zhihu_query dataset. The final reward score is 5.07, the rejected score is 0.8, and the accuracy is 0.79, so step 2 seems successful.
  2. When I moved on to step 3, I hit the loss scale maximum problem, which I solved by changing the learning rates (actor and critic). Then I ran into another problem: the critic loss does not decrease. Across many experiments it fluctuated between 4 and 7 or stayed around 5.
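For reference, the critic in PPO-style RLHF is typically trained by regressing its per-token value estimates onto returns with a clipped squared-error loss. The sketch below assumes that clipped form and masks out padding tokens; the function name, argument names, and the `cliprange_value=0.2` default are illustrative assumptions, not DeepSpeed Chat's actual API.

```python
def clipped_value_loss(values, old_values, returns, mask, cliprange_value=0.2):
    """Masked, clipped squared-error value loss in the PPO style (a sketch).

    values:     current critic estimates, one per token
    old_values: critic estimates recorded when the experience was generated
    returns:    regression targets (e.g. advantages + old values)
    mask:       1 for real response tokens, 0 for padding
    """
    total, count = 0.0, 0
    for v, v_old, r, m in zip(values, old_values, returns, mask):
        if not m:
            continue  # skip padding tokens
        # Keep the new estimate within +/- cliprange_value of the old one.
        v_clipped = min(max(v, v_old - cliprange_value), v_old + cliprange_value)
        loss_unclipped = (v - r) ** 2
        loss_clipped = (v_clipped - r) ** 2
        # Pessimistic choice: use the larger of the two errors.
        total += max(loss_unclipped, loss_clipped)
        count += 1
    return 0.5 * total / max(count, 1)
```

Because this is a squared error against returns that include the reward-model score, the absolute level of the loss depends on the reward scale (here, scores around 5) as well as on how well the critic fits.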

Here are my questions:

  1. I tested the actor model and found that it performs better than the SFT model. Is that normal?
  2. The actor loss = -advantage * clip(ratio). In my log the actor loss changed from -0.1 to -2. Since clip(ratio) is around 0.8-1.2, this means the advantage is greater than 0 and increased during training. The advantage indicates whether the action taken by the actor model is better or worse than average (the baseline). So is a bigger advantage better, and is a smaller actor loss better (since a bigger advantage gives a smaller actor loss)?
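The reasoning in question 2 can be checked against a minimal sketch of the standard PPO clipped surrogate loss, with advantages from Generalized Advantage Estimation (GAE). It assumes per-token rewards, a value of 0 after the last token, and illustrative defaults (`gamma=1.0`, `lam=0.95`, `cliprange=0.2`); the function names are hypothetical, not DeepSpeed Chat's code. Note that the standard clipped loss takes, per token, the larger of `-A * ratio` and `-A * clip(ratio)` rather than only the clipped term.

```python
import math


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one response (a sketch).

    values[t] is the critic's estimate at token t; the value after the
    last token is assumed to be 0 (the episode ends there).
    """
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # TD error: how much better this step went than the critic expected.
        delta = rewards[t] + gamma * next_value - values[t]
        last_gae = delta + gamma * lam * last_gae
        advantages[t] = last_gae
    # Returns are the critic's regression targets.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_actor_loss(logprobs, old_logprobs, advantages, cliprange=0.2):
    """PPO clipped surrogate loss, averaged over tokens (a sketch)."""
    total = 0.0
    for lp, old_lp, adv in zip(logprobs, old_logprobs, advantages):
        ratio = math.exp(lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
        # Pessimistic bound: the larger of the two negated objectives.
        total += max(-adv * ratio, -adv * clipped_ratio)
    return total / len(advantages)
```

When the ratio stays near 1, the loss is close to the negative mean advantage of the sampled tokens, which matches the reading in question 2. Since the advantages are built from the critic's value estimates, they are only as reliable as the critic.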

Looking forward to your reply.
Thanks.
