Bug: Numerically unstable loss in reward model #423

Hi! I got an infinite loss when training the critic model at step 2:
`Epoch 1/1 with loss inf`
I found the source of the problem: the reward model loss is computed with a numerically unstable formula:
https://github.com/microsoft/DeepSpeedExamples/blob/ab4e2e54620d0e80ead128b30dd39d9d55751eab/applications/DeepSpeed-Chat/training/utils/model/reward_model.py#L102
I propose replacing it with this expression:

`loss += nn.functional.softplus(r_truncated_reward - c_truncated_reward).mean()`

Mathematically, `-log(sigmoid(x))` is equal to `softplus(-x)`, but the second form is numerically stable. Here are the outputs of the two functions in fp32, with each row formatted as `x: (-log(sigmoid(x)), softplus(-x))`:
`-100.0: (inf,100.0)
-90.0: (inf,90.0)
-80.0: (80.0,80.0)
-70.0: (70.0,70.0)
-60.0: (60.0,60.0)
-50.0: (50.0,50.0)
-40.0: (40.0,40.0)
-30.0: (30.0,30.0)
-20.0: (20.0,20.0)
-10.0: (10.000045776367188,10.000045776367188)
0.0: (0.6931471824645996,0.6931471824645996)
10.0: (4.541977250482887e-05,4.5398901420412585e-05)
20.0: (-0.0,2.06115369216775e-09)
30.0: (-0.0,9.357622912219837e-14)
40.0: (-0.0,4.24835413113866e-18)
50.0: (-0.0,1.9287498933537385e-22)
60.0: (-0.0,8.75651089272076e-27)
70.0: (-0.0,3.975449954226706e-31)
80.0: (-0.0,1.8048513285848406e-35)
90.0: (-0.0,8.194008692231508e-40)`
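
The identity behind this is `-log(sigmoid(x)) = log(1 + exp(-x)) = softplus(-x)`. Below is a minimal sketch of why the two forms behave differently, written in plain Python rather than PyTorch. Note the assumptions: Python floats are fp64, so the naive form breaks down at larger magnitudes than in the fp32 table above, and `naive_loss` and `stable_loss` are hypothetical helper names for illustration, not code from the repository. The stable version uses the common rewrite `softplus(z) = max(z, 0) + log1p(exp(-|z|))`, which never exponentiates a positive number.

```python
import math

def naive_loss(x: float) -> float:
    """-log(sigmoid(x)) evaluated literally (hypothetical helper)."""
    try:
        s = 1.0 / (1.0 + math.exp(-x))
    except OverflowError:
        # math.exp raises instead of returning inf; IEEE arithmetic
        # would give 1 / inf = 0 here, so mimic that.
        s = 0.0
    # log(0) is -inf in IEEE arithmetic, so the loss becomes +inf.
    return -math.log(s) if s > 0.0 else math.inf

def stable_loss(x: float) -> float:
    """softplus(-x) via max(z, 0) + log1p(exp(-|z|)) (hypothetical helper)."""
    z = -x
    # exp is only ever applied to a non-positive argument, so it cannot overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Same row format as the table above: x: (naive, stable)
for x in (-800.0, -10.0, 0.0, 10.0, 40.0):
    print(f"{x}: ({naive_loss(x)},{stable_loss(x)})")
```

For large negative `x` the naive form overflows to `inf`, and for large positive `x` it rounds to zero, while the stable form stays finite and keeps a small positive value. In the proposed fix, `nn.functional.softplus` plays the role of `stable_loss`.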
