Bug: Numerically unstable loss at reward model #423

Closed
@s-isaev

Hi! I got an infinite loss while training the critic model at step 2:
Epoch 1/1 with loss inf
I found the source of the problem: the reward model loss is calculated with a numerically unstable formula:
https://github.com/microsoft/DeepSpeedExamples/blob/ab4e2e54620d0e80ead128b30dd39d9d55751eab/applications/DeepSpeed-Chat/training/utils/model/reward_model.py#L102
I propose to replace it with this expression:

loss += nn.functional.softplus(r_truncated_reward - c_truncated_reward).mean()
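
To illustrate the difference, here is a minimal sketch on made-up rewards. It assumes the linked line computes -log(sigmoid(c - r)) over chosen and rejected rewards; the function names and tensor values are hypothetical and not taken from the repository.

```python
import torch
import torch.nn.functional as F


def naive_pairwise_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Assumed form of the current loss: -log(sigmoid(c - r)).
    # For a very negative difference, sigmoid underflows to 0 in fp32,
    # so log returns -inf and the mean becomes inf.
    return -torch.log(torch.sigmoid(chosen - rejected)).mean()


def stable_pairwise_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # softplus(r - c) is mathematically the same quantity,
    # but it is evaluated without forming sigmoid first.
    return F.softplus(rejected - chosen).mean()


# Hypothetical rewards; the last pair has a difference of -100.
chosen = torch.tensor([0.0, 1.0, -50.0], dtype=torch.float32)
rejected = torch.tensor([0.0, -1.0, 50.0], dtype=torch.float32)

naive = naive_pairwise_loss(chosen, rejected)
stable = stable_pairwise_loss(chosen, rejected)
# An equivalent stable spelling that keeps the original argument order.
alt = -F.logsigmoid(chosen - rejected).mean()

print("naive:", naive.item())
print("softplus:", stable.item())
print("logsigmoid:", alt.item())
```

On these inputs the naive loss is inf, while the softplus and logsigmoid versions stay finite and agree with each other. Using `-F.logsigmoid(c - r)` might be preferable if keeping the original argument order matters for readability.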

Mathematically, -log(sigmoid(x)) equals softplus(-x), but only the second form is numerically stable. Here are the outputs of the two functions, respectively, in fp32:
x: (-log(sigmoid(x)), softplus(-x))
-100.0: (inf, 100.0)
-90.0: (inf, 90.0)
-80.0: (80.0, 80.0)
-70.0: (70.0, 70.0)
-60.0: (60.0, 60.0)
-50.0: (50.0, 50.0)
-40.0: (40.0, 40.0)
-30.0: (30.0, 30.0)
-20.0: (20.0, 20.0)
-10.0: (10.000045776367188, 10.000045776367188)
0.0: (0.6931471824645996, 0.6931471824645996)
10.0: (4.541977250482887e-05, 4.5398901420412585e-05)
20.0: (-0.0, 2.06115369216775e-09)
30.0: (-0.0, 9.357622912219837e-14)
40.0: (-0.0, 4.24835413113866e-18)
50.0: (-0.0, 1.9287498933537385e-22)
60.0: (-0.0, 8.75651089272076e-27)
70.0: (-0.0, 3.975449954226706e-31)
80.0: (-0.0, 1.8048513285848406e-35)
90.0: (-0.0, 8.194008692231508e-40)
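
For reference, the identity behind the replacement can be written out as follows, with $\sigma$ denoting the sigmoid:

$$
-\log \sigma(x) = -\log \frac{1}{1 + e^{-x}} = \log\left(1 + e^{-x}\right) = \operatorname{softplus}(-x)
$$

The two-step form fails at both ends in fp32: for large negative $x$, $\sigma(x)$ underflows to 0 and the logarithm returns $-\infty$; for large positive $x$, $\sigma(x)$ rounds to exactly 1 and the logarithm returns 0, which loses the small positive value. A standard way to evaluate softplus without overflow (not necessarily how any particular library implements it) is:

$$
\operatorname{softplus}(z) = \max(z, 0) + \log\left(1 + e^{-|z|}\right)
$$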
