step2 bug fix for loss = NaN when using BLOOM (which is left-padding style) #571

Open
@scarydemon2

Description

When we use the BLOOM model to train the reward model, the loss can be NaN throughout training. This happens because `end_ind` in the reward model is computed incorrectly, so `divergence_ind` is always greater than `end_ind`, making it impossible to obtain the corresponding `chosen_reward` and `rejected_reward`.

However, since BLOOM uses left padding, the end position of the `chosen` or `rejected` sequence is always the last index of the input ids. We can therefore simply set `end_ind = seq_len` in the reward model's `forward` function, so that the end position is always obtained correctly.
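To illustrate why finding the answer's end via the first PAD token fails under left padding, here is a minimal sketch. `PAD`, `first_pad_index`, and the example ids are hypothetical stand-ins, not DeepSpeed Chat's actual code:

```python
PAD = 0  # hypothetical pad token id

def first_pad_index(ids, pad=PAD):
    """Take the end of the answer to be the first PAD token.
    This is only valid for right padding, where PADs trail the real tokens."""
    for i, tok in enumerate(ids):
        if tok == pad:
            return i
    return len(ids)

# Right-padded sequence: real tokens first, PADs at the tail.
right_padded = [5, 6, 7, 8, PAD, PAD]
# Left-padded sequence (BLOOM style): PADs first, real tokens at the tail.
left_padded = [PAD, PAD, 5, 6, 7, 8]

print(first_pad_index(right_padded))  # 4 -> correct end of the answer
print(first_pad_index(left_padded))   # 0 -> before any real token, so
                                      #      divergence_ind > end_ind and
                                      #      the reward slice is empty

# Proposed fix for left padding: the answer always runs to the end of
# the sequence, so the end index is simply the sequence length.
end_ind = len(left_padded)  # == seq_len
print(end_ind)  # 6
```

With left padding the first PAD sits at index 0, so the computed end index lands before any real token and the reward slice between `divergence_ind` and `end_ind` is empty, which is what produces the NaN loss.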

Metadata

Labels

deepspeed chat (DeepSpeed Chat), modeling (Related to modeling questions)
