
Adding two losses from the actor leads to the error "gradient computed twice for this partition" #458

Open
@piekey1994

Description

When training the PPO model, I enabled gradient_checkpointing_enable. If you want to compute the ptx loss, the actor has to forward twice. In your code, the two losses are backpropagated separately, and that works without any problem. However, if I sum the two losses and then call the engine's backward once, the error "gradient computed twice for this partition" appears. If I don't enable gradient_checkpointing_enable, the error does not occur. It also seems to appear only in DeepSpeed's ZeRO mode, and I don't know why.
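A minimal pure-Python sketch of why this can happen, assuming ZeRO-style bookkeeping: ZeRO marks each parameter partition as "reduced" the first time its gradient hook fires during a backward pass, and resets the flags only at the next engine backward call. Two separate backwards therefore succeed, while one backward over a summed loss (whose graph contains two forwards through the same actor parameters) fires the hook twice in one window. The class and method names here (ToyZeroEngine, grad_hook) are invented for illustration; this is not DeepSpeed's actual code.

```python
# Toy model of ZeRO's per-partition gradient bookkeeping (illustration only,
# not DeepSpeed's implementation).
class ToyZeroEngine:
    def __init__(self, params):
        # One "reduced" flag per parameter partition.
        self.reduced = {p: False for p in params}

    def grad_hook(self, param):
        # Fires once per parameter per traversal of the autograd graph.
        if self.reduced[param]:
            raise RuntimeError("gradient computed twice for this partition")
        self.reduced[param] = True

    def backward(self, params_in_graph):
        # One engine.backward() call = one reduction window: reset flags,
        # then fire the hook for every gradient contribution in the graph.
        self.reduced = {p: False for p in self.reduced}
        for param in params_in_graph:
            self.grad_hook(param)


engine = ToyZeroEngine(["w"])

# Pattern 1: separate backward calls, one per loss -- no error, because
# the flags are reset between the two calls.
engine.backward(["w"])  # actor loss
engine.backward(["w"])  # ptx loss

# Pattern 2: summing the losses means one backward traverses the actor's
# parameters twice (two forwards feed one loss), so the hook for the same
# partition fires twice inside a single window.
try:
    engine.backward(["w", "w"])
except RuntimeError as e:
    print(e)  # gradient computed twice for this partition
```

Under this picture, gradient checkpointing matters because the recomputed forward keeps both graph branches alive through the same parameters, so the combined backward visits each partition once per branch.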

Metadata

Labels

deespeed chat: DeepSpeed Chat
modeling: Related to modeling questions.
new-config: A modified config from the given example
