Open
Description
When training the PPO model, I enabled gradient_checkpointing_enable. If the ptx loss is computed, the actor runs forward twice. In your code these two losses are each backwarded separately, which causes no problem. However, if I sum the two losses and then call the engine's backward once, I get the error "gradient computed twice for this partition". If I don't enable gradient_checkpointing_enable, the error does not occur. It seems to appear only in DeepSpeed's ZeRO mode, and I don't know why.
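For illustration, here is a minimal sketch of the two backward patterns in plain PyTorch. The model, shapes, and loss names are made up; plain `loss.backward()` stands in for the DeepSpeed engine's backward, and the failing pattern is only shown as a comment because the error is specific to a ZeRO-partitioned engine:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
# toy stand-in for the actor model (shapes are arbitrary)
actor = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
opt = torch.optim.SGD(actor.parameters(), lr=0.1)

def ckpt_forward(x):
    # gradient checkpointing: activations are recomputed during backward
    return checkpoint(actor, x, use_reentrant=False)

# two forward passes through the same actor, as in PPO + ptx training
actor_loss = ckpt_forward(torch.randn(2, 4)).mean()  # RL batch
ptx_loss = ckpt_forward(torch.randn(2, 4)).mean()    # pretraining batch

# pattern that works: one backward call per forward pass, so each
# checkpointed forward is paired with exactly one recomputation
actor_loss.backward()
ptx_loss.backward()
opt.step()

# failing pattern (with a DeepSpeed engine under ZeRO, per this issue):
#   engine.backward(actor_loss + ptx_loss)
#   -> "gradient computed twice for this partition"
```

With a single summed backward, one backward traversal reaches both checkpointed forwards, so each parameter partition receives a gradient from both recomputed subgraphs in the same pass, which is presumably what ZeRO's reduction bookkeeping rejects.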