safe_get_full_grad & safe_set_full_grad #7117

Open
@ProjectDisR

Description

Environment: DeepSpeed 0.15.3, with ZeRO stage 3 enabled.

Does "safe_get_full_grad" return the same gradient values on every process/rank?

As for "safe_set_full_grad", should it be called on all processes/ranks, or is calling it on just one of them enough?
If it has to be called on all of them, do users need to ensure that the gradient values being set are identical on every process/rank?

Also, which floating-point dtype should the tensor passed to "safe_set_full_grad" have? Is there any way to check this?
