[QUESTION] Why do we need to modify LinearWithGradAccumulationAndAsyncCommunication's backward function to support SP? #59

@b4b4o

Thanks to the authors for the great work!

As far as I can tell, commit a84d634 only seems to add a flag, reshard_for_sequence_parallel, for the Row Major Linear. The forward RS and backward AG of output_ = reduce_scatter_to_sequence_parallel_region(output_parallel) have been repackaged into backward().

I'm not sure why this change is needed to support SP, or why SP would not be supported without it.
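
For reference, the forward RS / backward AG pairing mentioned above can be sketched in a single process, with no torch.distributed involved. This is only an illustration under assumptions: the helper names reduce_scatter, all_gather and dot are hypothetical (not the repository's API), each rank is modeled as one entry of a Python list, and the sequence dimension is taken to be the first dimension.

```python
# Single-process simulation of the collectives; each list entry plays one rank.
# All helper names here are hypothetical and are not the repository's API.

def reduce_scatter(per_rank):
    """Sum the ranks' tensors elementwise, then hand rank r the r-th block of rows."""
    world = len(per_rank)
    rows, cols = len(per_rank[0]), len(per_rank[0][0])
    assert rows % world == 0, "sequence length must divide evenly across ranks"
    chunk = rows // world
    summed = [[sum(t[i][j] for t in per_rank) for j in range(cols)]
              for i in range(rows)]
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world)]


def all_gather(per_rank):
    """Concatenate the ranks' shards along the first dim; every rank gets a copy."""
    full = [list(row) for shard in per_rank for row in shard]
    return [[list(row) for row in full] for _ in per_rank]


def dot(a, b):
    """Inner product summed over ranks, rows and columns."""
    return sum(x * y
               for ta, tb in zip(a, b)
               for ra, rb in zip(ta, tb)
               for x, y in zip(ra, rb))


# Two ranks, sequence length 4, hidden size 1.
output_parallel = [[[1], [2], [3], [4]], [[10], [20], [30], [40]]]

# Forward: reduce-scatter along the sequence (first) dimension.
output_ = reduce_scatter(output_parallel)

# Backward: each rank holds the gradient for its own sequence shard only,
# and an all-gather rebuilds the full-sequence gradient on every rank.
grad_output_ = [[[1], [2]], [[3], [4]]]
grad_output_parallel = all_gather(grad_output_)

# Adjoint check: <RS(x), g> equals <x, AG(g)>, which is why AG is the
# backward of RS in this simulation.
assert dot(output_, grad_output_) == dot(output_parallel, grad_output_parallel)
```

The sketch only shows that the two collectives are adjoint to each other. It does not show where the commit places them, which is the part I am asking about.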
