
[QUESTION] Why must f and g be conjugates of each other? #726

@bescks


Your question

As I understand it, two facts hold:

  1. The gradient of a communication op is still a communication op (e.g. the gradient of all-reduce is all-reduce, and the gradient of all-gather is reduce-scatter).
  2. Setting aside the gradient all-reduce used in data parallelism, if there is no communication op in the forward pass, there is none in the backward pass either.
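Fact 1 can be checked with a small sketch. The backward of a linear op is its adjoint (transpose), so reduce-scatter is the gradient of all-gather exactly when <AllGather(x), y> = <x, ReduceScatter(y)> for all x and y, and all-reduce is its own gradient when it satisfies the same identity with itself. Here ranks are simulated as Python lists, and `all_reduce`, `all_gather`, `reduce_scatter`, and `inner` are hypothetical helpers, not a distributed library API.

```python
# Ranks are simulated as a Python list: data[r] is the buffer held by rank r.
# These helpers are hypothetical stand-ins for real collectives, not a library API.


def all_reduce(data):
    """Every rank receives the element-wise sum of all ranks' vectors."""
    total = [sum(vals) for vals in zip(*data)]
    return [list(total) for _ in data]


def all_gather(data):
    """Every rank receives the concatenation of all ranks' shards."""
    full = [v for shard in data for v in shard]
    return [list(full) for _ in data]


def reduce_scatter(data):
    """Sum across ranks, then rank r keeps only the r-th chunk of the sum."""
    world = len(data)
    chunk = len(data[0]) // world
    total = [sum(vals) for vals in zip(*data)]
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


def inner(a, b):
    """Inner product taken over the buffers of all ranks."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


# Two ranks, shard length 2, so gathered vectors have length 4.
x_shards = [[1, 2], [3, 4]]
x_full = [[1, -1, 2, 0], [0, 3, -2, 1]]
y_full = [[5, 6, 7, 8], [9, 10, 11, 12]]

# Adjoint test <A x, y> == <x, A^T y>.
print(inner(all_gather(x_shards), y_full) == inner(x_shards, reduce_scatter(y_full)))
print(inner(all_reduce(x_full), y_full) == inner(x_full, all_reduce(y_full)))
```

Both lines print `True` (the inner products are 180 and 66 respectively).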

So why not use a single function g whose forward and backward operations are both all-reduce, instead of the two conjugate functions?
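One way to compare the proposal with the conjugate pair is a toy numerical sketch. It assumes the usual definition of the pair (f: identity forward, all-reduce backward; g: all-reduce forward, identity backward) and assumes every rank computes the same loss on the replicated output of g. The model is scalar, collectives are simulated with Python lists, `serial_grads` and `parallel_grads` are hypothetical names, and the `"single_g"` scheme is only one reading of the proposal (drop f, make g all-reduce in both directions).

```python
# Toy scalar model of a two-layer MLP split across len(a) simulated ranks.
# Rank r holds a[r] (column-parallel first layer) and b[r] (row-parallel second layer).
# Assumption: every rank computes the same loss L = z on the replicated output z.


def serial_grads(a, b, x):
    """Reference gradients of L = sum_r b[r] * a[r] * x on a single device."""
    grad_x = sum(br * ar for ar, br in zip(a, b))
    grad_a = [br * x for br in b]
    grad_b = [ar * x for ar in a]
    return grad_x, grad_a, grad_b


def parallel_grads(a, b, x, scheme):
    """Per-rank gradients (grad_x, grad_a, grad_b) under a communication scheme.

    "conjugate": f = (identity fwd, all-reduce bwd), g = (all-reduce fwd, identity bwd)
    "single_g":  no f; g = (all-reduce fwd, all-reduce bwd)
    """
    if scheme not in ("conjugate", "single_g"):
        raise ValueError(f"unknown scheme: {scheme}")
    world = len(a)
    # Forward: f is the identity here in both schemes, so every rank sees the same x.
    h = [a[r] * x for r in range(world)]             # column-parallel layer
    z_partial = [b[r] * h[r] for r in range(world)]  # row-parallel partial outputs
    z = sum(z_partial)  # g forward: all-reduce, every rank holds the same z (= L)
    # Backward: each rank's replica of L = z gives dL/dz = 1.
    grad_z = [1] * world
    if scheme == "conjugate":
        grad_zp = list(grad_z)           # g backward: identity
    else:
        grad_zp = [sum(grad_z)] * world  # g backward: all-reduce
    grad_b = [grad_zp[r] * h[r] for r in range(world)]
    grad_h = [grad_zp[r] * b[r] for r in range(world)]
    grad_a = [grad_h[r] * x for r in range(world)]
    grad_x_local = [grad_h[r] * a[r] for r in range(world)]
    if scheme == "conjugate":
        grad_x = [sum(grad_x_local)] * world  # f backward: all-reduce
    else:
        grad_x = grad_x_local  # no f, so the partial input grads are never combined
    return grad_x, grad_a, grad_b


a, b, x = [2, 3], [5, 7], 4
print(serial_grads(a, b, x))                  # (31, [20, 28], [8, 12])
print(parallel_grads(a, b, x, "conjugate"))   # ([31, 31], [20, 28], [8, 12])
print(parallel_grads(a, b, x, "single_g"))    # ([20, 42], [40, 56], [16, 24])
```

Under these assumptions, the conjugate pair reproduces the single-device gradients on every rank, while the single-g variant scales the weight gradients by the world size and leaves each rank with only its partial input gradient. In this sketch the scaling comes from the replicated loss: every rank already holds the full dL/dz, so all-reducing it in g's backward counts it once per rank.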


Labels: stale (no activity in 60 days on issue or PR)
