Your question
According to my understanding, there are two facts:
- the grad op of a communication op is still a communication op (e.g. allreduce's grad is still allreduce; allgather's grad is reduce-scatter)
- leaving aside the gradient allreduce of data parallelism, if no communication op exists in the forward pass, then none exists in the backward pass either.
So, why not use a single function g, whose forward and backward operations are both allreduce, instead of the two conjugate functions?
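For concreteness, here is a minimal single-process sketch of the two conjugate functions I mean (named f and g as in the Megatron-LM paper). The per-rank values are plain Python lists and `all_reduce` is a summed broadcast; both are simplifying assumptions for illustration, not a real distributed implementation:

```python
def all_reduce(per_rank_values):
    """Simulate allreduce: sum across ranks, broadcast the sum to every rank."""
    total = sum(per_rank_values)
    return [total for _ in per_rank_values]

def f_forward(per_rank_values):
    # f: identity in the forward pass ...
    return list(per_rank_values)

def f_backward(per_rank_grads):
    # ... allreduce in the backward pass
    return all_reduce(per_rank_grads)

def g_forward(per_rank_values):
    # g: allreduce in the forward pass ...
    return all_reduce(per_rank_values)

def g_backward(per_rank_grads):
    # ... identity in the backward pass
    return list(per_rank_grads)
```

My question amounts to: why does g need `g_backward` to be the identity, rather than being allreduce like `g_forward`?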