Conversation
Force-pushed from 3f34441 to 239ed36.
```python
# Switch to FP32 shard after backward.
self._use_fp32_param_shard([param])
if self.mixed_precision and self.fp32_reduce_scatter:
```
Currently for fp8 we do not use mixed_precision, so we should drop that part of the condition and only check `if self.fp32_reduce_scatter:`.
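A minimal toy illustrating the suggestion (this is not fairscale's actual hook; `should_cast` is a hypothetical helper): with fp8, `mixed_precision` is False, so gating the FP32 cast on `mixed_precision and fp32_reduce_scatter` would skip it, while gating on `fp32_reduce_scatter` alone performs it.

```python
# Hypothetical helper, for illustration only: decides whether the
# post-backward hook should cast the grad to FP32 before reduce-scatter.
def should_cast(mixed_precision, fp32_reduce_scatter, old_check=False):
    if old_check:
        # Original condition: requires mixed_precision too.
        return mixed_precision and fp32_reduce_scatter
    # Suggested condition: fp32_reduce_scatter alone.
    return fp32_reduce_scatter

# fp8 configuration: mixed_precision=False, but we still want an FP32 reduce-scatter.
assert should_cast(False, True, old_check=True) is False  # old check skips the cast
assert should_cast(False, True) is True                   # suggested check performs it
```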
Force-pushed from 239ed36 to ad54660.
```python
# Cast grad to FP32.
param.grad.data = param.grad.data.float()

orig_grad_data = param.grad.data
```
Moved here so that orig_grad_data is FP32. This follows up on #1139 (comment).
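A small runnable sketch of why the capture has to come after the cast (numpy stands in for torch here; `grad` plays the role of `param.grad.data`): the cast allocates a new buffer, so a reference taken beforehand still points at the low-precision tensor.

```python
import numpy as np

# Gradient as produced by backward, in half precision.
grad = np.ones(4, dtype=np.float16)

# Capture BEFORE the cast: the reference still points at the FP16
# buffer, because the cast below allocates a brand-new array.
before = grad
grad = grad.astype(np.float32)
assert before.dtype == np.float16        # stale FP16 reference

# Capture AFTER the cast: this is the FP32 buffer that the
# reduce-scatter would actually consume.
orig_grad_data = grad
assert orig_grad_data.dtype == np.float32
```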
```python
if self.fp32_reduce_scatter:
    # Cast grad to FP32.
    param.grad.data = param.grad.data.float()
```
I don't think this is right, since param.grad will be None from L1722.
Overall, this PR creates main_grad for flat parameters, but what we need is a main_grad that is visible to the TE modules. So we probably need to change FlatParameter as well?
Is this based on one of Naman's branches?
I have a branch where I am adding param.main_grad to FlatParams to enable fused wgrad accumulation; here is the PR: #1142.
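A hedged sketch of the idea discussed here (names like `FlatParamSketch` and `grad_views` are illustrative, not the actual FlatParameter API; numpy stands in for torch): keep one flat FP32 `main_grad` buffer and hand out per-parameter views of it, so modules such as TE layers can accumulate wgrads in FP32 even when the flat parameter storage is lower precision.

```python
import numpy as np

class FlatParamSketch:
    """Illustrative only: a flat param buffer plus a shared FP32 main_grad."""
    def __init__(self, shapes):
        sizes = [int(np.prod(s)) for s in shapes]
        # Flat low-precision parameter storage.
        self.flat = np.zeros(sum(sizes), dtype=np.float16)
        # One flat FP32 gradient accumulator for all parameters.
        self.main_grad = np.zeros(sum(sizes), dtype=np.float32)
        # Per-parameter views into main_grad: writes through a view
        # land directly in the flat FP32 buffer.
        self.grad_views = []
        offset = 0
        for size, shape in zip(sizes, shapes):
            self.grad_views.append(self.main_grad[offset:offset + size].reshape(shape))
            offset += size

fp = FlatParamSketch([(2, 2), (3,)])
fp.grad_views[0] += 1.0                 # a module accumulates into its own view
assert fp.main_grad[:4].sum() == 4.0    # accumulation is visible in the flat buffer
assert fp.main_grad[4:].sum() == 0.0    # other params' slices are untouched
```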
Thanks! Feel free to ignore the changes in this PR. Still learning about FlatParams etc.
What does this PR do?
Fixes main_grad, following up on #1139 (comment).
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.