[Auto Parallel] Add tensor_fusion and overlap in auto dy sharding #72551
PR Category
Auto Parallel
PR Types
New features
Description
`param.main_grad` replaces the old `master_grad` in auto dy. It uses inplace `add_` to save or cast grads to fp32 and store them in `param.main_grad`. Enable with `export FLAGS_enable_inplace_master_grad=1`.
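A minimal sketch of the inplace `main_grad` idea in dygraph; `accumulate_main_grad` is a hypothetical helper (not the PR's actual API) called once a parameter's low-precision grad is ready:

```python
# Sketch: keep an fp32 accumulator on the parameter itself and fold each new
# low-precision grad into it with inplace add_, instead of a separate master_grad.
import paddle

def accumulate_main_grad(param):
    """Hypothetical hook body; `param` is a dygraph parameter with a fresh grad."""
    if param.grad is None:
        return
    if getattr(param, "main_grad", None) is None:
        # first accumulation: cast the grad up to fp32 once
        param.main_grad = param.grad.cast("float32")
    else:
        # later accumulations: inplace add_ avoids allocating a new fp32 buffer
        param.main_grad.add_(param.grad.cast("float32"))
    # the original low-precision grad can be released now
    param.clear_gradient()
```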
`tensor_fusion` groups params and grads into contiguous `param_storage` and `grad_storage` buffers. `grad_storage` is used for the grads' `reduce_scatter` comm, and `param_storage` is used for the params' `all_gather` comm. Params and grads are mapped into `param_storage` and `grad_storage` using `view_slice`.
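A minimal sketch of the fused-storage bookkeeping, under simplifying assumptions (single dtype, no alignment padding); `build_grad_storage` is a hypothetical name, and in the PR the per-param slices alias the buffer through `view_slice`, which the plain slicing below only approximates:

```python
# Sketch: flatten all grads of a fusion group into one contiguous buffer so a
# single reduce_scatter covers the whole group; param_storage is built the same way.
import paddle

def build_grad_storage(params, dtype="float32"):
    total = sum(int(p.numel()) for p in params)
    grad_storage = paddle.zeros([total], dtype=dtype)
    offset, grad_views = 0, {}
    for p in params:
        n = int(p.numel())
        # in the PR each grad is a true view (view_slice) into grad_storage;
        # here the slice just illustrates the offset bookkeeping
        grad_views[p.name] = grad_storage[offset:offset + n].reshape(p.shape)
        offset += n
    return grad_storage, grad_views
```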
`grad_clip` requires calling `all_reduce` manually to collect `global_norm_var`, since each rank only holds its shard of the grads (sketch below). Enable with `export FLAGS_enable_tensor_fusion=1`.
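A minimal sketch of why the manual `all_reduce` is needed: each rank computes the squared norm of the grads it owns, and the partial sums are reduced across the sharding group to form `global_norm_var`. The function name and group handling are assumptions:

```python
# Sketch: global grad norm for grad_clip under sharding.
import paddle
import paddle.distributed as dist

def sharded_global_grad_norm(local_grad_shards, group=None):
    local_sq_sum = paddle.zeros([1], dtype="float32")
    for g in local_grad_shards:
        g32 = g.cast("float32")
        local_sq_sum += paddle.sum(g32 * g32)
    # manually collect global_norm_var across the sharding group
    dist.all_reduce(local_sq_sum, op=dist.ReduceOp.SUM, group=group)
    return paddle.sqrt(local_sq_sum)
```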
Overlap: the `reduce_scatter` comm for grads overlaps with grad computation in bwd, and the `all_gather` comm for params overlaps with opt computation. Enable with `export FLAGS_enable_tensor_fusion=1`.
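A minimal sketch of the overlap scheduling using async collectives (`sync_op=False`); in the PR this is driven by backward hooks and the optimizer loop, and the helper names and group handling here are assumptions:

```python
# Sketch: launch comm asynchronously so it runs while computation continues;
# callers keep the returned task and wait() only when the result is consumed.
import paddle
import paddle.distributed as dist

def launch_grad_reduce_scatter(grad_storage, grad_shard, nranks, group=None):
    # fired from a backward hook once a fusion group's grads are ready; the
    # reduce_scatter proceeds while backward keeps computing other groups
    shards = list(paddle.split(grad_storage, nranks))
    return dist.reduce_scatter(grad_shard, shards, group=group, sync_op=False)

def launch_param_all_gather(param_shard, group=None):
    # fired right after the optimizer updates this rank's param shard; the
    # all_gather overlaps with optimizer work on the remaining shards
    gathered = []
    task = dist.all_gather(gathered, param_shard, group=group, sync_op=False)
    return task, gathered
```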
Note: non-uniform `tensor_fusion` changes the order of `add` in `grad_clip`, introducing some loss diff. Convergence results on llama7b, 1NC8, sharding8, 50,000 steps.
Pcard-70448