[Auto Parallel] Add tensor_fusion and overlap in auto dy sharding #72551

Open
wants to merge 3 commits into
base: develop
from

Contributor

@Xing-lil Xing-lil commented Apr 29, 2025

PR Category

Auto Parallel

PR Types

New features

Description

  1. inplace_master_grad
  • In auto dy, an in-place param.main_grad replaces the old master_grad.
  • Grads are cast to fp32 and accumulated into param.main_grad with the in-place add_.
  • Enable with export FLAGS_enable_inplace_master_grad=1.
  2. tensor_fusion
  • tensor_fusion groups params and grads into contiguous param_storage and grad_storage buffers.
  • grad_storage is used for the grads' reduce_scatter comm.
  • param_storage is used for the params' all_gather comm.
  • Supports non-uniform partitioning of params and grads across GPUs.
  • Each step has to fetch the non-uniformly partitioned params and grads from param_storage and grad_storage using view_slice.
  • With non-uniformly partitioned grads, grad_clip has to call all_reduce manually to collect global_norm_var.
  • Enable with export FLAGS_enable_tensor_fusion=1.
  3. sharding_overlap
  • Overlaps the grads' reduce_scatter comm with grad computation in bwd.
  • Overlaps the params' all_gather comm with opt computation.
  • Enable with export FLAGS_enable_tensor_fusion=1.
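
To make the tensor_fusion items above concrete, here is a minimal single-process sketch in plain Python. It is not the code in this PR: the tensor names, the rank-to-param mapping, and the helpers `local_views` and `local_sq_norm` are hypothetical, and the sum over ranks only stands in for a real all_reduce. It shows grads packed into one contiguous buffer, each rank taking zero-copy slices of unequal size, and the global norm assembled from per-rank partial sums.

```python
import math
from array import array

# Hypothetical grads (name -> flat values); real tensor shapes are ignored here.
grads = {
    "w1": [0.5, -1.0, 2.0],
    "b1": [0.25],
    "w2": [1.5, -0.5, 0.0, 3.0],
}

# 1. Fuse: copy every grad into one contiguous buffer and record its offsets.
grad_storage = array("d")
offsets = {}
for name, values in grads.items():
    start = len(grad_storage)
    grad_storage.extend(values)
    offsets[name] = (start, len(grad_storage))

# 2. Non-uniform partition (assumed): each rank owns whole tensors,
#    so rank 0 holds 4 elements and rank 1 holds 4 elements by coincidence only.
rank_to_params = {0: ["w1", "b1"], 1: ["w2"]}


def local_views(rank):
    """Return zero-copy views into the fused buffer for the tensors a rank owns."""
    buf = memoryview(grad_storage)
    return {n: buf[offsets[n][0]:offsets[n][1]] for n in rank_to_params[rank]}


def local_sq_norm(rank):
    """Partial squared L2 norm over the slices owned by one rank."""
    return sum(g * g for view in local_views(rank).values() for g in view)


# 3. Stand-in for all_reduce(SUM): add the partial results from every rank.
global_sq_norm = sum(local_sq_norm(r) for r in rank_to_params)
global_norm = math.sqrt(global_sq_norm)
```

Because the slices are views, writing through one of them updates the fused buffer directly, which is the property that lets a single collective operate on the whole buffer.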

Note: non-uniform tensor_fusion changes the order of additions in grad_clip, which introduces some loss diff.
Convergence results on llama7b, 1NC8, sharding8, 50,000 steps.

[Figure: loss_conver_50000]
[Figure: loss_diff_50000]
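
The note on addition order comes down to floating-point addition not being associative. The sketch below is a generic illustration, not code from this PR: the values and the two partitions are made up, and `reduced_sum` is a hypothetical helper that sums each rank's shard locally and then adds the per-rank partials. The same three numbers produce different totals depending on how they are split across ranks.

```python
import math


def reduced_sum(partition):
    """Sum each rank's shard locally, then add the per-rank partials in rank order."""
    total = 0.0
    for shard in partition:
        partial = 0.0
        for value in shard:
            partial += value
        total += partial
    return total


# Same three values, two different assignments to two ranks (assumed for illustration).
partition_a = [[1e16, 1.0], [1.0]]   # the small values are added one at a time
partition_b = [[1e16], [1.0, 1.0]]   # the small values are combined first

sum_a = reduced_sum(partition_a)
sum_b = reduced_sum(partition_b)

# Correctly rounded reference sum of all three values.
exact = math.fsum([1e16, 1.0, 1.0])
```

Here `sum_a` and `sum_b` differ by 2.0 because each lone 1.0 is rounded away when added to 1e16. In fp32 or bf16 training the same effect appears at much smaller magnitudes, which is consistent with a small but nonzero loss diff.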

Pcard-70448

paddle-bot bot commented Apr 29, 2025

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.
