[DeepEP] support M2N #75582
Merged
129 commits
c91052c
[Inference All2All] Supports internode_ll_two_stage all2all communica…
carryyu 6bd8a9e
[Inference All2All] Modify kMaxNumQPs in internode_ll_two_stage
carryyu 7ebb4f5
[Inference All2All] Modify code-style
carryyu 85ad4db
[Inference All2All] Modify code-style
carryyu f487c0f
[Inference All2All] fix unit test
carryyu 630b95c
[Inference All2All] modify codestyle and enhance unit test
carryyu 444bb8e
[Inference All2All] modify codestyle and enhance unit test
carryyu 7bf5f18
[Inference All2All] supports batch_send and enhance unit test
carryyu 9f555fc
lzy test
szluyu99 61a49aa
add return_recv_hook
szluyu99 5eeb2f8
fixed num_sums
szluyu99 0e44ef6
Merge pull request #1 from l1351868270/m2n_return_recv_hook
l1351868270 a534b07
add m2n support
l1351868270 12486b4
add m2n test
l1351868270 7588e44
update test
szluyu99 25b4890
add support m2n buffer
l1351868270 e8475fe
add support m2n buffer
l1351868270 fb8c463
Merge branch 'm2n' of https://github.com/l1351868270/Paddle into m2n
l1351868270 9dec8d9
update build environment
l1351868270 93187ac
update build environment
szluyu99 20ab024
Merge branch 'm2n' of https://github.com/l1351868270/Paddle into m2n
l1351868270 1a461cf
m2n test add data check
szluyu99 bb91e5a
add m2n demo
l1351868270 dad1bec
Merge pull request #2 from l1351868270/m2n_lsl
l1351868270 3735ebd
pull the latest code
szluyu99 cba4253
test update
89ea91a
update test
szluyu99 b5d5743
update test
szluyu99 f00ca31
update m2n return_recv_hook
szluyu99 7035ac3
update e2a_irecv
l1351868270 8129b58
Merge pull request #4 from l1351868270/m2n_lsl
l1351868270 cac91e4
support start_port
l1351868270 63c24d7
Merge pull request #5 from l1351868270/m2n_lsl
l1351868270 07e1bed
fix 64 experts bug
szluyu99 9d1bb80
update m2n demo
l1351868270 4910170
set sm_count 24
szluyu99 36947ea
Merge pull request #6 from l1351868270/m2n_lsl_0722
l1351868270 7515f88
add async_finish support
l1351868270 7a64f4c
add send recv support
l1351868270 8bf9752
update test m2n demo
l1351868270 efc89c7
update test m2n demo
l1351868270 d3f3e70
two dispatch hang
l1351868270 511f309
solve dd dc cd bugs
l1351868270 cb309ac
m2n code independence from all2all code
l1351868270 465c618
support hook mode on communication stream
l1351868270 4028a55
Merge pull request #11 from l1351868270/m2n_dev_hook_comm_lsl
l1351868270 267ce49
add two dispatch test
l1351868270 9afd321
Merge pull request #12 from l1351868270/m2n_dev_hook_comm_lsl
l1351868270 958c358
add two batch size and two layer test
l1351868270 5daf9fc
Merge pull request #13 from l1351868270/m2n_dev_hook_comm_lsl
l1351868270 3eabb9c
add dispatch combine test
l1351868270 d18d3e5
Merge pull request #15 from l1351868270/m2n_dev_hook_comm_lsl
l1351868270 8086a3e
fix illegal memory
ec377cc
add v3 api
l1351868270 f79b7d2
Merge pull request #17 from l1351868270/m2n_dev_lsl_0804
l1351868270 c3cf0e8
add dispatch and combine wait
l1351868270 e588b10
Merge pull request #18 from l1351868270/m2n_dev_wait_lsl_0805
l1351868270 1f60f1f
fix continuous dispatch wrong https://github.com/PaddlePaddle/Paddle/…
l1351868270 b2dd8c2
Merge pull request #19 from l1351868270/m2n_dev_wait_lsl_0805
l1351868270 ea774de
Revert "fix continuous dispatch wrong"
l1351868270 566ce13
Merge pull request #20 from l1351868270/revert-19-m2n_dev_wait_lsl_0805
l1351868270 8eaca08
fix continuous dispatch wrong
l1351868270 3bec1c2
Merge pull request #21 from l1351868270/m2n_dev_wait_lsl_0805
l1351868270 cf7ee95
support wait all rank complete
l1351868270 7a570b2
Merge pull request #22 from l1351868270/m2n_dev_wait_lsl_0805
l1351868270 bc2f1c0
change test file
259e65c
change test file
0302f6d
fix receive
c1c6ad7
all layer simulate
021c649
update all layers
l1351868270 7590c6a
update all layers
l1351868270 703c8c5
update all layers
l1351868270 707fa1e
update all layers
l1351868270 3f12339
update all layers
l1351868270 179fb2d
update all layers
l1351868270 e26c98d
update all layers
l1351868270 b883549
update log
l1351868270 ad01f3e
update log
l1351868270 6c0a2a5
fix 51 layer hang
l1351868270 2e8937a
fix 51 layer hang
l1351868270 c49001b
fix dispatch 51 layer hang
l1351868270 d9be6a7
add hang log
l1351868270 6085562
add nvl sync log
l1351868270 9e78917
fix nvl all2all hang
l1351868270 79c2f68
fix workspace conflict
l1351868270 5f231b7
fix workspace conflict
l1351868270 9a1cb23
fix can not overlap
l1351868270 9e8fc76
fix can not overlap complete
l1351868270 5c8ef64
when time out, break
l1351868270 318556e
fix complete hang
l1351868270 860631d
convert zeros to empty
l1351868270 258ca43
convert zeros to empty
l1351868270 2857d73
convert zeros to empty
l1351868270 b342858
a simple overlap method
l1351868270 e757905
fix accuracy is not correct
l1351868270 5a70b97
moe first dispatch wait event
l1351868270 5727eb5
add nvl log
l1351868270 98d2c4a
fix single test accuracy is not correct
l1351868270 a230211
update test_m2n_all_layers_v3
l1351868270 4dff1ab
increase workspace memory
l1351868270 b226288
delete m2n test
zhoutianzi666 8257e42
delete m2n test
zhoutianzi666 6d7f7fb
initial commit /root/paddlejob/workspace/env_run/output/zkk/erniebot-…
zhoutianzi666 ae547f2
Simplify m2n_ll_two_stage.cu
zhoutianzi666 4c8a61b
make code pretty
zhoutianzi666 e69570b
make code pretty
zhoutianzi666 02c0517
make code pretty
zhoutianzi666 1689620
make code pretty
zhoutianzi666 bff1fef
make code pretty
zhoutianzi666 ed8152c
make code pretty
zhoutianzi666 7dc6498
make code pretty
zhoutianzi666 38e7db7
merge develop
zhoutianzi666 f650dd4
not modify /root/paddlejob/workspace/env_run/output/zkk/erniebot-dev/…
zhoutianzi666 d1819ab
restore /root/paddlejob/workspace/env_run/output/zkk/erniebot-dev/202…
zhoutianzi666 3abf2fe
restore cmake
zhoutianzi666 b123378
restore
zhoutianzi666 e953b36
add comment
zhoutianzi666 aa9fc7b
update /root/paddlejob/workspace/env_run/output/zkk/erniebot-dev/2024…
zhoutianzi666 baeec36
not modify
zhoutianzi666 6614880
add comment
zhoutianzi666 5780c65
add comment
zhoutianzi666 fe8afde
format code
zhoutianzi666 7c6fa63
add comment
zhoutianzi666 8f36b6f
add comment
zhoutianzi666 2a489cf
format code
zhoutianzi666 51bc1ac
format code
zhoutianzi666 8c54bd3
format code
zhoutianzi666 2bbbb48
add test
zhoutianzi666 49dace1
update code /root/paddlejob/workspace/env_run/output/zkk/erniebot-dev…
zhoutianzi666