[DeepEP] support M2N#75582

Merged
zhoutianzi666 merged 129 commits into PaddlePaddle:develop from zhoutianzi666:m2n_dev
Oct 13, 2025
Conversation

Contributor

@zhoutianzi666 zhoutianzi666 commented Sep 28, 2025

PR Category

Inference

PR Types

New features

Description

Adds support for M2N-style All2All, for use in the AFD disaggregated architecture, where attention (A) ranks and expert/FFN (E) ranks run on separate groups of devices.

Four APIs are provided:
- A2E dispatch: send (A side) and receive (E side)
- E2A combine: send (E side) and receive (A side)

The exact API names can be found in python/paddle/distributed/communication/deep_ep/buffer.py.
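As a rough mental model of the routing these four APIs implement (pure Python, not the real M2NBuffer API; the function names `a2e_dispatch` and `e2a_combine` here are hypothetical): each of the M attention ranks sends each token to the expert rank that owns the token's target expert, and the expert ranks send processed tokens back to their source attention ranks.

```python
# Conceptual sketch of M2N all-to-all routing (no Paddle/DeepEP involved).
# Names and data shapes are illustrative only.

def a2e_dispatch(tokens_per_a_rank, expert_owner):
    """Each attention (A) rank sends each token to the expert (E) rank
    owning that token's expert; returns the per-E-rank inboxes."""
    inbox = {}
    for a_rank, tokens in tokens_per_a_rank.items():
        for token_id, expert in tokens:
            e_rank = expert_owner[expert]
            inbox.setdefault(e_rank, []).append((a_rank, token_id, expert))
    return inbox

def e2a_combine(inbox):
    """Each E rank sends every processed token back to its source A rank."""
    outbox = {}
    for e_rank, items in inbox.items():
        for a_rank, token_id, expert in items:
            outbox.setdefault(a_rank, []).append((token_id, expert))
    return outbox

# 2 attention ranks, 2 expert ranks, 4 experts (experts 0-1 on E rank 0, 2-3 on E rank 1).
expert_owner = {0: 0, 1: 0, 2: 1, 3: 1}
tokens = {0: [(0, 1), (1, 3)], 1: [(2, 0)]}
inbox = a2e_dispatch(tokens, expert_owner)
back = e2a_combine(inbox)
# After the round trip, every A rank recovers exactly its own tokens.
```

The real kernels additionally move hidden states (optionally FP8-quantized) and run over NVLink/RDMA, but the routing invariant is the same: combine is the inverse of dispatch.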

Usage example, excerpted from FastDeploy:

@singleton
class EPMegaRunner:

    def __init__(self, fd_config):
        rank = paddle.distributed.get_rank()
        num_ranks = paddle.distributed.get_world_size()

        self.group = paddle.distributed.new_group(range(num_ranks))

        # Rank layout: attention (A) ranks first, expert (E) ranks after them.
        self.a_start_rank = 0
        self.a_num_ranks = fd_config.parallel_config.attn_group.nranks
        self.e_start_rank = self.a_start_rank + self.a_num_ranks
        self.e_num_ranks = num_ranks - self.a_num_ranks

        self.hidden = 8192
        self.top_k = 8
        self.num_experts = 64
        self.num_max_tokens = 256
        self.use_fp8 = True
        self.rank = rank
        self.num_ranks = num_ranks

        num_rdma_ranks = num_ranks // 8  # assumes 8 GPUs per node

        # Query the buffer sizes required by the two-stage low-latency path.
        num_rdma_bytes = deep_ep.M2NBuffer.get_low_latency_rdma_size_hint_two_stage(
            self.num_max_tokens, self.hidden, self.num_ranks,
            self.a_num_ranks, self.e_num_ranks, self.num_experts, self.top_k
        )
        num_nvl_bytes = deep_ep.M2NBuffer.get_low_latency_nvl_size_hint_two_stage(
            self.num_max_tokens, self.hidden, self.num_ranks,
            self.a_num_ranks, self.e_num_ranks, self.num_experts, self.top_k,
            self.use_fp8
        )

        paddle.distributed.barrier()

        self.buffer = deep_ep.M2NBuffer(
            self.group,
            self.a_start_rank,
            self.a_num_ranks,
            self.e_start_rank,
            self.e_num_ranks,
            num_nvl_bytes=num_nvl_bytes,
            num_rdma_bytes=num_rdma_bytes,
            low_latency_mode=True,
            num_qps_per_rank=num_rdma_ranks)
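The A/E rank split computed in `__init__` above is plain arithmetic and can be checked in isolation. This small helper (hypothetical, not part of the PR) mirrors that layout:

```python
def m2n_rank_layout(num_ranks, a_num_ranks):
    """Mirror the A/E rank split from EPMegaRunner.__init__:
    attention ranks occupy [0, a_num_ranks), expert ranks the rest."""
    a_start_rank = 0
    e_start_rank = a_start_rank + a_num_ranks
    e_num_ranks = num_ranks - a_num_ranks
    assert e_num_ranks > 0, "need at least one expert rank"
    return (a_start_rank, a_num_ranks), (e_start_rank, e_num_ranks)

# e.g. 16 total ranks with 4 attention ranks: experts occupy ranks 4..15
a_span, e_span = m2n_rank_layout(16, 4)
```

Both spans are contiguous, which is what lets the M2NBuffer constructor take just a start rank and a count for each side.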

P-card-71501

@zhoutianzi666 zhoutianzi666 changed the title from "M2n dev" to "[DeepEP] support M2N" on Oct 9, 2025
Review comment threads on:
- paddle/fluid/distributed/collective/deep_ep/config.hpp
- paddle/fluid/distributed/collective/deep_ep/kernels/configs.cuh
- paddle/fluid/pybind/deep_ep_api.cc
Contributor

@carryyu carryyu left a comment

The main code could later be consolidated with internode_ll_two_stage.cu to share the implementation.

Collaborator

@tianshuo78520a tianshuo78520a left a comment

LGTM for approval

@zhoutianzi666 zhoutianzi666 merged commit 1990bcc into PaddlePaddle:develop Oct 13, 2025
117 of 131 checks passed
SigureMo pushed a commit to cattidea/Paddle that referenced this pull request Oct 14, 2025