Description
Hi, thank you for your nice work! I have read your paper, Collective Communication for 100k+ GPUs, especially Chapter 5.1, "PP: Zero-copy and SM-free Send/Receive".
I wonder how to use CTran to achieve SM-free, zero-copy send/recv asynchronously, without falling back to NCCL's copy-based send/recv or to RDMA that relies on a pre-allocated buffer. Or would we need to register the user tensor as an RDMA MR every time we launch a send/recv? Is there a best practice?
The evaluation chapter of the paper also mentions SM-free, zero-copy send/recv, so I would really like to try it :)
I also notice that send/recv in the ncclx backend still uses NCCL, which is neither SM-free nor zero-copy. Why not use CTran to implement a better version that is both?