
Decouple monarch-rdma from CUDA for hardware-neutral RDMA #2296

@xiaopyyy


Summary

monarch-rdma is currently tightly coupled to CUDA/NVIDIA-specific components, which blocks Monarch from running on non-CUDA accelerators (e.g., Ascend NPUs). Decoupling RDMA from CUDA is required for hardware-neutral accelerator support.

Key blocking dependencies

  1. NVIDIA-specific infrastructure
    Reliance on GPUDirect RDMA, nvidia_peermem, and NVIDIA driver assumptions prevents RDMA use on non-NVIDIA systems.

  2. CUDA-bound RDMA offloading
    rdmaxcel-sys implements critical RDMA operations (e.g., send_wqe, db_ring) as CUDA kernels, binding RDMA execution to the CUDA driver API (see the sketch after this list).

  3. CUDA-only PyTorch integration
    pytorch_segment_scanner depends on torch.cuda.memory._snapshot(), tying memory inspection and registration to the CUDA backend.
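
To make item 2 concrete, here is a minimal, hypothetical sketch (the trait and type names are invented for illustration and do not exist in monarch-rdma today) of how the WQE-post and doorbell-ring operations currently compiled as CUDA kernels in rdmaxcel-sys could sit behind a backend trait, with the CUDA path becoming an optional implementation and a host-CPU path serving non-NVIDIA systems:

```rust
/// Hypothetical abstraction over whatever drives the RDMA queue pair.
/// Names are illustrative only; they are not current monarch-rdma APIs.
pub trait WqeDriver: Send + Sync {
    /// Build and post a send work-queue entry for `len` bytes at `laddr`.
    fn post_send_wqe(&self, laddr: u64, len: u32) -> Result<(), DriverError>;
    /// Ring the doorbell so the NIC starts processing posted WQEs.
    fn ring_doorbell(&self) -> Result<(), DriverError>;
}

#[derive(Debug)]
pub struct DriverError(pub String);

/// Default path: WQE construction and doorbell writes happen on the host CPU,
/// so no CUDA driver is required.
pub struct HostWqeDriver;

impl WqeDriver for HostWqeDriver {
    fn post_send_wqe(&self, laddr: u64, len: u32) -> Result<(), DriverError> {
        // A real implementation would write the WQE into the send queue
        // via ibverbs/mlx5dv from the CPU.
        println!("host: post WQE laddr={laddr:#x} len={len}");
        Ok(())
    }
    fn ring_doorbell(&self) -> Result<(), DriverError> {
        println!("host: ring doorbell");
        Ok(())
    }
}

/// CUDA-offloaded path (today's rdmaxcel-sys behavior), compiled only when a
/// `cuda` feature is enabled, so non-NVIDIA builds never link the CUDA driver.
#[cfg(feature = "cuda")]
pub struct CudaWqeDriver { /* handles to device-side queue buffers */ }
```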

Goal / discussion

What would be the preferred approach to refactoring monarch-rdma toward a hardware-neutral RDMA layer? One option could be to introduce an accelerator-agnostic RDMA abstraction and isolate CUDA-specific optimizations behind optional backends.
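
As one possible starting point for discussion, here is a hedged sketch of the kind of accelerator-agnostic abstraction meant above: an accelerator-memory trait that each backend implements, so memory-segment discovery and registration (item 3) no longer depend on torch.cuda.memory._snapshot() directly. All names are hypothetical, not current monarch-rdma APIs.

```rust
/// A memory region that should be registered with the RDMA NIC.
pub struct MemorySegment {
    pub addr: u64,
    pub len: usize,
}

/// Hypothetical accelerator-facing trait: each backend knows how to enumerate
/// and expose its own allocator's segments; the RDMA core stays device-neutral.
pub trait AcceleratorMemory: Send + Sync {
    /// Enumerate allocator segments to register. A CUDA backend would derive
    /// these from the CUDA caching allocator, an Ascend backend from its own
    /// runtime, and a host backend from process mappings.
    fn segments(&self) -> Vec<MemorySegment>;
    /// Whether the NIC can DMA directly into this backend's memory
    /// (e.g., GPUDirect-style peer memory); if not, fall back to staging.
    fn supports_peer_dma(&self) -> bool;
}

/// Registers every segment a backend exposes; identical for any backend.
pub fn register_all(backend: &dyn AcceleratorMemory) {
    for seg in backend.segments() {
        // Real code would call ibv_reg_mr (or dmabuf registration) here.
        println!(
            "register addr={:#x} len={} peer_dma={}",
            seg.addr, seg.len, backend.supports_peer_dma()
        );
    }
}
```

With something along these lines, the existing CUDA-specific logic (GPUDirect/nvidia_peermem registration, pytorch_segment_scanner) would become one implementation of the trait behind an optional feature, and an Ascend NPU or host-memory backend could be added without touching the RDMA core.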
