Summary
monarch-rdma is currently tightly coupled to CUDA/NVIDIA-specific components, which blocks Monarch from running on non-CUDA accelerators (e.g., Ascend NPUs). Decoupling RDMA from CUDA is required to achieve hardware-neutral accelerator support.
Key blocking dependencies
- NVIDIA-specific infrastructure: reliance on GPUDirect RDMA, `nvidia_peermem`, and NVIDIA driver assumptions prevents RDMA use on non-NVIDIA systems.
- CUDA-bound RDMA offloading: `rdmaxcel-sys` implements critical RDMA operations (e.g., `send_wqe`, `db_ring`) as CUDA kernels, binding RDMA execution to the CUDA driver API.
- CUDA-only PyTorch integration: `pytorch_segment_scanner` depends on `torch.cuda.memory._snapshot()`, tying memory inspection and registration to the CUDA backend (a rough sketch of a backend-neutral alternative follows this list).
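To make the last point concrete, here is a minimal sketch of how the memory-scanning step could be hidden behind a backend-neutral trait, with the CUDA path compiled only when a feature flag is set. The names (`Segment`, `SegmentScanner`, `CudaSegmentScanner`, the `cuda` feature) are hypothetical and not existing Monarch APIs; this is only one possible shape for the abstraction.

```rust
use std::error::Error;

/// Hypothetical, backend-neutral description of an allocator segment that
/// RDMA registration cares about: an address range plus the owning device.
pub struct Segment {
    pub addr: usize,
    pub len: usize,
    pub device_index: usize,
}

/// Hypothetical trait: each accelerator backend reports the memory segments
/// that should be registered with the RDMA NIC.
pub trait SegmentScanner: Send + Sync {
    fn scan(&self) -> Result<Vec<Segment>, Box<dyn Error>>;
}

/// CUDA implementation kept behind a Cargo feature, so non-CUDA builds never
/// touch torch.cuda or the CUDA driver.
#[cfg(feature = "cuda")]
pub struct CudaSegmentScanner;

#[cfg(feature = "cuda")]
impl SegmentScanner for CudaSegmentScanner {
    fn scan(&self) -> Result<Vec<Segment>, Box<dyn Error>> {
        // Would wrap the existing torch.cuda.memory._snapshot()-based logic
        // from pytorch_segment_scanner.
        unimplemented!()
    }
}
```

An Ascend (or other accelerator) backend would then provide its own `SegmentScanner` implementation without pulling in any CUDA dependency.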
Goal / discussion
What would be the preferred approach to refactoring monarch-rdma into a hardware-neutral RDMA layer, for example by introducing an accelerator-agnostic RDMA abstraction and isolating CUDA-specific optimizations behind optional backends?
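As a starting point for discussion, here is a hedged sketch of what such an abstraction could look like for the operations that are currently CUDA kernels in `rdmaxcel-sys`. All names (`RdmaBackend`, `HostBackend`, `CudaBackend`, the `cuda` feature) are hypothetical, and the method signatures are illustrative rather than proposals for the actual interface.

```rust
use std::error::Error;

/// Hypothetical accelerator-agnostic interface for the pieces of monarch-rdma
/// that are currently CUDA kernels in rdmaxcel-sys (WQE posting, doorbell ring).
pub trait RdmaBackend: Send + Sync {
    /// Post a send work queue entry describing `len` bytes at `laddr`/`raddr`.
    fn post_send_wqe(&self, laddr: u64, raddr: u64, len: usize, rkey: u32)
        -> Result<(), Box<dyn Error>>;

    /// Ring the queue pair's doorbell to kick off posted work.
    fn ring_doorbell(&self) -> Result<(), Box<dyn Error>>;
}

/// Default, hardware-neutral backend: the host CPU drives the NIC through
/// plain verbs, with no accelerator involvement.
pub struct HostBackend;

impl RdmaBackend for HostBackend {
    fn post_send_wqe(&self, _laddr: u64, _raddr: u64, _len: usize, _rkey: u32)
        -> Result<(), Box<dyn Error>> {
        // Would call ibv_post_send (or an equivalent verbs wrapper) here.
        Ok(())
    }

    fn ring_doorbell(&self) -> Result<(), Box<dyn Error>> {
        // No-op for the host path: posting via verbs already rings the doorbell.
        Ok(())
    }
}

/// CUDA-optimized backend, compiled only when the (hypothetical) `cuda`
/// feature is enabled, so non-NVIDIA builds carry no CUDA dependency.
#[cfg(feature = "cuda")]
pub struct CudaBackend;

#[cfg(feature = "cuda")]
impl RdmaBackend for CudaBackend {
    fn post_send_wqe(&self, _laddr: u64, _raddr: u64, _len: usize, _rkey: u32)
        -> Result<(), Box<dyn Error>> {
        // Would delegate to rdmaxcel-sys' device-side send_wqe path.
        Ok(())
    }

    fn ring_doorbell(&self) -> Result<(), Box<dyn Error>> {
        // Would delegate to rdmaxcel-sys' db_ring path.
        Ok(())
    }
}
```

With something along these lines, callers would hold a `Box<dyn RdmaBackend>` selected at startup, the host path would work on any accelerator (including Ascend NPUs), and the CUDA kernels would remain available as an opt-in optimization rather than a hard dependency. Is this roughly the direction the maintainers would prefer, or is a different split envisioned?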