ray-ascend is a community-maintained hardware plugin that brings advanced Ray features to Ascend NPU accelerators.

Out of the box, ray already supports the Ascend NPU as a predefined resource type for binding actors and tasks (see ray accelerator support). As an enhancement, ray-ascend provides Ascend-native features on top of ray, such as collective communication via the Huawei Collective Communication Library (HCCL) and Ray Direct Transport (RDT).
- Architecture: aarch64, x86
- OS kernel: Linux
- Python dependencies:
  - python>=3.10,<=3.12
  - CANN==8.2.rc1
  - torch==2.7.1, torch-npu==2.7.1.post1
  - ray (the same version as ray-ascend)
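A quick way to confirm the stack is wired up is to check that torch-npu can see a device; a minimal sketch, assuming CANN and torch-npu are already installed:

```python
# Minimal environment check; assumes CANN and torch-npu are installed.
import torch
import torch_npu  # noqa: F401  # patches torch with the "npu" device backend

print(torch.__version__)         # expect 2.7.1
print(torch.npu.is_available())  # True if an Ascend NPU is visible
print(torch.npu.device_count())  # number of NPUs on this node
```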
| Version | Release type | Doc |
|---|---|---|
| 0.54.0rc1 | Latest release candidate | |
```bash
pip install ray-ascend[yr]
```
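Since ray must be the same version as ray-ascend (see the dependency list above), you may prefer to pin both explicitly in one install; a sketch, with the version taken from the release table and purely illustrative:

```bash
# Pin ray and ray-ascend to the same release, as required above
# (0.54.0rc1 mirrors the release table; adjust to your target version).
pip install "ray==0.54.0rc1" "ray-ascend[yr]==0.54.0rc1"
```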
### Collective communication via HCCL

```python
import ray
from ray.util import collective
from ray_ascend.collective import HCCLGroup
ray.register_collective_backend("HCCL", HCCLGroup)
collective.create_collective_group(
    actors,
    len(actors),
    list(range(0, len(actors))),
    backend="HCCL",
    group_name="my_group",
)

# each actor calls broadcast in SPMD fashion
collective.broadcast(tensor, src_rank=0, group_name="my_group")
```
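The group-creation call above expects `actors`, a list of actor handles. A minimal sketch of creating NPU-backed actors for such a group (the `Worker` class and the `NPU` resource request are illustrative, not part of ray-ascend's API):

```python
import ray
import torch

@ray.remote(resources={"NPU": 1})  # illustrative: reserve one Ascend NPU per actor
class Worker:
    def __init__(self):
        # each rank holds its own NPU tensor to participate in collectives
        self.tensor = torch.zeros(1024, device="npu")

actors = [Worker.remote() for _ in range(4)]  # four ranks for the group
```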
### Transport Ascend NPU tensors via HCCS

```python
import ray
import torch
from ray_ascend.direct_transport import HCCLTensorTransport
from ray.experimental import register_tensor_transport
register_tensor_transport("HCCL", ["npu"], HCCLTensorTransport)
@ray.remote
class RayActor:
    @ray.method(tensor_transport="HCCL")
    def transfer_npu_tensor_via_hccs(self):
        return torch.zeros(1024, device="npu")
sender = RayActor.remote()
npu_tensor = ray.get(sender.transfer_npu_tensor_via_hccs.remote())
```
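When the returned object is passed on to another actor, RDT can move the NPU tensor over HCCS rather than copying it through the object store. A hedged sketch of the consuming side (the `Receiver` actor and its `consume` method are illustrative, not part of the ray-ascend API):

```python
@ray.remote
class Receiver:
    def consume(self, t):
        # t is the sender's NPU tensor; with the HCCL transport registered,
        # it travels over HCCS instead of Ray's CPU object store (illustrative).
        return t.sum().item()

receiver = Receiver.remote()
ref = sender.transfer_npu_tensor_via_hccs.remote()
total = ray.get(receiver.consume.remote(ref))
```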
### Transport Ascend NPU tensors via HCCS and CPU tensors via RDMA

openYuanrong-datasystem (YR) lets users transport NPU tensors (via HCCS) and CPU tensors (via RDMA, where available) through ray objects.
```python
import ray
import torch
from ray_ascend.direct_transport import YRTensorTransport
from ray.experimental import register_tensor_transport
register_tensor_transport("YR", ["npu", "cpu"], YRTensorTransport)
@ray.remote
class RayActor:
    @ray.method(tensor_transport="YR")
    def transfer_npu_tensor_via_hccs(self):
        return torch.zeros(1024, device="npu")

    @ray.method(tensor_transport="YR")
    def transfer_cpu_tensor_via_rdma(self):
        return torch.zeros(1024)
sender = RayActor.remote()
npu_tensor = ray.get(sender.transfer_npu_tensor_via_hccs.remote())
cpu_tensor = ray.get(sender.transfer_cpu_tensor_via_rdma.remote())
```
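Note that a single "YR" registration covers both device types; per the description above, NPU tensors then move over HCCS and CPU tensors over RDMA, so the transport link presumably follows from the device of the returned tensor.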
See CONTRIBUTING for a step-by-step guide to setting up the development environment, building, and testing. If you find a bug or would like to request a feature, please file an issue.
Apache License 2.0. See LICENSE file.