HCCL Tensor Transport

Last updated: 05/30/2026

HCCL tensor transport enables zero-copy transfer of NPU tensors between Ray actors via HCCS (Huawei Cache Coherence System).

Note: HCCL tensor transport requires Ray >= 2.56.
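
The version requirement can be checked up front, before any actors are created. The snippet below is a minimal sketch using only the standard library. The helper names (`parse_release`, `ray_version_ok`, `installed_ray_version`) are hypothetical and not part of Ray or ray_ascend. It compares numeric components rather than strings, because a plain string comparison would rank "2.9" above "2.56".

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum Ray release stated in the note above.
MIN_RAY_VERSION = (2, 56)


def parse_release(version_string):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # e.g. "dev0": no leading number, stop parsing
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. "0rc1": keep the 0, ignore the suffix
    return tuple(parts)


def ray_version_ok(installed, minimum=MIN_RAY_VERSION):
    """Compare numerically, so that 2.9 is correctly older than 2.56."""
    return parse_release(installed) >= minimum


def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is absent."""
    try:
        return version("ray")
    except PackageNotFoundError:
        return None
```

A driver script could call `installed_ray_version()` and stop with a clear message when `ray_version_ok` returns False. Note that this sketch ignores pre-release suffixes, so "2.56.0rc1" is treated the same as "2.56.0".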

Quick Example

import ray
import torch
from ray.util.collective import create_collective_group
from ray_ascend import register_hccl_tensor_transport

ray.init()
# Register the HCCL collective backend and tensor transport in the driver.
register_hccl_tensor_transport()

@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        # Each actor process must register the transport as well.
        register_hccl_tensor_transport()

    # The returned tensor is sent to the consuming actor over HCCL.
    @ray.method(tensor_transport="HCCL")
    def random_tensor(self):
        return torch.randn(1024, device="npu")

    def sum(self, tensor: torch.Tensor):
        return torch.sum(tensor)

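sender, receiver = RayActor.remote(), RayActor.remote()
# The collective group must exist before any tensor is transferred.
group = create_collective_group([sender, receiver], backend="HCCL")

# Passing the reference to the receiver triggers the NPU-to-NPU transfer.
tensor = sender.random_tensor.remote()
result = receiver.sum.remote(tensor)
print(ray.get(result))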

ray.shutdown()

How It Works

register_hccl_tensor_transport() registers both the HCCL collective backend and the HCCL tensor transport. It must be called in the driver process and in each actor's __init__.
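
Because the actor-side call is easy to forget, one option is to put it in a shared base class. The following is a minimal sketch, not something ray_ascend provides: `HcclRegisteredActor` and `Producer` are hypothetical names, and the sketch assumes `register_hccl_tensor_transport()` takes no arguments, as in the Quick Example.

```python
class HcclRegisteredActor:
    """Hypothetical base class that registers HCCL when an actor starts."""

    def __init__(self):
        # Imported lazily so that the import and the registration both run
        # inside the actor's own process, where __init__ executes.
        from ray_ascend import register_hccl_tensor_transport

        register_hccl_tensor_transport()


class Producer(HcclRegisteredActor):
    """Example subclass; decorate it with @ray.remote(resources={"NPU": 1})."""

    def __init__(self, size: int = 1024):
        super().__init__()  # registration runs before any other setup
        self.size = size
```

Subclasses only need to call `super().__init__()`. The driver process still has to call `register_hccl_tensor_transport()` itself, since the base class only covers actors.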

Under the hood, HCCL tensor transport uses Ray's CollectiveTensorTransport infrastructure, which reuses the HCCL collective communicator for point-to-point tensor transfers. For this reason, a collective group that contains both the sender and the receiver actor must be created before calling any method decorated with @ray.method(tensor_transport="HCCL").
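
The dependency on a collective group can be pictured with a toy model. The code below is a conceptual sketch in plain Python and does not reflect Ray's internals: `ToyGroupRegistry` and its methods are hypothetical, and actors are represented by strings. It only illustrates the idea that a point-to-point transfer needs both endpoints to hold ranks in a shared communicator.

```python
class ToyGroupRegistry:
    """Conceptual stand-in for the bookkeeping behind collective groups."""

    def __init__(self):
        self._groups = []  # each group maps actor name -> rank

    def create_group(self, actors):
        # Ranks are assigned in list order, one per actor.
        ranks = {actor: rank for rank, actor in enumerate(actors)}
        self._groups.append(ranks)
        return ranks

    def resolve_p2p(self, sender, receiver):
        """Return (src_rank, dst_rank) from a group containing both actors."""
        for ranks in self._groups:
            if sender in ranks and receiver in ranks:
                return ranks[sender], ranks[receiver]
        raise RuntimeError(
            f"no collective group contains both {sender!r} and {receiver!r}"
        )


registry = ToyGroupRegistry()
try:
    registry.resolve_p2p("sender", "receiver")
    resolved_without_group = True
except RuntimeError:
    resolved_without_group = False  # no group yet, so no route exists

registry.create_group(["sender", "receiver"])
src_rank, dst_rank = registry.resolve_p2p("sender", "receiver")
```

In this model the first lookup fails because no group exists yet, and the second succeeds once the group is created. This mirrors the ordering in the Quick Example, where the group is created before the tensor is passed to the receiver.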

Supported Device Types

  • NPU: Tensors on Ascend NPU devices (via HCCS)