
Add NPU support for torchmonarch #2294

@xiaopyyy

Description


Summary

torchmonarch currently targets GPU-based execution (e.g., CUDA) and does not support NPU accelerators (Issue #1653, Issue #2296). As with the ongoing discussions in Issue #1649 and Issue #1683, extending Monarch to additional accelerator backends would broaden its applicability across diverse hardware environments.

Current status

We’ve implemented experimental Ascend NPU support and have successfully run Monarch workloads on Ascend machines:

  • Single-node tensor execution
  • Multi-node tensor execution

These results indicate that Monarch can be extended to support NPUs without requiring changes to user-facing APIs or application code.

Evidence

Attached screenshots:

  1. Monarch tensor execution on a single Ascend machine
  2. Monarch multi-machine tensor execution on two Ascend machines

Proposed next steps

Before submitting code, we’d like maintainer guidance on whether NPU support should:

  • Be upstreamed incrementally via Issues → PRs in torchmonarch, or
  • Live as a separate sub-project / experimental backend

If helpful, we can start with a small, gated PR (e.g., minimal NPU device discovery / backend abstraction) to align on design and review expectations.
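As a concrete starting point for such a gated PR, here is a sketch of minimal NPU device discovery. It assumes (as is true for Ascend today) that NPU support ships as the optional `torch_npu` package; the helper names `npu_available` and `pick_accelerator` are illustrative, not existing torchmonarch functions.

```python
# Minimal gated device-discovery sketch (assumption: availability of the
# optional Ascend `torch_npu` package signals NPU support, analogous to
# how CUDA availability is probed; function names are illustrative).
import importlib.util

def npu_available() -> bool:
    """True if the Ascend `torch_npu` extension is importable."""
    return importlib.util.find_spec("torch_npu") is not None

def pick_accelerator(prefer: str = "npu") -> str:
    """Choose an accelerator string, gated behind availability checks,
    falling back to 'cpu' when the preferred backend is absent."""
    if prefer == "npu" and npu_available():
        return "npu"
    return "cpu"
```

Because the probe only checks importability and never imports `torch_npu` eagerly, the gate is a no-op on machines without Ascend hardware, which keeps the initial PR small and low-risk for existing CUDA users.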

Goal

Contribute NPU support in a way that is aligned with Monarch’s architecture, avoids long-lived forks, and expands Monarch’s hardware ecosystem.
