Summary
torchmonarch currently targets GPU-based execution (e.g., CUDA) and does not support NPU accelerators (Issue #1653, Issue #2296). As with the ongoing discussions in Issue #1649 and Issue #1683, extending Monarch to additional accelerator backends would broaden its applicability across diverse hardware environments.
Current status
We’ve implemented experimental Ascend NPU support and have successfully run Monarch workloads on Ascend machines:
- Single-node tensor execution
- Multi-node tensor execution
These results indicate that Monarch can be extended to support NPUs without requiring changes to user-facing APIs or application code.
Evidence
Attached screenshots:
- Monarch tensor execution on a single Ascend machine
- Monarch multi-machine tensor execution on two Ascend machines
Proposed next steps
Before submitting code, we’d like maintainer guidance on whether NPU support should:
- be upstreamed incrementally via Issues → PRs in torchmonarch, or
- live as a separate sub-project / experimental backend
If helpful, we can start with a small, gated PR (e.g., minimal NPU device discovery / backend abstraction) to align on design and review expectations.
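As a sense of scale for such a gated PR, the backend abstraction could start as small as a registry keyed by device name. Everything below is illustrative, not an existing torchmonarch API, and the NPU backend is opt-in behind a hypothetical environment flag:

```python
# Hypothetical sketch of a minimal, feature-gated backend registry.
# None of these names exist in torchmonarch today.

import os
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Backend:
    name: str      # device string, e.g. "cuda" or "npu"
    comm_lib: str  # collective library, e.g. "nccl" or "hccl"

_REGISTRY: Dict[str, Backend] = {}

def register_backend(backend: Backend) -> None:
    _REGISTRY[backend.name] = backend

def get_backend(name: str) -> Backend:
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; registered: {sorted(_REGISTRY)}"
        )

# CUDA stays registered unconditionally; the experimental NPU backend
# is enabled only behind a (hypothetical) environment flag.
register_backend(Backend("cuda", "nccl"))
if os.environ.get("MONARCH_ENABLE_NPU") == "1":
    register_backend(Backend("npu", "hccl"))
```

Gating the registration this way keeps the NPU path invisible to existing users while design and review expectations are worked out.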
Goal
Contribute NPU support in a way that is aligned with Monarch’s architecture, avoids long-lived forks, and expands Monarch’s hardware ecosystem.