Description
Purpose
We intend to integrate PyTorch Custom Operators as the primary mechanism for dispatching to device-specific operator implementations. An initial scaffolding of this is presented in PR #1544. This RFC will serve as a guideline to collect community feedback and refine our development plans moving forward.
Why?
- Registering operators with torch.library allows us to take advantage of the existing device dispatch mechanisms in PyTorch.
- We can treat calls to functionality in our CUDA kernels, or other low-level backend implementations, as opaque for improved torch.compile support.
- We can provide naive implementations of operators with only PyTorch code as a fallback option. This may additionally serve as a secondary CPU baseline, as per [RFC] Cross-Platform Refactor: CPU-only implementation #1021.
- This helps to simplify the development for additional backends, while taking an idiomatic modern PyTorch approach.
What about the multi-backend-refactor branch?
We plan to stop further development on that branch once #1544 is merged. After that point, we expect backends to be implemented using the new custom operator registration mechanisms. We expect to be able to reuse many of the existing implementations during the refactoring process.
Our goal is to aggressively mainline our in-tree backends, while additionally enabling out-of-tree backends. We will expand on this topic in the near future.
Supersedure
This RFC is intended to supersede the following related RFCs, which remain open as of this writing:
- [RFC] Extend bitsandbytes to support Intel hardware platforms #894
- [RFC] Cross-Platform Refactor: Mac M1 support #1020
- [RFC] Cross-Platform Refactor: CPU-only implementation #1021
Related Issues
Related issues and discussions include:
- ROCM Support #47
- Feature Request: ROCm support (AMD GPU) #107
- M1.M2 MacOS Users #485
- [RFC] Cross-Platform Refactor: Testing and CI/CD Strategy #1031
- ROCm Backend Status Tracker #1271
- Multi-backend refactor: Alpha release ( INTEL ONLY ) #1338
- Multi-backend refactor: Alpha release ( AMD ROCm ONLY ) #1339
- Multi-backend support: Apple Silicon / Mac (call for contributors + help fleshing out the details) #1340
- Support running on CPU #1402
- aarch64 whl in PyPi #1437
- bitsandbytes for macos M1,M2,M3 chips #1460
- Building on NVidia GH-200 - Is this a supported platform? #1526
Additionally, this relates to the following issues and discussions which have been closed:
- Enabling Cross-platform Support #990
- [RFC] Cross-Platform Refactor: Overview + Link Hub #997
- [RFC] Cross-Platform Refactor: Build System and Binary Distribution #1032
- Release v44 not available for Mac #1378
Relevant contributors
The following contributors may have particular interest and feedback on this topic:
@Titus-von-Koeller
@christoph-koehncke
@jiqing-feng
@pnunna93
@akx
@rickardp
@ji-huazhong
@SlightwindSec