## Description

### Motivation
The newer Macs with Apple Silicon (M1 and up) are quite powerful; even the lowest-end M1 MacBook Air is impressive. In addition, the Apple platform is well suited to ML workloads thanks to its unified memory architecture: all system RAM can be used as GPU memory with no performance penalty.
Apple's GPU acceleration API is MPS (Metal Performance Shaders), which is not at all compatible with CUDA, so supporting it requires porting all the kernels as well as writing the dispatch/stub code.
Additionally, the Mac is a very popular platform for developers. Supporting macOS natively in the popular torch libraries (as a longer-term goal) means we don't have to resort to expensive Nvidia cloud VMs for every single task.
### Proposed solution
- Make this library portable (ongoing effort) (see also [RFC] Cross-Platform Refactor: Overview + Link Hub #997 and [RFC] Cross-Platform Refactor: CPU-only implementation #1021)
- CI for all supported targets, so contributors can focus on "their" platform without risking breaking platforms they don't have access to (ongoing effort)
- Make CPU implementation 100% complete ([RFC] Cross-Platform Refactor: CPU-only implementation #1021)
- Write boilerplate code for launching MPS kernels (I started on this in Make native code portable and add GitHub workflow for building #949, but it is not complete)
- Gradually add MPS-accelerated implementations of this library's functionality, increasing the percentage of accelerated code over time.
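Since the MPS port would be incremental, callers need a way to pick the best available backend at runtime. The sketch below uses PyTorch's public MPS API to do that with a CPU fallback; the helper name `select_device` is illustrative and not part of this library.

```python
import torch

def select_device() -> torch.device:
    """Illustrative helper: prefer MPS when available, else fall back to CPU."""
    # torch.backends.mps.is_available() returns False on non-Apple platforms,
    # so this check is safe to run everywhere.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = select_device()
# Runs on the MPS backend on Apple Silicon, and on CPU elsewhere.
x = torch.ones(4, device=device)
```

This mirrors the pattern torch users already apply for CUDA (`torch.cuda.is_available()`), which keeps a partial MPS implementation usable while kernels are ported one by one.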
@Titus-von-Koeller Feel free to edit this issue as you see fit, if you want a different structure for it for example.