Description
Motivation
Because we want this library to be portable, the first step is to make 100% of it run correctly on the CPU alone (i.e. without requiring CUDA for any part of the functionality). This would serve two purposes:
- Provide a baseline that contributors porting to other hardware can reference
- Provide a fallback for partially implemented hardware platforms
Proposed solution
- Implement all the CUDA kernels in "normal" C++
- Make sure the unit tests all run on the CPU as well
- Make sure unit test coverage is satisfactory
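A minimal sketch of what "CUDA kernels in normal C++" could look like, assuming a hypothetical elementwise kernel `scale_add` (not an existing kernel in this library). The CUDA form is kept in a comment; on the CPU the grid-stride loop collapses into an ordinary loop, and the resulting function can serve as the reference that unit tests compare a hardware port against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel, for illustration only. A CUDA version might look like:
//
//   __global__ void scale_add_kernel(const float* x, const float* y,
//                                    float* out, float alpha, size_t n) {
//       for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n;
//            i += (size_t)blockDim.x * gridDim.x)
//           out[i] = alpha * x[i] + y[i];
//   }
//
// On the CPU a single thread covers the whole index range, so the
// grid-stride loop becomes a plain loop over every element.
void scale_add_cpu(const float* x, const float* y, float* out, float alpha,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * x[i] + y[i];
    }
}

// Hypothetical convenience wrapper for tests: same contract, vectors in/out.
std::vector<float> scale_add(const std::vector<float>& x,
                             const std::vector<float>& y, float alpha) {
    assert(x.size() == y.size() && "inputs must have the same length");
    std::vector<float> out(x.size());
    scale_add_cpu(x.data(), y.data(), out.data(), alpha, x.size());
    return out;
}
```

A unit test for a GPU or other backend could then call both implementations on the same inputs and compare results within a tolerance, so the CPU version doubles as the reference contributors of ports can check against.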
Open questions
- Which CPU architectures do we support? x86_64 and arm64 are givens, but should we support any others?
- How do we deal with SIMD intrinsics? Do we build separate libraries for each SIMD architecture, or select an implementation at run time based on CPU features?
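To make the run-time selection option concrete, here is a minimal sketch assuming GCC or Clang on x86_64, using the compiler's `__builtin_cpu_supports` and the `target("avx2")` function attribute. The routine `sum` and its variants are hypothetical examples, not part of this library. The AVX2 variant is compiled alongside the scalar one, and the choice is made once, on first call, from what the running CPU reports; on other compilers or architectures only the scalar path is built.

```cpp
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Portable fallback that runs on any CPU.
float sum_scalar(const float* x, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += x[i];
    return acc;
}

#ifdef HAVE_X86_DISPATCH
// Only this function is compiled with AVX2 enabled, so the rest of the
// binary stays runnable on CPUs without AVX2.
__attribute__((target("avx2")))
float sum_avx2(const float* x, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));  // 8 floats per step
    }
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, acc);
    float total = 0.0f;
    for (float v : lanes) total += v;   // horizontal reduction of the lanes
    for (; i < n; ++i) total += x[i];   // scalar tail for leftover elements
    return total;
}
#endif

using SumFn = float (*)(const float*, std::size_t);

// Choose an implementation based on the features of the running CPU.
SumFn select_sum() {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_scalar;
}

float sum(const float* x, std::size_t n) {
    static const SumFn impl = select_sum();  // resolved once, thread-safe init
    return impl(x, n);
}
```

The SIMD and scalar variants may round differently because they add in a different order, so tests comparing them would need a tolerance. The alternative, separate libraries per SIMD level, moves this decision to build and packaging time instead of the function-pointer lookup shown here.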
@Titus-von-Koeller Feel free to edit this issue as you see fit, for example if you want a different structure for it.
tbd