Would it make sense to implement a variation of alpaka::getValidWorkDiv that uses the occupancy functionality described in the CUDA Occupancy section to try to provide a better work division, at least for GPUs that support it?
We could call it getOptimalWorkDiv, getHeuristicWorkDiv, or something else.
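To make the idea more concrete, here is a rough 1D sketch of what such a helper might compute. It is not alpaka code: the names `WorkDiv1D` and `getHeuristicWorkDiv1D` are hypothetical. The assumption is that on CUDA the suggested block size would come from the runtime's occupancy API (for example `cudaOccupancyMaxPotentialBlockSize`), and that back-ends without such an API pass 0 and fall back to a fixed default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical 1D work division; not alpaka's WorkDivMembers type.
struct WorkDiv1D {
    std::size_t blocksPerGrid;
    std::size_t threadsPerBlock;
};

// Hypothetical helper. On CUDA, `suggestedBlockSize` would be the blockSize
// returned by cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, kernel).
// On back-ends without an occupancy API, pass 0 to use `fallbackBlockSize`.
WorkDiv1D getHeuristicWorkDiv1D(std::size_t numElements,
                                std::size_t suggestedBlockSize,
                                std::size_t maxThreadsPerBlock,
                                std::size_t fallbackBlockSize = 256) {
    assert(maxThreadsPerBlock > 0);

    // Prefer the occupancy-based suggestion, otherwise use the fallback.
    std::size_t threads = suggestedBlockSize > 0 ? suggestedBlockSize : fallbackBlockSize;

    // Never exceed the device limit, and always use at least one thread.
    threads = std::max<std::size_t>(std::min(threads, maxThreadsPerBlock), 1);

    // Round up so that every element is covered by a thread.
    std::size_t blocks = (numElements + threads - 1) / threads;

    // Launch at least one block even for an empty problem.
    return {std::max<std::size_t>(blocks, 1), threads};
}
```

For example, with 1000 elements and an occupancy-suggested block size of 192, this yields 192 threads per block and 6 blocks. A real implementation would also need to handle multi-dimensional extents and alpaka's element-per-thread level, which this sketch ignores.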