Description
Currently the logic used by the prefix scan is scattered and replicated in many files, e.g.
- HeterogeneousCore/AlpakaInterface/interface/prefixScan.h (obviously)
- RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/ClusterChargeCut.h
- RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc
- etc.
We should implement a single `prefixscan(acc, ...)` function that is able to deal with arbitrarily sized inputs and automatically splits the work into multiple passes, as needed.
It should also use the compile-time warp size (if available), and fall back to a simple loop for single-threaded back-ends.
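
To make the multi-pass idea concrete, here is a rough host-only sketch in plain C++ (no alpaka, no warp-level code). It is not a proposal for the final interface: the names `serialInclusiveScan`, `multiPassInclusiveScan` and the `blockSize` parameter are made up for illustration, and `blockSize` merely stands in for the number of elements a single block could scan in one pass. Inputs that fit in one block take the simple-loop path, which is also what a single-threaded back-end could use; larger inputs are scanned per block, the block totals are scanned recursively, and the resulting offsets are added back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inclusive scan of one contiguous chunk with a simple loop.
// This models the fallback path for single-threaded back-ends.
template <typename T>
void serialInclusiveScan(T const* in, T* out, std::size_t size) {
  T sum{};
  for (std::size_t i = 0; i < size; ++i) {
    sum += in[i];
    out[i] = sum;
  }
}

// Hypothetical host-side model of a multi-pass inclusive scan.
// `blockSize` stands in for the number of elements one block can scan at once
// (an assumption of this sketch, not a value taken from any back-end).
template <typename T>
void multiPassInclusiveScan(std::vector<T> const& in, std::vector<T>& out, std::size_t blockSize) {
  assert(blockSize > 1);  // with blockSize == 1 the recursion would never shrink the input
  std::size_t const size = in.size();
  out.resize(size);

  // Small inputs: a single pass is enough.
  if (size <= blockSize) {
    serialInclusiveScan(in.data(), out.data(), size);
    return;
  }

  std::size_t const nBlocks = (size + blockSize - 1) / blockSize;

  // Pass 1: scan every block independently and record its total.
  std::vector<T> blockSums(nBlocks);
  for (std::size_t b = 0; b < nBlocks; ++b) {
    std::size_t const begin = b * blockSize;
    std::size_t const len = std::min(blockSize, size - begin);
    serialInclusiveScan(in.data() + begin, out.data() + begin, len);
    blockSums[b] = out[begin + len - 1];
  }

  // Pass 2: scan the block totals; the recursion adds further passes as needed.
  std::vector<T> blockOffsets;
  multiPassInclusiveScan(blockSums, blockOffsets, blockSize);

  // Pass 3: add the total of all preceding blocks to each element of a block.
  for (std::size_t b = 1; b < nBlocks; ++b) {
    std::size_t const begin = b * blockSize;
    std::size_t const len = std::min(blockSize, size - begin);
    for (std::size_t i = 0; i < len; ++i) {
      out[begin + i] += blockOffsets[b - 1];
    }
  }
}
```

In a device implementation the loops over blocks in passes 1 and 3 would presumably become kernel launches, and the per-block scan would use warp-level primitives where the warp size is known at compile time; none of that is modelled here.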