- Measure how atomic-heavy histogram construction responds to input skew and privatization.
- How much does shared-memory privatization reduce contention relative to direct global atomics?
- Variants: `global_atomics` and `privatized_shared`.
- Input distributions: `uniform`, `mixed_hotset`, and `hot_bin_90`.
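The three distributions can be generated deterministically from a fixed seed. The sketch below is a minimal illustration, not the benchmark's actual generator. The bin count (`NUM_BINS = 256`), the 8-bin hot set with a 50% hot fraction for `mixed_hotset`, and the reading of `hot_bin_90` as "about 90% of elements hit one bin" are all assumptions. `make_inputs` is a hypothetical helper.

```python
import random

NUM_BINS = 256  # assumption: the bin count is not given in the source


def make_inputs(distribution: str, n: int, seed: int = 1234, num_bins: int = NUM_BINS) -> list[int]:
    """Return n deterministic bin indices for one (hypothetical) contention regime."""
    rng = random.Random(seed)  # fixed seed, so every run sees identical inputs
    if distribution == "uniform":
        # Every bin equally likely: the lowest-contention case.
        return [rng.randrange(num_bins) for _ in range(n)]
    if distribution == "mixed_hotset":
        # Assumption: half the elements land in a hot set of 8 bins,
        # the rest are spread uniformly over all bins.
        hot_set = range(8)
        return [rng.choice(hot_set) if rng.random() < 0.5 else rng.randrange(num_bins)
                for _ in range(n)]
    if distribution == "hot_bin_90":
        # Assumption: about 90% of elements hit bin 0, the rest are uniform.
        return [0 if rng.random() < 0.9 else rng.randrange(num_bins) for _ in range(n)]
    raise ValueError(f"unknown distribution: {distribution}")
```

Fixing the seed keeps the inputs identical across variants and repetitions, so timing differences come from the update strategy rather than from the data.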
- Build the same histogram from deterministic inputs under several contention regimes.
- Compare direct global updates against a workgroup-private accumulation path with controlled flush behavior.
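The difference between the two update paths can be illustrated with a CPU-side reference model that counts how many updates reach the global histogram. This is a minimal sketch, not a GPU kernel: the functions only borrow the variant names, `workgroup_size=256` is an assumed value, and the flush policy (one global update per nonzero private bin) is an assumption because the source does not say how flushes are controlled. Contention inside the private copy is not modeled.

```python
from collections import Counter


def global_atomics(inputs, num_bins):
    """Model of direct global updates: one global atomic add per input element."""
    hist = [0] * num_bins
    global_ops = 0
    for b in inputs:
        hist[b] += 1      # stands in for one atomic add on the global histogram
        global_ops += 1
    return hist, global_ops


def privatized_shared(inputs, num_bins, workgroup_size=256):
    """Model of workgroup-private accumulation followed by a flush to global memory."""
    hist = [0] * num_bins
    global_ops = 0
    for start in range(0, len(inputs), workgroup_size):
        # Each workgroup accumulates its slice into a private (shared-memory) copy.
        local = Counter(inputs[start:start + workgroup_size])
        # Flush (assumed policy): one global atomic add per nonzero private bin.
        for b, count in local.items():
            hist[b] += count
            global_ops += 1
    return hist, global_ops
```

Both functions return the same histogram; only the global update count differs. With a `hot_bin_90`-style input, roughly 90% of the direct updates target a single address, while the privatized path sends at most one update per bin per workgroup to that address.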
- Median GPU time by variant and distribution.
- Speedup of `privatized_shared` over `global_atomics`.
- Contention sensitivity across the input distributions.
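These metrics reduce to a median over repeated runs and a ratio of medians. A minimal sketch, assuming timings are collected in a dict keyed by `(variant, distribution)` with both variants present for every distribution; `summarize` is a hypothetical helper and the millisecond unit is an assumption.

```python
from statistics import median


def summarize(timings_ms):
    """timings_ms maps (variant, distribution) -> list of repeated GPU times (ms).

    Returns the median time per key and, for each distribution, the speedup of
    privatized_shared over global_atomics (values > 1 mean privatization is faster).
    """
    medians = {key: median(samples) for key, samples in timings_ms.items()}
    distributions = sorted({dist for _, dist in medians})
    speedup = {
        dist: medians[("global_atomics", dist)] / medians[("privatized_shared", dist)]
        for dist in distributions
    }
    return medians, speedup
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache or clock ramp-up) from skewing the comparison.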
- Histogram performance is dominated by contention shape (how strongly updates concentrate on a few bins), not just by the number of input elements.
- Privatization should be judged by both its best-case speedup and its cost on low-contention inputs.
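Both criteria in the last point can be read off the per-distribution speedups. A minimal sketch, assuming `uniform` is the low-contention input and that speedup is defined as `global_atomics` time divided by `privatized_shared` time; `judge_privatization` is a hypothetical helper.

```python
def judge_privatization(speedup, low_contention="uniform"):
    """Summarize a {distribution: speedup} map along two axes.

    Speedup is global_atomics time / privatized_shared time, so a value
    below 1.0 on the low-contention input is a regression.
    """
    best_dist = max(speedup, key=speedup.get)
    low = speedup[low_contention]
    return {
        "best_case": (best_dist, speedup[best_dist]),
        "low_contention_speedup": low,
        # Fractional slowdown on the low-contention input (0.0 if no regression).
        "low_contention_cost": max(0.0, 1.0 / low - 1.0),
    }
```

A speedup of 0.8 on `uniform`, for instance, shows up as a `low_contention_cost` of 0.25 (privatization 25% slower), which can then be weighed against the best-case gain on skewed inputs.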