- Compare a locality-friendly sequential pass against a cache-thrashing random pass.
- How much throughput is lost when the same useful work is reordered into a poor-locality access pattern?
- `sequential`, `random`
- Run the same logical transform with identical output semantics and only change the access ordering.
- Keep arithmetic and total payload fixed so the result isolates locality loss rather than extra work.
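One way to picture the "same work, different ordering" constraint is the CPU-side sketch below. It is only an illustration, not the benchmark's GPU kernel: `make_orderings`, `transform`, and the placeholder arithmetic are hypothetical names and choices introduced here. The point it shows is that the random ordering is a permutation of the sequential one, so every element is touched exactly once and the output is identical.

```python
import random


def make_orderings(n, seed=0):
    """Return (sequential, shuffled) index lists; each covers 0..n-1 exactly once."""
    sequential = list(range(n))
    shuffled = sequential[:]
    # A fixed seed keeps the random ordering reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return sequential, shuffled


def transform(src, order):
    """Apply identical per-element work, visiting elements in the given order."""
    dst = [0] * len(src)
    for i in order:
        # Placeholder arithmetic (an assumption); it is the same for both orderings.
        dst[i] = 2 * src[i] + 1
    return dst


src = list(range(16))
seq_order, rand_order = make_orderings(len(src))
out_seq = transform(src, seq_order)
out_rand = transform(src, rand_order)
```

Because both passes write `dst[i]` from `src[i]`, the two outputs can be compared element by element as a correctness check before any timing is trusted.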
- Median GPU time by ordering.
- Effective GB/s by ordering.
- Slowdown of `random` relative to `sequential`.
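The three metrics above could be derived from raw per-run timings roughly as follows. This is a sketch under stated assumptions: `summarize` and `slowdown` are hypothetical helpers, timings are taken to be in milliseconds, and the payload is assumed to count bytes read plus bytes written. The numbers in the usage lines are placeholders chosen for easy arithmetic, not measured results.

```python
from statistics import median


def summarize(times_ms, payload_bytes):
    """Return (median time in ms, effective GB/s) for one ordering."""
    med_ms = median(times_ms)
    # Effective bandwidth = bytes moved / seconds elapsed, reported in GB/s (1e9 bytes).
    gbps = payload_bytes / (med_ms * 1e-3) / 1e9
    return med_ms, gbps


def slowdown(seq_times_ms, rand_times_ms):
    """Ratio of median random time to median sequential time (above 1 means random is slower)."""
    return median(rand_times_ms) / median(seq_times_ms)


# Placeholder inputs for illustration only; substitute real measurements.
payload_bytes = 200_000_000      # assumed: bytes read + bytes written per run
seq_times_ms = [1.0, 1.1, 0.9]
rand_times_ms = [4.0, 4.4, 3.6]

seq_med, seq_gbps = summarize(seq_times_ms, payload_bytes)
rand_med, rand_gbps = summarize(rand_times_ms, payload_bytes)
ratio = slowdown(seq_times_ms, rand_times_ms)
```

Since the payload is held fixed, the slowdown ratio and the ratio of effective bandwidths carry the same information; reporting both is mainly a readability choice.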
- This is a direct locality study, so the important result is the gap between the two orderings.
- The measured penalty supports arguments for sorting, binning, or data-layout reordering.