- Compare common indexing formulas used to map global IDs onto logical work.
- What overhead is introduced by offset and grid-stride indexing relative to direct mapping?
Variants: direct, fixed_offset, grid_stride
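To make the three mappings concrete, here is a minimal CPU-side model in Python, not the benchmark's kernel code. The function names, the `offset` and `total_invocations` parameters, and the reading of fixed_offset as "global ID plus a constant base" are assumptions made for illustration; the source does not define them.

```python
def direct_indices(global_id, n):
    """Direct mapping: invocation g handles element g, guarded by a bounds check."""
    return [global_id] if global_id < n else []


def fixed_offset_indices(global_id, n, offset):
    """Fixed offset (assumed meaning): invocation g handles element g + offset."""
    i = global_id + offset
    return [i] if i < n else []


def grid_stride_indices(global_id, n, total_invocations):
    """Grid-stride: invocation g handles g, g + stride, g + 2*stride, ...
    where the stride is the total number of launched invocations."""
    return list(range(global_id, n, total_invocations))


def covered(indices_per_invocation):
    """Flatten and sort the elements touched by every invocation in a launch."""
    return sorted(i for idx in indices_per_invocation for i in idx)
```

In this model, direct mapping needs at least `n` invocations to cover `n` elements, while grid-stride covers the same range with any positive invocation count at the price of a loop per invocation. That loop is the control-flow cost the comparison is meant to isolate.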
- Keep arithmetic, memory traffic, and output semantics constant.
- Change only the mapping from invocation ID to logical element, then measure each variant across progressively larger workloads.
- Median GPU time by mapping strategy.
- Throughput by mapping strategy.
- Relative overhead of the flexible mappings (fixed_offset, grid_stride) vs. direct mapping.
- Grid-stride loops buy scalability and launch flexibility, but the control-flow cost still needs to be measured.
- Direct mapping remains the simplest reference path for kernels that can launch exact coverage.
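The three reported metrics can be derived from raw timing samples roughly as sketched below. This is a hedged illustration, not the benchmark's actual reporting code: the `summarize` helper, its dictionary layout, and the definition of relative overhead as `median_variant / median_direct - 1` are assumptions, and the timings in the usage lines are placeholders rather than measurements.

```python
from statistics import median


def summarize(samples_ms, n_elements, baseline="direct"):
    """Compute per-variant median time, throughput, and overhead vs. the baseline.

    samples_ms maps a variant name to a list of GPU times in milliseconds.
    """
    medians = {name: median(times) for name, times in samples_ms.items()}
    base = medians[baseline]
    return {
        name: {
            "median_ms": m,
            # Elements processed per second at the median time.
            "elements_per_s": n_elements / (m / 1000.0),
            # 0.0 means parity with the baseline; 0.25 means 25% slower.
            "overhead_vs_baseline": m / base - 1.0,
        }
        for name, m in medians.items()
    }


# Placeholder timings for illustration only (not measured results).
example = summarize(
    {"direct": [1.0, 2.0, 3.0], "grid_stride": [2.0, 3.0, 4.0]},
    n_elements=1000,
)
```

Using the median rather than the mean keeps a single slow run, such as a first dispatch that includes pipeline warm-up, from skewing the comparison.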