- Establish the good-path baseline for fully contiguous indexing without indirection or stride.
- How fast is a simple sequential read-modify-write kernel before access-pattern penalties are introduced?
sequential_indexing - problem-size sweep
- Map one invocation to one contiguous element in both source and destination buffers.
- Keep arithmetic and memory footprint fixed while scaling only the logical workload size.
- Median GPU time.
- Throughput and effective bandwidth (GB/s).
- Baseline reference for later access-pattern experiments.
- This is the comparison point for gather, scatter, stride, and reuse studies.
- Later slowdowns should be explained relative to this contiguous baseline, not in isolation.
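
As a rough sketch of how the pieces above fit together, the Python below models the access pattern on the CPU and derives the reported metrics from a median time. It is not the benchmark's actual kernel or harness. The function names, the `+ 1.0` update, and the assumption that each element costs exactly one read and one write are all hypothetical choices made for illustration.

```python
import statistics


def sequential_rmw(src, dst):
    """CPU stand-in for the kernel: 'invocation' i touches only src[i] and dst[i].

    The + 1.0 update is a hypothetical placeholder for the fixed arithmetic.
    """
    for i in range(len(src)):
        dst[i] = src[i] + 1.0


def median_ms(samples_ms):
    """Median of repeated timings, in milliseconds."""
    return statistics.median(samples_ms)


def elements_per_second(num_elements, time_ms):
    """Throughput as elements processed per second."""
    return num_elements / (time_ms * 1e-3)


def effective_gbps(num_elements, bytes_per_element, time_ms):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes one read of the source and one write of the destination
    per element, so bytes moved = 2 * N * element size.
    """
    bytes_moved = 2 * num_elements * bytes_per_element
    return bytes_moved / (time_ms * 1e-3) / 1e9


if __name__ == "__main__":
    # Placeholder timings, not measured results.
    samples = [1.2, 1.0, 1.1]
    n = 1_000_000
    t = median_ms(samples)
    print(f"median: {t} ms")
    print(f"throughput: {elements_per_second(n, t):.3e} elements/s")
    print(f"effective bandwidth: {effective_gbps(n, 4, t):.2f} GB/s")
```

Computing the same effective GB/s figure for the gather, scatter, stride, and reuse variants makes their slowdowns directly comparable to this contiguous case, provided the bytes-moved model is adjusted for any extra index traffic those variants introduce.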