- Measure the cost of indirect writes and the slowdown introduced by target collisions.
- How much do write-target distribution and contention change scatter throughput?
- unique_targets, low_collision_random, high_collision_clustered
- Keep the number of logical writes and arithmetic work fixed while changing only the target-index distribution.
- Validate the final output against a deterministic CPU reference for each collision regime.
- Median GPU time by scatter distribution.
- Relative slowdown vs the unique-target baseline.
- Throughput compared across contention levels.
- Scatter is not just the write-side mirror of gather: writes that collide on the same target can serialize progress.
- The result should guide whether later pipelines need privatization, staging, or compaction.
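
The host-side pieces of the plan above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the benchmark's actual harness: the function names (`make_targets`, `scatter_add_reference`, `relative_slowdown`), the hot-window size of `n_slots // 64`, and the choice of integer scatter-add as the collision semantics are all hypothetical. Integer accumulation is assumed because it makes the result independent of write order, so a CPU reference is deterministic even under heavy collisions.

```python
import random
from statistics import median


def make_targets(regime, n_writes, n_slots, seed=0):
    """Return exactly n_writes target indices for one collision regime.

    The regime definitions here are assumptions for illustration only.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    if regime == "unique_targets":
        # A shuffled permutation: every write lands on its own slot.
        if n_slots < n_writes:
            raise ValueError("unique_targets needs n_slots >= n_writes")
        targets = list(range(n_writes))
        rng.shuffle(targets)
        return targets
    if regime == "low_collision_random":
        # Uniform over all slots: collisions occur but are spread out.
        return [rng.randrange(n_slots) for _ in range(n_writes)]
    if regime == "high_collision_clustered":
        # Assumed definition: all writes fall into a small hot window.
        hot = max(1, n_slots // 64)
        return [rng.randrange(hot) for _ in range(n_writes)]
    raise ValueError(f"unknown regime: {regime}")


def scatter_add_reference(targets, values, n_slots):
    """Sequential CPU scatter-add; integer sums do not depend on write order."""
    out = [0] * n_slots
    for t, v in zip(targets, values):
        out[t] += v
    return out


def relative_slowdown(times_ms, baseline_ms):
    """Median time of a regime divided by the median unique-target time."""
    return median(times_ms) / median(baseline_ms)
```

Every regime returns the same number of indices, so the count of logical writes stays fixed and only the target distribution changes. The GPU output for each regime would then be compared element by element against `scatter_add_reference`, and the measured timings passed to `relative_slowdown` with the unique-target timings as the baseline.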