- Find the workload size where simple contiguous kernels reach a sustained bandwidth plateau.
- How large must a memory workload be before fixed dispatch overhead is mostly amortized?
- read_only
- write_only
- read_write_copy
- Dense size sweep from small to large buffers.
- Reuse the simple contiguous memory modes from the baseline experiment with a denser size sweep.
- Keep timing limited to GPU dispatch so the plateau reflects kernel execution rather than staging cost.
- GB/s vs size for each memory mode.
- Plateau onset estimate.
- Sustained-region median GB/s.
- The plateau is practical measured bandwidth for the tested device, not proof of theoretical peak.
- Later experiments should prefer sizes in this stable region when the goal is to study steady-state behavior.
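
The sweep and the two summary numbers listed above (plateau onset and sustained-region median GB/s) could be computed along the following lines. This is a minimal sketch, not the experiment's actual analysis code. The function names, the four-points-per-octave spacing, the 5% tolerance, and the rule "onset is the first size after the last point that falls below the tolerance band" are all assumptions chosen for illustration. The synthetic timing model (fixed overhead plus bytes divided by peak rate) stands in for real GPU dispatch timings.

```python
import statistics


def size_sweep(min_bytes, max_bytes, points_per_octave=4):
    """Geometric sweep of buffer sizes, rounded to multiples of 4 bytes.

    The spacing (points_per_octave) is a hypothetical choice, not taken
    from the experiment description.
    """
    sizes = []
    i = 0
    while True:
        raw = min_bytes * 2 ** (i / points_per_octave)
        size = max(4, int(round(raw / 4)) * 4)
        if size > max_bytes:
            break
        # Skip duplicates that rounding can produce at very small sizes.
        if not sizes or size > sizes[-1]:
            sizes.append(size)
        i += 1
    return sizes


def bandwidth_gbps(bytes_moved, seconds):
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: bytes_moved counts all traffic for the mode, so a copy
    of an N-byte buffer would count 2 * N (one read plus one write).
    """
    return bytes_moved / seconds / 1e9


def plateau_summary(sizes, gbps, tolerance=0.05, tail_fraction=0.25):
    """Return (onset_size, sustained_median_gbps), or (None, None).

    The reference level is the median of the largest tail_fraction of the
    sweep. The onset is the first size after the last measurement that
    falls more than `tolerance` below that reference.
    """
    n_tail = max(1, int(len(gbps) * tail_fraction))
    reference = statistics.median(gbps[-n_tail:])
    threshold = (1.0 - tolerance) * reference
    below = [i for i, value in enumerate(gbps) if value < threshold]
    onset = below[-1] + 1 if below else 0
    if onset >= len(gbps):
        return None, None  # no stable region found in this sweep
    return sizes[onset], statistics.median(gbps[onset:])


def synthetic_gbps(sizes, overhead_s, peak_bytes_per_s):
    """Stand-in for measured data: time = fixed overhead + bytes / peak."""
    return [
        bandwidth_gbps(s, overhead_s + s / peak_bytes_per_s) for s in sizes
    ]


if __name__ == "__main__":
    sizes = size_sweep(4 * 1024, 1 << 30)
    gbps = synthetic_gbps(sizes, overhead_s=10e-6, peak_bytes_per_s=100e9)
    onset, sustained = plateau_summary(sizes, gbps)
    print(f"plateau onset ~ {onset} bytes, sustained median ~ {sustained:.1f} GB/s")
```

With real measurements, `gbps` would come from the GPU-only dispatch timings for each memory mode, and `plateau_summary` would be run once per mode. Measured curves are noisy, so the tolerance and tail fraction may need tuning per device.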