- Characterize simple read, write, and copy kernels as a practical memory-throughput baseline.
- How much bandwidth does the GPU sustain for the simplest contiguous memory modes?
Memory modes: `read_only`, `write_only`, `read_write_copy`.
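Effective GB/s depends on how many bytes each mode is counted as moving. The sketch below is a minimal Python illustration under stated assumptions. It uses the common "effective bandwidth" convention of logical bytes read plus bytes written, divided by GPU time. It assumes one load and/or one store per element, and it ignores caches and any extra hardware traffic. A read-only kernel usually has to consume its loads, for example by folding them into a small output, so the compiler cannot drop them. That small output is treated as negligible here. The helper names `ACCESSES_PER_ELEMENT`, `bytes_moved`, and `effective_gbps` are hypothetical.

```python
from statistics import median
from typing import Sequence

# Assumed accesses per element for each mode (one load and/or one store).
# Convention: logical bytes read + written; caches and extra traffic are ignored.
ACCESSES_PER_ELEMENT = {
    "read_only": 1,        # load each element once
    "write_only": 1,       # store each element once
    "read_write_copy": 2,  # load from source, store to destination
}


def bytes_moved(mode: str, n_elems: int, elem_size: int = 4) -> int:
    """Logical bytes moved by one dispatch of `mode` over n_elems elements."""
    return ACCESSES_PER_ELEMENT[mode] * n_elems * elem_size


def effective_gbps(mode: str, n_elems: int, gpu_times_s: Sequence[float],
                   elem_size: int = 4) -> float:
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) from the median GPU time."""
    t = median(gpu_times_s)
    if t <= 0:
        raise ValueError("GPU time must be positive")
    return bytes_moved(mode, n_elems, elem_size) / t / 1e9


if __name__ == "__main__":
    # Illustrative timings only, not measurements: 2**20 float32 elements.
    print(round(effective_gbps("read_write_copy", 1 << 20, [0.0010, 0.0010, 0.0012]), 2))
    # prints 8.39
```

Taking the median rather than the mean keeps a single slow dispatch, such as a first-run warm-up, from skewing the bandwidth figure.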
- Run one simple contiguous kernel per memory mode across a size sweep.
- Time dispatch separately from host transfers and validate each mode against a deterministic reference.
- Median GPU time by mode and size.
- Effective GB/s by mode and size.
- Saturation knee (the size beyond which throughput stops growing) and steady-state bandwidth compared across modes.
- These results are the closest thing to a roofline-style denominator for later bandwidth claims.
- More complex kernels should be compared against this baseline before claiming efficiency.
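One way to turn a size sweep into a saturation knee and a baseline ratio is sketched below. This is a heuristic under stated assumptions, not the plan's defined procedure. It assumes that steady state is the median GB/s over the three largest sizes. It assumes that the knee is the smallest size from which every larger size stays at or above 90% of steady state. It also assumes that a later kernel is compared against the steady state of the matching mode, for example a copy-like kernel against `read_write_copy`. The function names, the `frac` and `tail` parameters, and the example numbers are hypothetical and made up.

```python
from statistics import median
from typing import List, Optional, Tuple

# One sweep for one mode: (buffer size in bytes, effective GB/s) pairs.
Sweep = List[Tuple[int, float]]


def steady_state_gbps(sweep: Sweep, tail: int = 3) -> float:
    """Median GB/s over the `tail` largest sizes (assumed past saturation)."""
    largest = sorted(sweep)[-tail:]
    return median(gbps for _, gbps in largest)


def saturation_knee(sweep: Sweep, frac: float = 0.9, tail: int = 3) -> Optional[int]:
    """Smallest size from which every larger size stays >= frac * steady state."""
    target = frac * steady_state_gbps(sweep, tail)
    knee = None
    # Walk from the largest size down; stop at the first size below target.
    for size, gbps in sorted(sweep, reverse=True):
        if gbps < target:
            break
        knee = size
    return knee


def fraction_of_baseline(kernel_gbps: float, baseline_steady_gbps: float) -> float:
    """Achieved bandwidth as a fraction of the simple-kernel baseline."""
    return kernel_gbps / baseline_steady_gbps


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    copy_sweep = [(1 << 16, 40.0), (1 << 18, 120.0), (1 << 20, 300.0),
                  (1 << 22, 380.0), (1 << 24, 400.0), (1 << 26, 405.0),
                  (1 << 28, 402.0)]
    steady = steady_state_gbps(copy_sweep)
    print(steady, saturation_knee(copy_sweep), fraction_of_baseline(301.5, steady))
    # prints 402.0 4194304 0.75
```

The knee requires every larger size to stay above the threshold, rather than taking the first size that crosses it, so that one noisy spike at a small size is not reported as saturation.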