- Tune the tile shape used by the shared-memory stencil path.
- Which tile size gives the best tradeoff between reuse, occupancy, and local-memory overhead?
- `shared_tiled` with several tile sizes
- `direct_global` reference
- Reuse the tiled stencil workload from Experiment 16 and sweep tile size only.
- Hold the stencil radius, output semantics, and timing path fixed across the sweep.
- Median GPU time by tile size.
- Best tile-size recommendation.
- Speedup relative to the direct-global reference.
- Tile size is a tuning parameter, not a universal constant.
- The winning point should be treated as hardware- and workload-specific evidence.
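
The tradeoff and the reported outputs above can be sketched in a few lines of Python. This is a minimal illustration, not the experiment's actual harness: it assumes a square 2D tile with a halo of `radius` cells on each side and 4-byte elements, and the names `tile_footprint` and `summarize_sweep` are hypothetical. The first helper estimates shared-memory use and halo overhead per tile; the second reduces raw timing samples to the median time and the speedup over the direct-global reference, then picks the fastest tile.

```python
from statistics import median


def tile_footprint(tile, radius, bytes_per_elem=4):
    """Estimate per-block cost of a square tile (assumed 2D, halo on all sides)."""
    padded = tile + 2 * radius          # tile edge plus halo on both sides
    loaded = padded * padded            # elements staged in shared memory
    outputs = tile * tile               # elements the block actually produces
    return {
        "smem_bytes": loaded * bytes_per_elem,
        "halo_overhead": loaded / outputs,  # loads per output; closer to 1.0 is better
    }


def summarize_sweep(samples_ms, reference_ms):
    """Reduce timing samples to (tile, median_ms, speedup) rows and the best row.

    samples_ms:   dict mapping tile size -> list of GPU times in ms (shared_tiled)
    reference_ms: list of GPU times in ms for the direct_global reference
    """
    ref = median(reference_ms)
    rows = []
    for tile, times in sorted(samples_ms.items()):
        med = median(times)
        rows.append((tile, med, ref / med))  # speedup > 1 means faster than reference
    best = min(rows, key=lambda row: row[1])  # lowest median time wins
    return rows, best
```

Larger tiles lower `halo_overhead` but raise `smem_bytes`, which can reduce occupancy, so the footprint estimate only explains a result; the recommendation itself should come from the measured medians on the target hardware.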