Skip to content

Write-only benchmark exceeds 100% bandwidth #270

@bernhardmgruber

Description

@bernhardmgruber

@gevtushenko and I have run into this curious case: When benchmarking cub::DeviceTransform::Fill to just fill a buffer with values (see source), we sometimes get results like this:

## generate

### [0] NVIDIA GeForce RTX 5090

| T{ct} |   Elements{io}   | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  |  Elem/s  | GlobalMem BW | BWUtil  | Samples | Batch GPU  |
|-------|------------------|---------|------------|--------|------------|--------|----------|--------------|---------|---------|------------|
|    I8 |     2^16 = 65536 |    314x |   9.475 us | 13.30% |   2.836 us | 34.84% |  23.108G |  23.108 GB/s |   1.29% | 230528x |   2.169 us |
|    I8 |   2^20 = 1048576 |    414x |   9.200 us | 10.08% |   2.683 us | 34.71% | 390.860G | 390.860 GB/s |  21.81% | 235432x |   2.124 us |
|    I8 |  2^24 = 16777216 |    262x |  15.828 us |  4.94% |   8.182 us |  1.94% |   2.050T |   2.050 TB/s | 114.41% | 100812x |   4.961 us |
|    I8 | 2^28 = 268435456 |    370x | 166.190 us |  0.79% | 158.758 us |  0.73% |   1.691T |   1.691 TB/s |  94.35% |   3338x | 158.649 us |

Notice the 114.41% BWUtil for the third run. Trying to figure out what happens, I run the benchmark under ncu and the memory workload analysis shows:
Image
It looks like the kernel did not write back the data from L2 to GMEM. This may explain why the kernel could finish faster than it would take to do the full write back to GMEM (which is what I guess nvbench assumes).

ncu shows me that smaller runs write back zero bytes from L2 to GMEM, and the larger 2^28 run writes back everything except about 62MB of data, which remains in L2. The experiment was done on an RTX 5090 with 92MB L2.

Image

It seems nvbench should incorporate a mechanism that forces the benchmarked kernel to do the full write back to GMEM, so bandwidth measurements on writes are accurate.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions