@bernhardmgruber
Collaborator

@bernhardmgruber bernhardmgruber commented Aug 14, 2025

Fixes: #247

Cold and batch measurements can sometimes differ substantially, so we want to show both. An example is kernels using PDL (Programmatic Dependent Launch).

Here is a comparison of DeviceTransform with and without PDL (see also NVIDIA/cccl#5249):

# mul

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |    B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|-----------|---------|----------|--------------|--------------|-----------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.729 us |      11.77% |   5.734 us |      11.65% |  0.005 us |   0.08% |   SAME   |     4.094 us |     1.622 us | -2.472 us |   -60.39% |    FAST    |
|   I8    |      I32      |      2^20      |   6.016 us |       5.61% |   6.130 us |       7.31% |  0.114 us |   1.90% |   SAME   |     4.101 us |     2.831 us | -1.270 us |   -30.97% |    FAST    |
|   I8    |      I32      |      2^24      |  13.739 us |       5.56% |  13.620 us |       5.84% | -0.119 us |  -0.87% |   SAME   |    10.271 us |     8.388 us | -1.883 us |   -18.33% |    FAST    |
|   I8    |      I32      |      2^28      | 114.526 us |       0.25% | 114.550 us |       0.28% |  0.024 us |   0.02% |   SAME   |   112.599 us |   110.039 us | -2.560 us |    -2.27% |    FAST    |
|   I8    |      I64      |      2^16      |   5.494 us |      14.80% |   5.512 us |      14.22% |  0.018 us |   0.33% |   SAME   |     4.094 us |     1.594 us | -2.500 us |   -61.06% |    FAST    |
|   I8    |      I64      |      2^20      |   6.059 us |       5.71% |   6.383 us |       9.59% |  0.324 us |   5.35% |   SAME   |     4.101 us |     2.826 us | -1.274 us |   -31.07% |    FAST    |
|   I8    |      I64      |      2^24      |  14.075 us |       2.96% |  14.041 us |       3.52% | -0.035 us |  -0.25% |   SAME   |    10.290 us |     8.436 us | -1.854 us |   -18.02% |    FAST    |
|   I8    |      I64      |      2^28      | 115.311 us |       0.80% | 115.433 us |       0.82% |  0.122 us |   0.11% |   SAME   |   112.952 us |   110.872 us | -2.079 us |    -1.84% |    FAST    |
|   I16   |      I32      |      2^16      |   5.346 us |      16.26% |   5.377 us |      15.65% |  0.031 us |   0.58% |   SAME   |     4.094 us |     1.599 us | -2.496 us |   -60.96% |    FAST    |
|   I16   |      I32      |      2^20      |   6.099 us |       6.33% |   6.380 us |       9.76% |  0.281 us |   4.61% |   SAME   |     4.099 us |     2.815 us | -1.285 us |   -31.34% |    FAST    |
|   I16   |      I32      |      2^24      |  16.382 us |       3.32% |  16.376 us |       3.23% | -0.006 us |  -0.04% |   SAME   |    12.288 us |    10.074 us | -2.214 us |   -18.02% |    FAST    |
|   I16   |      I32      |      2^28      | 161.785 us |       0.28% | 161.935 us |       0.44% |  0.150 us |   0.09% |   SAME   |   159.618 us |   156.623 us | -2.996 us |    -1.88% |    FAST    |
|   I16   |      I64      |      2^16      |   5.672 us |      12.72% |   5.756 us |      11.04% |  0.084 us |   1.48% |   SAME   |     4.094 us |     1.577 us | -2.518 us |   -61.49% |    FAST    |
|   I16   |      I64      |      2^20      |   6.129 us |       6.10% |   6.391 us |      10.08% |  0.262 us |   4.28% |   SAME   |     4.102 us |     2.816 us | -1.286 us |   -31.36% |    FAST    |
|   I16   |      I64      |      2^24      |  16.434 us |       3.20% |  16.467 us |       3.44% |  0.034 us |   0.21% |   SAME   |    12.289 us |    10.120 us | -2.169 us |   -17.65% |    FAST    |
|   I16   |      I64      |      2^28      | 163.663 us |       0.22% | 163.721 us |       0.16% |  0.058 us |   0.04% |   SAME   |   160.345 us |   158.210 us | -2.135 us |    -1.33% |    FAST    |
|   F32   |      I32      |      2^16      |   5.935 us |       6.94% |   5.907 us |       6.24% | -0.028 us |  -0.47% |   SAME   |     4.094 us |     1.595 us | -2.499 us |   -61.04% |    FAST    |
|   F32   |      I32      |      2^20      |   7.543 us |      10.91% |   7.497 us |      10.90% | -0.046 us |  -0.61% |   SAME   |     4.099 us |     2.856 us | -1.244 us |   -30.34% |    FAST    |
|   F32   |      I32      |      2^24      |  25.470 us |       3.65% |  25.395 us |       3.59% | -0.075 us |  -0.29% |   SAME   |    20.843 us |    19.593 us | -1.251 us |    -6.00% |    FAST    |
|   F32   |      I32      |      2^28      | 313.226 us |       0.32% | 313.291 us |       0.30% |  0.065 us |   0.02% |   SAME   |   311.189 us |   308.332 us | -2.857 us |    -0.92% |    SAME    |
|   F32   |      I64      |      2^16      |   5.545 us |      14.41% |   5.487 us |      14.25% | -0.058 us |  -1.05% |   SAME   |     4.094 us |     1.600 us | -2.494 us |   -60.92% |    FAST    |
|   F32   |      I64      |      2^20      |   7.494 us |      11.26% |   7.432 us |      11.17% | -0.062 us |  -0.83% |   SAME   |     4.099 us |     2.859 us | -1.241 us |   -30.26% |    FAST    |
|   F32   |      I64      |      2^24      |  25.602 us |       3.56% |  25.608 us |       3.53% |  0.006 us |   0.02% |   SAME   |    20.726 us |    19.591 us | -1.136 us |    -5.48% |    FAST    |
|   F32   |      I64      |      2^28      | 313.271 us |       0.33% | 313.254 us |       0.31% | -0.017 us |  -0.01% |   SAME   |   311.100 us |   308.282 us | -2.818 us |    -0.91% |    SAME    |
|   F64   |      I32      |      2^16      |   5.706 us |      11.69% |   5.719 us |      11.38% |  0.014 us |   0.24% |   SAME   |     4.094 us |     1.596 us | -2.498 us |   -61.01% |    FAST    |
|   F64   |      I32      |      2^20      |   8.084 us |       4.22% |   8.043 us |       4.93% | -0.041 us |  -0.51% |   SAME   |     4.108 us |     2.882 us | -1.226 us |   -29.85% |    FAST    |
|   F64   |      I32      |      2^24      |  45.629 us |       2.07% |  45.498 us |       1.93% | -0.131 us |  -0.29% |   SAME   |    43.046 us |    40.308 us | -2.738 us |    -6.36% |    FAST    |
|   F64   |      I32      |      2^28      | 620.092 us |       0.23% | 620.155 us |       0.19% |  0.063 us |   0.01% |   SAME   |   617.714 us |   614.871 us | -2.843 us |    -0.46% |    SAME    |
|   F64   |      I64      |      2^16      |   5.698 us |      11.99% |   5.698 us |      11.30% | -0.001 us |  -0.01% |   SAME   |     4.094 us |     1.616 us | -2.478 us |   -60.53% |    FAST    |
|   F64   |      I64      |      2^20      |   8.098 us |       4.25% |   8.032 us |       4.07% | -0.067 us |  -0.82% |   SAME   |     4.106 us |     2.896 us | -1.210 us |   -29.47% |    FAST    |
|   F64   |      I64      |      2^24      |  45.517 us |       2.01% |  45.637 us |       2.05% |  0.119 us |   0.26% |   SAME   |    43.031 us |    40.252 us | -2.779 us |    -6.46% |    FAST    |
|   F64   |      I64      |      2^28      | 620.032 us |       0.22% | 620.114 us |       0.22% |  0.082 us |   0.01% |   SAME   |   617.629 us |   614.842 us | -2.786 us |    -0.45% |    SAME    |
|  I128   |      I32      |      2^16      |   5.959 us |       4.88% |   5.917 us |       5.20% | -0.042 us |  -0.71% |   SAME   |     4.094 us |     1.596 us | -2.499 us |   -61.03% |    FAST    |
|  I128   |      I32      |      2^20      |  10.308 us |       4.86% |  10.378 us |       5.50% |  0.070 us |   0.67% |   SAME   |     6.141 us |     5.100 us | -1.042 us |   -16.97% |    FAST    |
|  I128   |      I32      |      2^24      |  83.570 us |       0.98% |  83.639 us |       0.89% |  0.068 us |   0.08% |   SAME   |    81.530 us |    78.615 us | -2.916 us |    -3.58% |    FAST    |
|  I128   |      I32      |      2^28      |   1.235 ms |       0.15% |   1.235 ms |       0.15% | -0.010 us |  -0.00% |   SAME   |     1.233 ms |     1.231 ms | -2.248 us |    -0.18% |    SAME    |
|  I128   |      I64      |      2^16      |   5.979 us |       4.55% |   5.995 us |       4.38% |  0.016 us |   0.27% |   SAME   |     4.094 us |     1.601 us | -2.493 us |   -60.90% |    FAST    |
|  I128   |      I64      |      2^20      |  10.320 us |       4.89% |  10.433 us |       6.27% |  0.113 us |   1.10% |   SAME   |     6.141 us |     5.091 us | -1.050 us |   -17.10% |    FAST    |
|  I128   |      I64      |      2^24      |  83.769 us |       0.74% |  83.803 us |       0.77% |  0.034 us |   0.04% |   SAME   |    81.544 us |    78.602 us | -2.942 us |    -3.61% |    FAST    |
|  I128   |      I64      |      2^28      |   1.234 ms |       0.15% |   1.234 ms |       0.14% | -0.059 us |  -0.00% |   SAME   |     1.233 ms |     1.231 ms | -1.907 us |    -0.15% |    SAME    |

# add

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |     B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|------------|---------|----------|--------------|--------------|------------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.829 us |       9.25% |   5.838 us |       8.67% |   0.009 us |   0.16% |   SAME   |     4.094 us |     1.594 us |  -2.500 us |   -61.06% |    FAST    |
|   I8    |      I32      |      2^20      |   6.113 us |       6.09% |   6.388 us |      10.02% |   0.275 us |   4.49% |   SAME   |     4.098 us |     3.051 us |  -1.047 us |   -25.55% |    FAST    |
|   I8    |      I32      |      2^24      |  15.292 us |       6.13% |  15.207 us |       6.09% |  -0.085 us |  -0.56% |   SAME   |    11.264 us |     9.365 us |  -1.899 us |   -16.86% |    FAST    |
|   I8    |      I32      |      2^28      | 141.213 us |       0.11% | 141.350 us |       0.41% |   0.137 us |   0.10% |   SAME   |   137.296 us |   135.994 us |  -1.302 us |    -0.95% |    SAME    |
|   I8    |      I64      |      2^16      |   5.919 us |       7.47% |   5.926 us |       6.03% |   0.007 us |   0.12% |   SAME   |     4.094 us |     1.604 us |  -2.490 us |   -60.82% |    FAST    |
|   I8    |      I64      |      2^20      |   6.111 us |       5.90% |   6.422 us |      10.09% |   0.310 us |   5.07% |   SAME   |     4.098 us |     3.097 us |  -1.001 us |   -24.43% |    FAST    |
|   I8    |      I64      |      2^24      |  15.775 us |       5.14% |  15.685 us |       5.19% |  -0.091 us |  -0.57% |   SAME   |    11.875 us |     9.387 us |  -2.489 us |   -20.96% |    FAST    |
|   I8    |      I64      |      2^28      | 143.584 us |       0.49% | 143.338 us |       0.34% |  -0.246 us |  -0.17% |   SAME   |   141.093 us |   137.760 us |  -3.333 us |    -2.36% |    FAST    |
|   I16   |      I32      |      2^16      |   5.936 us |       6.48% |   5.841 us |       9.35% |  -0.095 us |  -1.60% |   SAME   |     4.094 us |     1.627 us |  -2.467 us |   -60.26% |    FAST    |
|   I16   |      I32      |      2^20      |   6.649 us |      11.53% |   6.616 us |      11.37% |  -0.033 us |  -0.49% |   SAME   |     4.098 us |     3.023 us |  -1.075 us |   -26.23% |    FAST    |
|   I16   |      I32      |      2^24      |  22.386 us |       1.29% |  22.510 us |       2.15% |   0.123 us |   0.55% |   SAME   |    16.525 us |    18.175 us |   1.650 us |     9.98% |    SLOW    |
|   I16   |      I32      |      2^28      | 237.871 us |       0.33% | 272.302 us |       0.11% |  34.431 us |  14.47% |   SLOW   |   233.873 us |   268.593 us |  34.719 us |    14.85% |    SLOW    |
|   I16   |      I64      |      2^16      |   5.623 us |      13.25% |   5.612 us |      13.14% |  -0.011 us |  -0.20% |   SAME   |     4.094 us |     1.604 us |  -2.490 us |   -60.82% |    FAST    |
|   I16   |      I64      |      2^20      |   6.673 us |      11.74% |   6.706 us |      11.73% |   0.033 us |   0.50% |   SAME   |     4.098 us |     3.101 us |  -0.997 us |   -24.32% |    FAST    |
|   I16   |      I64      |      2^24      |  22.365 us |       1.38% |  22.528 us |       1.83% |   0.163 us |   0.73% |   SAME   |    16.525 us |    18.208 us |   1.683 us |    10.19% |    SLOW    |
|   I16   |      I64      |      2^28      | 239.449 us |       0.17% | 272.324 us |       0.19% |  32.875 us |  13.73% |   SLOW   |   235.292 us |   268.593 us |  33.301 us |    14.15% |    SLOW    |
|   F32   |      I32      |      2^16      |   5.787 us |      10.26% |   5.779 us |      10.38% |  -0.008 us |  -0.13% |   SAME   |     4.094 us |     1.575 us |  -2.519 us |   -61.52% |    FAST    |
|   F32   |      I32      |      2^20      |   8.008 us |       3.51% |   7.970 us |       3.98% |  -0.038 us |  -0.47% |   SAME   |     4.098 us |     2.875 us |  -1.223 us |   -29.84% |    FAST    |
|   F32   |      I32      |      2^24      |  35.876 us |       2.71% |  38.696 us |       0.88% |   2.820 us |   7.86% |   SLOW   |    32.666 us |    34.847 us |   2.182 us |     6.68% |    SLOW    |
|   F32   |      I32      |      2^28      | 454.211 us |       0.26% | 539.121 us |       0.17% |  84.909 us |  18.69% |   SLOW   |   452.508 us |   535.671 us |  83.163 us |    18.38% |    SLOW    |
|   F32   |      I64      |      2^16      |   5.958 us |       4.79% |   5.942 us |       5.19% |  -0.017 us |  -0.28% |   SAME   |     4.094 us |     1.590 us |  -2.504 us |   -61.17% |    FAST    |
|   F32   |      I64      |      2^20      |   8.016 us |       3.59% |   8.033 us |       3.33% |   0.018 us |   0.22% |   SAME   |     4.098 us |     2.880 us |  -1.218 us |   -29.73% |    FAST    |
|   F32   |      I64      |      2^24      |  35.917 us |       2.54% |  38.719 us |       0.85% |   2.802 us |   7.80% |   SLOW   |    32.694 us |    34.865 us |   2.171 us |     6.64% |    SLOW    |
|   F32   |      I64      |      2^28      | 453.938 us |       0.27% | 539.019 us |       0.16% |  85.081 us |  18.74% |   SLOW   |   452.062 us |   535.638 us |  83.577 us |    18.49% |    SLOW    |
|   F64   |      I32      |      2^16      |   6.006 us |       4.47% |   5.973 us |       4.50% |  -0.033 us |  -0.55% |   SAME   |     4.094 us |     1.594 us |  -2.501 us |   -61.08% |    FAST    |
|   F64   |      I32      |      2^20      |  10.086 us |       2.80% |  10.059 us |       3.17% |  -0.027 us |  -0.26% |   SAME   |     6.141 us |     5.197 us |  -0.944 us |   -15.37% |    FAST    |
|   F64   |      I32      |      2^24      |  65.164 us |       1.05% |  71.574 us |       0.51% |   6.411 us |   9.84% |   SLOW   |    59.868 us |    68.208 us |   8.340 us |    13.93% |    SLOW    |
|   F64   |      I32      |      2^28      | 909.760 us |       0.20% |   1.073 ms |       0.05% | 163.440 us |  17.97% |   SLOW   |   905.075 us |     1.070 ms | 164.766 us |    18.20% |    SLOW    |
|   F64   |      I64      |      2^16      |   5.962 us |       4.90% |   5.957 us |       5.03% |  -0.005 us |  -0.09% |   SAME   |     4.094 us |     1.583 us |  -2.511 us |   -61.34% |    FAST    |
|   F64   |      I64      |      2^20      |  10.092 us |       2.78% |  10.089 us |       2.82% |  -0.003 us |  -0.03% |   SAME   |     6.141 us |     5.192 us |  -0.950 us |   -15.46% |    FAST    |
|   F64   |      I64      |      2^24      |  65.317 us |       0.74% |  71.592 us |       0.53% |   6.275 us |   9.61% |   SLOW   |    59.828 us |    68.206 us |   8.378 us |    14.00% |    SLOW    |
|   F64   |      I64      |      2^28      | 909.178 us |       0.19% |   1.073 ms |       0.06% | 164.056 us |  18.04% |   SLOW   |   904.768 us |     1.070 ms | 165.069 us |    18.24% |    SLOW    |
|  I128   |      I32      |      2^16      |   5.997 us |       4.65% |   5.947 us |       5.01% |  -0.050 us |  -0.83% |   SAME   |     4.094 us |     1.593 us |  -2.501 us |   -61.10% |    FAST    |
|  I128   |      I32      |      2^20      |  14.195 us |       1.87% |  14.194 us |       1.96% |  -0.001 us |  -0.00% |   SAME   |     8.188 us |     9.372 us |   1.184 us |    14.46% |    SLOW    |
|  I128   |      I32      |      2^24      | 120.742 us |       0.76% | 138.904 us |       0.47% |  18.162 us |  15.04% |   SLOW   |   116.723 us |   134.950 us |  18.227 us |    15.62% |    SLOW    |
|  I128   |      I32      |      2^28      |   1.811 ms |       0.15% |   2.142 ms |       0.02% | 330.608 us |  18.25% |   SLOW   |     1.807 ms |     2.140 ms | 333.116 us |    18.44% |    SLOW    |
|  I128   |      I64      |      2^16      |   5.979 us |       5.28% |   5.991 us |       4.69% |   0.012 us |   0.21% |   SAME   |     4.094 us |     1.594 us |  -2.500 us |   -61.07% |    FAST    |
|  I128   |      I64      |      2^20      |  14.176 us |       1.96% |  14.138 us |       2.08% |  -0.038 us |  -0.27% |   SAME   |     8.188 us |     9.378 us |   1.190 us |    14.53% |    SLOW    |
|  I128   |      I64      |      2^24      | 120.699 us |       0.75% | 138.979 us |       0.35% |  18.280 us |  15.15% |   SLOW   |   116.648 us |   134.950 us |  18.303 us |    15.69% |    SLOW    |
|  I128   |      I64      |      2^28      |   1.811 ms |       0.15% |   2.142 ms |       0.02% | 330.955 us |  18.28% |   SLOW   |     1.805 ms |     2.140 ms | 334.179 us |    18.51% |    SLOW    |

# triad

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |     B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|------------|---------|----------|--------------|--------------|------------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.814 us |       9.73% |   5.905 us |       6.76% |   0.091 us |   1.57% |   SAME   |     4.094 us |     1.602 us |  -2.493 us |   -60.88% |    FAST    |
|   I8    |      I32      |      2^20      |   6.128 us |       6.23% |   6.365 us |       9.87% |   0.237 us |   3.87% |   SAME   |     4.098 us |     3.039 us |  -1.059 us |   -25.83% |    FAST    |
|   I8    |      I32      |      2^24      |  16.207 us |       1.78% |  16.208 us |       1.79% |   0.001 us |   0.01% |   SAME   |    12.294 us |    10.362 us |  -1.932 us |   -15.72% |    FAST    |
|   I8    |      I32      |      2^28      | 149.994 us |       0.59% | 150.658 us |       0.63% |   0.664 us |   0.44% |   SAME   |   147.459 us |   145.001 us |  -2.458 us |    -1.67% |    FAST    |
|   I8    |      I64      |      2^16      |   6.006 us |       4.56% |   5.974 us |       4.71% |  -0.032 us |  -0.53% |   SAME   |     4.094 us |     1.613 us |  -2.481 us |   -60.60% |    FAST    |
|   I8    |      I64      |      2^20      |   6.083 us |       5.23% |   6.343 us |       9.49% |   0.260 us |   4.27% |   SAME   |     4.098 us |     3.083 us |  -1.015 us |   -24.76% |    FAST    |
|   I8    |      I64      |      2^24      |  16.208 us |       1.73% |  16.172 us |       1.79% |  -0.037 us |  -0.23% |   SAME   |    12.294 us |    10.623 us |  -1.671 us |   -13.59% |    FAST    |
|   I8    |      I64      |      2^28      | 153.088 us |       0.50% | 153.434 us |       0.21% |   0.346 us |   0.23% |   SLOW   |   149.586 us |   147.658 us |  -1.928 us |    -1.29% |    FAST    |
|   I16   |      I32      |      2^16      |   5.718 us |      11.90% |   5.721 us |      11.60% |   0.003 us |   0.05% |   SAME   |     4.094 us |     1.605 us |  -2.489 us |   -60.81% |    FAST    |
|   I16   |      I32      |      2^20      |   6.924 us |      13.05% |   7.008 us |      12.94% |   0.084 us |   1.21% |   SAME   |     4.098 us |     2.979 us |  -1.119 us |   -27.30% |    FAST    |
|   I16   |      I32      |      2^24      |  22.350 us |       1.38% |  22.718 us |       2.59% |   0.369 us |   1.65% |   SLOW   |    16.474 us |    18.079 us |   1.605 us |     9.74% |    SLOW    |
|   I16   |      I32      |      2^28      | 239.548 us |       0.12% | 272.308 us |       0.13% |  32.761 us |  13.68% |   SLOW   |   235.182 us |   268.432 us |  33.249 us |    14.14% |    SLOW    |
|   I16   |      I64      |      2^16      |   5.888 us |       8.48% |   5.782 us |       9.90% |  -0.106 us |  -1.80% |   SAME   |     4.094 us |     1.592 us |  -2.502 us |   -61.12% |    FAST    |
|   I16   |      I64      |      2^20      |   7.166 us |      12.98% |   7.195 us |      12.81% |   0.030 us |   0.41% |   SAME   |     4.098 us |     3.083 us |  -1.015 us |   -24.78% |    FAST    |
|   I16   |      I64      |      2^24      |  22.381 us |       1.20% |  22.704 us |       2.70% |   0.322 us |   1.44% |   SLOW   |    17.928 us |    18.183 us |   0.255 us |     1.42% |    SLOW    |
|   I16   |      I64      |      2^28      | 241.571 us |       0.12% | 272.374 us |       0.17% |  30.803 us |  12.75% |   SLOW   |   237.152 us |   268.438 us |  31.286 us |    13.19% |    SLOW    |
|   F32   |      I32      |      2^16      |   5.989 us |       4.50% |   5.950 us |       4.88% |  -0.039 us |  -0.66% |   SAME   |     4.094 us |     1.594 us |  -2.500 us |   -61.06% |    FAST    |
|   F32   |      I32      |      2^20      |   8.021 us |       3.47% |   7.989 us |       3.89% |  -0.032 us |  -0.39% |   SAME   |     4.098 us |     2.865 us |  -1.233 us |   -30.09% |    FAST    |
|   F32   |      I32      |      2^24      |  36.517 us |       1.70% |  38.782 us |       0.80% |   2.266 us |   6.20% |   SLOW   |    32.731 us |    34.859 us |   2.128 us |     6.50% |    SLOW    |
|   F32   |      I32      |      2^28      | 453.094 us |       0.24% | 539.057 us |       0.16% |  85.962 us |  18.97% |   SLOW   |   452.672 us |   535.696 us |  83.024 us |    18.34% |    SLOW    |
|   F32   |      I64      |      2^16      |   5.973 us |       4.63% |   5.960 us |       5.83% |  -0.014 us |  -0.23% |   SAME   |     4.094 us |     1.606 us |  -2.488 us |   -60.77% |    FAST    |
|   F32   |      I64      |      2^20      |   8.038 us |       3.43% |   8.009 us |       3.70% |  -0.028 us |  -0.35% |   SAME   |     4.098 us |     2.870 us |  -1.228 us |   -29.96% |    FAST    |
|   F32   |      I64      |      2^24      |  36.686 us |       0.97% |  38.769 us |       0.86% |   2.083 us |   5.68% |   SLOW   |    32.731 us |    34.859 us |   2.127 us |     6.50% |    SLOW    |
|   F32   |      I64      |      2^28      | 452.862 us |       0.19% | 539.153 us |       0.17% |  86.292 us |  19.05% |   SLOW   |   452.065 us |   535.694 us |  83.629 us |    18.50% |    SLOW    |
|   F64   |      I32      |      2^16      |   5.980 us |       4.76% |   5.963 us |       4.62% |  -0.016 us |  -0.27% |   SAME   |     4.094 us |     1.609 us |  -2.486 us |   -60.71% |    FAST    |
|   F64   |      I32      |      2^20      |  10.062 us |       2.89% |  10.054 us |       2.78% |  -0.009 us |  -0.09% |   SAME   |     6.141 us |     5.194 us |  -0.947 us |   -15.43% |    FAST    |
|   F64   |      I32      |      2^24      |  64.131 us |       1.48% |  71.512 us |       0.48% |   7.381 us |  11.51% |   SLOW   |    59.679 us |    68.255 us |   8.576 us |    14.37% |    SLOW    |
|   F64   |      I32      |      2^28      | 907.488 us |       0.24% |   1.073 ms |       0.05% | 165.704 us |  18.26% |   SLOW   |   904.912 us |     1.070 ms | 164.891 us |    18.22% |    SLOW    |
|   F64   |      I64      |      2^16      |   5.975 us |       4.58% |   5.921 us |       5.35% |  -0.054 us |  -0.90% |   SAME   |     4.094 us |     1.616 us |  -2.478 us |   -60.53% |    FAST    |
|   F64   |      I64      |      2^20      |  10.087 us |       2.88% |  10.049 us |       3.08% |  -0.039 us |  -0.39% |   SAME   |     6.141 us |     5.190 us |  -0.951 us |   -15.48% |    FAST    |
|   F64   |      I64      |      2^24      |  64.740 us |       1.54% |  71.541 us |       0.52% |   6.801 us |  10.50% |   SLOW   |    59.532 us |    68.144 us |   8.612 us |    14.47% |    SLOW    |
|   F64   |      I64      |      2^28      | 907.463 us |       0.25% |   1.073 ms |       0.04% | 165.692 us |  18.26% |   SLOW   |   904.402 us |     1.070 ms | 165.408 us |    18.29% |    SLOW    |
|  I128   |      I32      |      2^16      |   5.976 us |       4.80% |   5.969 us |       5.03% |  -0.007 us |  -0.11% |   SAME   |     4.094 us |     1.596 us |  -2.498 us |   -61.02% |    FAST    |
|  I128   |      I32      |      2^20      |  14.157 us |       1.98% |  14.101 us |       2.22% |  -0.056 us |  -0.40% |   SAME   |     8.188 us |     9.375 us |   1.187 us |    14.50% |    SLOW    |
|  I128   |      I32      |      2^24      | 119.885 us |       0.79% | 138.877 us |       0.45% |  18.992 us |  15.84% |   SLOW   |   116.405 us |   134.968 us |  18.563 us |    15.95% |    SLOW    |
|  I128   |      I32      |      2^28      |   1.812 ms |       0.11% |   2.142 ms |       0.02% | 329.563 us |  18.18% |   SLOW   |     1.806 ms |     2.140 ms | 333.777 us |    18.48% |    SLOW    |
|  I128   |      I64      |      2^16      |   5.983 us |       4.41% |   6.001 us |       5.10% |   0.017 us |   0.29% |   SAME   |     4.094 us |     1.578 us |  -2.516 us |   -61.46% |    FAST    |
|  I128   |      I64      |      2^20      |  14.136 us |       2.12% |  14.144 us |       2.09% |   0.008 us |   0.06% |   SAME   |     8.188 us |     9.380 us |   1.192 us |    14.56% |    SLOW    |
|  I128   |      I64      |      2^24      | 120.039 us |       0.76% | 138.907 us |       0.43% |  18.869 us |  15.72% |   SLOW   |   116.317 us |   134.990 us |  18.673 us |    16.05% |    SLOW    |
|  I128   |      I64      |      2^28      |   1.812 ms |       0.11% |   2.142 ms |       0.03% | 329.903 us |  18.21% |   SLOW   |     1.805 ms |     2.140 ms | 334.969 us |    18.56% |    SLOW    |

# nstream

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |     B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|------------|---------|----------|--------------|--------------|------------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.989 us |       4.55% |   5.949 us |       5.01% |  -0.039 us |  -0.66% |   SAME   |     4.094 us |     1.634 us |  -2.460 us |   -60.09% |    FAST    |
|   I8    |      I32      |      2^20      |   6.342 us |       9.26% |   6.419 us |      10.11% |   0.077 us |   1.21% |   SAME   |     4.098 us |     3.117 us |  -0.981 us |   -23.94% |    FAST    |
|   I8    |      I32      |      2^24      |  19.445 us |       4.85% |  19.424 us |       4.68% |  -0.021 us |  -0.11% |   SAME   |    14.343 us |    13.192 us |  -1.151 us |    -8.02% |    FAST    |
|   I8    |      I32      |      2^28      | 212.863 us |       0.10% | 211.458 us |       0.43% |  -1.405 us |  -0.66% |   FAST   |   209.033 us |   205.241 us |  -3.792 us |    -1.81% |    FAST    |
|   I8    |      I64      |      2^16      |   6.009 us |       4.13% |   5.967 us |       4.81% |  -0.042 us |  -0.70% |   SAME   |     4.094 us |     1.645 us |  -2.449 us |   -59.82% |    FAST    |
|   I8    |      I64      |      2^20      |   6.334 us |       8.91% |   6.337 us |       9.49% |   0.003 us |   0.05% |   SAME   |     4.098 us |     3.126 us |  -0.972 us |   -23.71% |    FAST    |
|   I8    |      I64      |      2^24      |  19.865 us |       4.15% |  19.762 us |       4.17% |  -0.102 us |  -0.51% |   SAME   |    14.343 us |    13.214 us |  -1.129 us |    -7.87% |    FAST    |
|   I8    |      I64      |      2^28      | 215.113 us |       0.26% | 214.918 us |       0.08% |  -0.195 us |  -0.09% |   FAST   |   212.031 us |   208.688 us |  -3.342 us |    -1.58% |    FAST    |
|   I16   |      I32      |      2^16      |   6.009 us |       4.15% |   5.982 us |       4.60% |  -0.027 us |  -0.45% |   SAME   |     4.094 us |     1.614 us |  -2.480 us |   -60.58% |    FAST    |
|   I16   |      I32      |      2^20      |   7.932 us |       6.13% |   7.939 us |       5.75% |   0.006 us |   0.08% |   SAME   |     4.098 us |     3.114 us |  -0.984 us |   -24.01% |    FAST    |
|   I16   |      I32      |      2^24      |  26.983 us |       2.80% |  28.431 us |       1.60% |   1.448 us |   5.37% |   SLOW   |    22.510 us |    23.762 us |   1.252 us |     5.56% |    SLOW    |
|   I16   |      I32      |      2^28      | 322.852 us |       0.30% | 362.068 us |       0.20% |  39.215 us |  12.15% |   SLOW   |   319.790 us |   357.594 us |  37.804 us |    11.82% |    SLOW    |
|   I16   |      I64      |      2^16      |   5.987 us |       4.39% |   5.932 us |       4.98% |  -0.055 us |  -0.92% |   SAME   |     4.094 us |     1.619 us |  -2.475 us |   -60.44% |    FAST    |
|   I16   |      I64      |      2^20      |   7.946 us |       6.01% |   7.860 us |       6.95% |  -0.086 us |  -1.08% |   SAME   |     4.098 us |     3.140 us |  -0.958 us |   -23.38% |    FAST    |
|   I16   |      I64      |      2^24      |  27.225 us |       3.23% |  28.478 us |       1.04% |   1.253 us |   4.60% |   SLOW   |    22.522 us |    23.750 us |   1.228 us |     5.45% |    SLOW    |
|   I16   |      I64      |      2^28      | 327.108 us |       0.26% | 361.990 us |       0.22% |  34.882 us |  10.66% |   SLOW   |   324.185 us |   357.615 us |  33.430 us |    10.31% |    SLOW    |
|   F32   |      I32      |      2^16      |   5.977 us |       4.80% |   5.978 us |       4.67% |   0.000 us |   0.01% |   SAME   |     4.094 us |     1.632 us |  -2.462 us |   -60.13% |    FAST    |
|   F32   |      I32      |      2^20      |   8.471 us |       7.81% |   8.566 us |       8.58% |   0.095 us |   1.12% |   SAME   |     4.514 us |     3.815 us |  -0.699 us |   -15.48% |    FAST    |
|   F32   |      I32      |      2^24      |  46.447 us |       1.83% |  49.517 us |       1.50% |   3.070 us |   6.61% |   SLOW   |    41.011 us |    46.008 us |   4.997 us |    12.19% |    SLOW    |
|   F32   |      I32      |      2^28      | 598.549 us |       0.15% | 717.324 us |       0.12% | 118.774 us |  19.84% |   SLOW   |   592.635 us |   713.743 us | 121.108 us |    20.44% |    SLOW    |
|   F32   |      I64      |      2^16      |   5.988 us |       4.55% |   5.959 us |       4.85% |  -0.029 us |  -0.49% |   SAME   |     4.094 us |     1.614 us |  -2.480 us |   -60.57% |    FAST    |
|   F32   |      I64      |      2^20      |   8.514 us |       8.41% |   8.492 us |       8.07% |  -0.022 us |  -0.26% |   SAME   |     4.715 us |     3.826 us |  -0.889 us |   -18.86% |    FAST    |
|   F32   |      I64      |      2^24      |  46.541 us |       1.71% |  49.648 us |       1.55% |   3.107 us |   6.68% |   SLOW   |    41.069 us |    45.971 us |   4.902 us |    11.94% |    SLOW    |
|   F32   |      I64      |      2^28      | 599.451 us |       0.14% | 717.347 us |       0.13% | 117.896 us |  19.67% |   SLOW   |   593.715 us |   713.745 us | 120.029 us |    20.22% |    SLOW    |
|   F64   |      I32      |      2^16      |   5.958 us |       4.65% |   5.965 us |       4.72% |   0.007 us |   0.11% |   SAME   |     4.094 us |     1.625 us |  -2.469 us |   -60.31% |    FAST    |
|   F64   |      I32      |      2^20      |  11.491 us |       7.71% |  11.504 us |       7.48% |   0.013 us |   0.11% |   SAME   |     6.141 us |     5.222 us |  -0.919 us |   -14.96% |    FAST    |
|   F64   |      I32      |      2^24      |  84.027 us |       0.80% |  84.034 us |       0.92% |   0.007 us |   0.01% |   SAME   |    77.789 us |    74.972 us |  -2.817 us |    -3.62% |    FAST    |
|   F64   |      I32      |      2^28      |   1.184 ms |       0.05% |   1.184 ms |       0.05% |   0.029 us |   0.00% |   SAME   |     1.176 ms |     1.173 ms |  -2.920 us |    -0.25% |    SAME    |
|   F64   |      I64      |      2^16      |   5.966 us |       4.90% |   5.963 us |       4.86% |  -0.003 us |  -0.06% |   SAME   |     4.094 us |     1.637 us |  -2.457 us |   -60.02% |    FAST    |
|   F64   |      I64      |      2^20      |  11.513 us |       7.79% |  11.494 us |       7.78% |  -0.019 us |  -0.17% |   SAME   |     6.141 us |     5.216 us |  -0.925 us |   -15.06% |    FAST    |
|   F64   |      I64      |      2^24      |  84.030 us |       0.82% |  84.060 us |       0.76% |   0.030 us |   0.04% |   SAME   |    77.804 us |    74.994 us |  -2.810 us |    -3.61% |    FAST    |
|   F64   |      I64      |      2^28      |   1.184 ms |       0.05% |   1.184 ms |       0.06% |   0.075 us |   0.01% |   SAME   |     1.176 ms |     1.173 ms |  -2.845 us |    -0.24% |    SAME    |
|  I128   |      I32      |      2^16      |   6.066 us |       5.68% |   6.017 us |       5.84% |  -0.049 us |  -0.81% |   SAME   |     4.094 us |     1.592 us |  -2.502 us |   -61.11% |    FAST    |
|  I128   |      I32      |      2^20      |  16.358 us |       2.88% |  16.406 us |       2.93% |   0.048 us |   0.29% |   SAME   |     8.188 us |     9.399 us |   1.211 us |    14.79% |    SLOW    |
|  I128   |      I32      |      2^24      | 156.487 us |       0.62% | 156.500 us |       0.61% |   0.014 us |   0.01% |   SAME   |   150.507 us |   147.953 us |  -2.554 us |    -1.70% |    FAST    |
|  I128   |      I32      |      2^28      |   2.356 ms |       0.04% |   2.356 ms |       0.04% |  -0.046 us |  -0.00% |   SAME   |     2.349 ms |     2.347 ms |  -1.491 us |    -0.06% |    SAME    |
|  I128   |      I64      |      2^16      |   6.084 us |       6.74% |   6.073 us |       6.19% |  -0.010 us |  -0.17% |   SAME   |     4.094 us |     1.607 us |  -2.487 us |   -60.74% |    FAST    |
|  I128   |      I64      |      2^20      |  16.395 us |       2.82% |  16.393 us |       2.69% |  -0.002 us |  -0.01% |   SAME   |     8.188 us |     9.410 us |   1.222 us |    14.92% |    SLOW    |
|  I128   |      I64      |      2^24      | 156.555 us |       0.64% | 156.506 us |       0.62% |  -0.049 us |  -0.03% |   SAME   |   150.525 us |   147.965 us |  -2.560 us |    -1.70% |    FAST    |
|  I128   |      I64      |      2^28      |   2.356 ms |       0.04% |   2.356 ms |       0.04% |  -0.012 us |  -0.00% |   SAME   |     2.349 ms |     2.347 ms |  -1.875 us |    -0.08% |    SAME    |

# Summary

- Total Matches: 160
  - Pass    (diff <= min_noise): 130
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  190

The table becomes a bit unwieldy. We could consider dropping the Diff and B Diff columns to improve the situation. Alternatively, we could emit two rows per benchmark.
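
A minimal sketch of the two-rows-per-benchmark alternative, assuming a hypothetical record layout (the `rows` structure and the `two_row_lines` helper are illustrative and are not taken from nvbench_compare.py). The times are copied from the first two `mul` rows above; percentages are recomputed from the rounded values, so they may differ from the table in the last digit.

```python
# Hypothetical record layout: (ref, cmp) times in microseconds per measurement kind.
rows = [
    {"name": "I8/I32/2^16", "cold": (5.729, 5.734), "batch": (4.094, 1.622)},
    {"name": "I8/I32/2^20", "cold": (6.016, 6.130), "batch": (4.101, 2.831)},
]


def two_row_lines(rows):
    """Yield one 'cold' line and one 'batch' line per benchmark."""
    for row in rows:
        for kind in ("cold", "batch"):
            ref, cmp_time = row[kind]
            pct = (cmp_time - ref) / ref * 100.0  # relative difference in percent
            yield f"{row['name']:<14} {kind:<5} {ref:9.3f} {cmp_time:9.3f} {pct:8.2f}%"


lines = list(two_row_lines(rows))
print("\n".join(lines))
```

This keeps each line narrow, at the cost of alternating cold and batch values within every column.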

    if has_batch_data:
        if (
            abs(frac_diff_batch) <= 0.01
        ):  # TODO(bgruber): what value to use here?

Collaborator

I have no idea, let's get some input internally on that.

Collaborator

Maybe just pick a sensible default and let the user override with command-line opts?
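
A minimal sketch of that suggestion using `argparse`. The `--batch-threshold` flag and the `batch_status` helper are hypothetical names chosen for illustration; the default of 0.01 simply mirrors the hard-coded value under review.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a threshold override")
    # Hypothetical flag; not an existing nvbench_compare.py option.
    parser.add_argument(
        "--batch-threshold",
        type=float,
        default=0.01,
        help="max |fractional diff| for batch times to count as SAME",
    )
    return parser


def batch_status(frac_diff_batch, threshold):
    """Classify a fractional batch-time difference against a fixed threshold."""
    if abs(frac_diff_batch) <= threshold:
        return "SAME"
    return "FAST" if frac_diff_batch < 0 else "SLOW"


# With the default threshold, a -0.92% difference counts as SAME.
args = build_parser().parse_args([])
default_status = batch_status(-0.0092, args.batch_threshold)

# A stricter user-supplied threshold turns the same difference into FAST.
args = build_parser().parse_args(["--batch-threshold", "0.005"])
strict_status = batch_status(-0.0092, args.batch_threshold)
```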

Collaborator

We should use the estimated min_noise if it is available, i.e. test abs(frac_diff_batch) <= min_noise, like we do for cold measurements, and use a yellow tint if min_noise is not available.

Perhaps when has_batch_data is True, min_noise should always be available.
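
A rough sketch of that rule, under stated assumptions: `classify_batch` is a hypothetical helper, `min_noise` is expressed as a fraction (0.0025 for 0.25%), and reporting the direction with a yellow tint when no noise estimate exists is one possible reading of the fallback, not confirmed behavior.

```python
def classify_batch(frac_diff_batch, min_noise=None):
    """Return (status, tint) for a batch-time comparison.

    frac_diff_batch: (cmp - ref) / ref as a fraction.
    min_noise: noise estimate as a fraction, or None when unavailable.
    """
    # Assumption: without a noise estimate the verdict is tinted yellow
    # to flag that it could not be checked against noise.
    tint = "yellow" if min_noise is None else None
    threshold = 0.0 if min_noise is None else min_noise
    if abs(frac_diff_batch) <= threshold:
        return "SAME", tint
    return ("FAST" if frac_diff_batch < 0 else "SLOW"), tint


within_noise = classify_batch(0.0002, min_noise=0.0025)
faster = classify_batch(-0.0227, min_noise=0.0025)
no_noise = classify_batch(-0.0227)
```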

@alliepiper
Collaborator

I like the idea of splitting them onto a new line; I think it'd be cleaner.

Or making them into separate tables? That way you could still quickly scan a column to check for outliers. That'd be harder if the timings were alternating cold/batch.
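
A small sketch of the separate-tables idea, assuming a hypothetical record layout (`rows`, `split_tables`, and `worst_regression` are illustrative names). The times are copied from two `add` rows above. With one table per measurement kind, scanning a single column for the largest regression stays a one-liner.

```python
# Hypothetical record layout: (ref, cmp) times in microseconds per measurement kind.
rows = [
    {"name": "I16/I32/2^24", "cold": (22.386, 22.510), "batch": (16.525, 18.175)},
    {"name": "I16/I32/2^28", "cold": (237.871, 272.302), "batch": (233.873, 268.593)},
]


def split_tables(rows):
    """Build one table per measurement kind: (name, ref, cmp, fractional diff)."""
    tables = {"cold": [], "batch": []}
    for row in rows:
        for kind, table in tables.items():
            ref, cmp_time = row[kind]
            table.append((row["name"], ref, cmp_time, (cmp_time - ref) / ref))
    return tables


def worst_regression(table):
    """Return the entry with the largest positive fractional diff."""
    return max(table, key=lambda entry: entry[3])


tables = split_tables(rows)
worst_batch = worst_regression(tables["batch"])
worst_cold = worst_regression(tables["cold"])
```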

Successfully merging this pull request may close these issues.

Show batch time comparisons in nvbench_compare.py
