Description
I profiled `torch.nn.Conv3d` using both PyTorch's built-in profiler and Nsight Compute. When viewing the results in TensorBoard, the PyTorch profiler reports zero Tensor Core utilization. However, Nsight Compute indicates that Tensor Cores are actually being used.
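For context, the profiling setup looked roughly like the sketch below (the shapes, iteration count, and channels-last layout are illustrative, not the exact configuration used):

```python
import torch
from torch.profiler import profile, ProfilerActivity, tensorboard_trace_handler

# Illustrative Conv3d in bf16 with a channels-last layout so cuDNN can pick a
# Tensor Core implicit-GEMM kernel.
conv = torch.nn.Conv3d(64, 64, kernel_size=3, padding=1).cuda().to(torch.bfloat16)
conv = conv.to(memory_format=torch.channels_last_3d)
x = torch.randn(8, 64, 32, 32, 32, device="cuda", dtype=torch.bfloat16)
x = x.to(memory_format=torch.channels_last_3d)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    on_trace_ready=tensorboard_trace_handler("./log/conv3d"),
) as prof:
    for _ in range(10):
        conv(x)
    torch.cuda.synchronize()
```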
Upon investigating the codebase, I found that the Tensor Core allowlist (`TC_Allowlist`) in [tb_plugin/torch_tb_profiler/profiler/tensor_core.py](https://github.com/pytorch/kineto/blob/main/tb_plugin/torch_tb_profiler/profiler/tensor_core.py) appears to be outdated.
The kernel used in `Conv3d` is:
`sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn`
However, `xmma_fprop_implicit_gemm` is not included in the allowlist, which might explain the discrepancy.
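If I read tensor_core.py correctly, kernels are classified by substring matching against the allowlist, so no existing pattern covers this name. A quick sketch of that check (the allowlist subset below is illustrative, not the real list):

```python
# Illustrative subset of allowlist patterns; see tensor_core.py for the real list.
allowlist = ["h884", "s884", "h1688", "hmma", "xmma_gemm", "xmma_sparse_conv"]

kernel = ("sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_"
          "tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn")

# Substring match as the plugin appears to do it: no pattern matches this name,
# so the kernel is counted as not using Tensor Cores.
print(any(pattern in kernel for pattern in allowlist))  # False
```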
Expected Behavior
The PyTorch profiler's TensorBoard plugin should correctly report Tensor Core utilization when kernels that use Tensor Cores are executed.
Suggested Fix
The allowlist should be updated to include `xmma_fprop_implicit_gemm` and other relevant kernel name patterns.
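A minimal sketch of what the change could look like, assuming the allowlist is a flat list of substring patterns (the exact structure in tensor_core.py may differ):

```python
# tb_plugin/torch_tb_profiler/profiler/tensor_core.py (sketch, not a verbatim diff)
allowlist = [
    # ... existing patterns ...
    "xmma_fprop_implicit_gemm",  # sm90 cuDNN Conv3d forward kernels such as the one above
]
```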
Environment
- PyTorch Version: 2.6.0+cu124
- CUDA Version: 12.4
- GPU: NVIDIA H200
- Profiling Tools: PyTorch Profiler, Nsight Compute (2024.1.1.0 (build 33998838))
- torch-tb-profiler: 0.4.3