Torch Profiler Shows Zero Tensor Core Utilization for torch.nn.Conv3d, While Nsight Compute Confirms Usage #1041

Open
@BurkeHulk

Description

I profiled torch.nn.Conv3d using both PyTorch's built-in profiler and Nsight Compute. When viewing the results in TensorBoard, the PyTorch profiler reports zero Tensor Core utilization. However, Nsight Compute indicates that Tensor Cores are actually being used.

Upon investigating the codebase, I found that the Tensor Core allowlist (TC_Allowlist) in [tb_plugin/torch_tb_profiler/profiler/tensor_core.py](https://github.com/pytorch/kineto/blob/main/tb_plugin/torch_tb_profiler/profiler/tensor_core.py) appears to be outdated.

The kernel used in Conv3d is:

sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn

However, xmma_fprop_implicit_gemm is not included in the allowlist, which might explain the discrepancy.
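The mismatch can be sketched in a few lines, assuming the plugin classifies a kernel as a Tensor Core kernel when any allowlist entry occurs as a substring of the kernel name. That matching rule, the helper name `uses_tensor_cores`, and the entries in `ASSUMED_ALLOWLIST` are illustrative assumptions and are not copied from `tensor_core.py`.

```python
# Illustrative subset only; the real TC_Allowlist in tensor_core.py may differ.
ASSUMED_ALLOWLIST = ["xmma_implicit_gemm", "xmma_gemm", "hmma"]

# Kernel name reported for Conv3d in this issue.
KERNEL = (
    "sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_"
    "tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn"
)


def uses_tensor_cores(kernel_name, allowlist):
    """Return True if any allowlist fragment is a substring of the kernel name."""
    return any(fragment in kernel_name for fragment in allowlist)


# "xmma_implicit_gemm" is not a substring of "..._xmma_fprop_implicit_gemm_...",
# because "fprop_" sits between "xmma_" and "implicit_gemm".
before = uses_tensor_cores(KERNEL, ASSUMED_ALLOWLIST)

# Adding the fragment proposed in this issue makes the kernel match.
after = uses_tensor_cores(KERNEL, ASSUMED_ALLOWLIST + ["xmma_fprop_implicit_gemm"])

print(before, after)
```

Under these assumptions, the kernel is missed before the proposed entry is added and recognized afterwards.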

Expected Behavior

The PyTorch profiler's TensorBoard plugin should report nonzero Tensor Core utilization whenever kernels that use Tensor Cores are executed.

Suggested Fix

The allowlist should be updated to include xmma_fprop_implicit_gemm and other relevant kernels.
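To help identify which other kernels might be missing, one option is to list every kernel name in an exported trace that no allowlist entry matches, then cross-check those names against Nsight Compute. The sketch below assumes Chrome-trace-style events with `cat` and `name` fields and a kernel category spelled `kernel` in any letter case. The function `unmatched_kernels` and the sample data are hypothetical.

```python
def unmatched_kernels(trace_events, allowlist):
    """Return sorted unique kernel names that match no allowlist fragment.

    Assumes each event is a dict with "cat" and "name" keys, as in a
    Chrome-trace-style JSON export; other event categories are ignored.
    """
    names = set()
    for event in trace_events:
        if str(event.get("cat", "")).lower() != "kernel":
            continue
        name = event.get("name", "")
        if not any(fragment in name for fragment in allowlist):
            names.add(name)
    return sorted(names)


# Hypothetical sample events and allowlist, for illustration only.
sample_events = [
    {"cat": "kernel", "name": "sm90_xmma_fprop_implicit_gemm_bf16bf16_example"},
    {"cat": "kernel", "name": "sm80_xmma_gemm_f16f16_example"},
    {"cat": "cpu_op", "name": "aten::conv3d"},
]
sample_allowlist = ["xmma_gemm", "xmma_implicit_gemm"]

candidates = unmatched_kernels(sample_events, sample_allowlist)
print(candidates)
```

A name appearing in this list does not prove the kernel uses Tensor Cores; it only marks the kernel as one the allowlist cannot recognize.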

Environment

  • PyTorch Version: 2.6.0+cu124
  • CUDA Version: 12.4
  • GPU: NVIDIA H200
  • Profiling Tools: PyTorch Profiler, Nsight Compute (2024.1.1.0 (build 33998838))
  • torch-tb-profiler: 0.4.3
