
[Issue]: Roctracer returns correlation_id of 0 for all communication kernels #100

Closed
@sraikund16

Description

Problem Description

When profiling, we observe that the activity_record_t/roctracer_record_t objects for communication kernels all have a correlation_id of 0. For example, the CPU event hipExtLaunchKernel has correlation_id 29170, but its corresponding GPU kernel, ncclDevKernel_Generic(ncclDevComm*, channelMasks, ncclWork*), has a correlation_id of 0. For non-CCL kernels, by contrast, the CPU and GPU correlation_id values match, even though we obtain them the same way as for the CCL kernels.

We obtain the correlation_id of every asynchronous roctracer activity in Kineto within this callback: https://github.com/pytorch/kineto/blob/main/libkineto/src/RoctracerLogger.cpp#L295
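To show why a correlation_id of 0 is a problem, here is a minimal C++ sketch of how a trace consumer might pair GPU kernel records with their CPU launch records by correlation_id. The CpuApiRecord and GpuKernelRecord structs and the findUnlinkedKernels function are hypothetical simplifications for illustration only; they are not roctracer or Kineto types, and the real records arrive through the roctracer callback linked above.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a CPU-side API record (e.g. hipExtLaunchKernel).
// NOT the real roctracer type; only the fields needed here are modeled.
struct CpuApiRecord {
  std::uint64_t correlation_id;
  std::string name;
};

// Hypothetical stand-in for a GPU kernel activity record.
// A correlation_id of 0 is the symptom reported in this issue.
struct GpuKernelRecord {
  std::uint64_t correlation_id;
  std::string name;
};

// Returns the GPU kernels that cannot be paired with any CPU launch record
// by correlation_id. Kernels reporting 0 are always treated as unpaired,
// since 0 does not identify a real launch.
std::vector<GpuKernelRecord> findUnlinkedKernels(
    const std::vector<CpuApiRecord>& cpu,
    const std::vector<GpuKernelRecord>& gpu) {
  // Collect every correlation_id seen on the CPU side.
  std::unordered_set<std::uint64_t> cpuIds;
  for (const auto& rec : cpu) {
    cpuIds.insert(rec.correlation_id);
  }

  std::vector<GpuKernelRecord> unlinked;
  for (const auto& kernel : gpu) {
    if (kernel.correlation_id == 0 || cpuIds.count(kernel.correlation_id) == 0) {
      unlinked.push_back(kernel);
    }
  }
  return unlinked;
}
```

For example, given a CPU record {29170, "hipExtLaunchKernel"} and a GPU record {0, "ncclDevKernel_Generic(...)"}, the GPU kernel is returned as unlinked, so it would appear in a trace without a CPU parent. A non-CCL kernel whose correlation_id matches its launch would not be returned.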

Thanks in advance!

Operating System

CentOS Stream 9

CPU

AMD EPYC 7713

GPU

AMD Instinct MI300X

ROCm Version

6.1.0.60100-82

ROCm Component

roctracer

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
