Description
Problem Description
When profiling, we observe that the activity_record_t/roctracer_record_t objects for communication kernels all have a correlation_id of 0. For example, the CPU event hipExtLaunchKernel has correlation_id 29170, but its corresponding GPU kernel, ncclDevKernel_Generic(ncclDevComm*, channelMasks, ncclWork*), has a correlation_id of 0. For non-CCL events, the correlation_ids of the CPU and GPU events do match, even though we obtain them in the same way as for CCL events.
We obtain the correlation_ids for all async roctracer activities in kineto within this callback: https://github.com/pytorch/kineto/blob/main/libkineto/src/RoctracerLogger.cpp#L295
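To make the symptom concrete, here is a minimal, self-contained sketch of the matching we expect to work. The ApiEvent, KernelActivity, and MatchReport types and the match_by_correlation function are hypothetical, simplified stand-ins written for this report; they are not the real roctracer_record_t or kineto types. The sketch only shows how GPU records end up unmatched when their correlation_id is 0.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-ins for the profiler's records.
// These are NOT the real roctracer_record_t / kineto structures.
struct ApiEvent {          // CPU-side runtime call, e.g. hipExtLaunchKernel
  std::string name;
  uint64_t correlation_id;
};

struct KernelActivity {    // GPU-side async activity record
  std::string kernel_name;
  uint64_t correlation_id;
};

struct MatchReport {
  std::vector<KernelActivity> matched;   // id found among the CPU events
  std::vector<KernelActivity> zero_id;   // correlation_id == 0 (the symptom)
  std::vector<KernelActivity> orphaned;  // nonzero id with no CPU event
};

// Classify each GPU record by looking its correlation_id up among the
// correlation_ids seen on the CPU side.
MatchReport match_by_correlation(const std::vector<ApiEvent>& api_events,
                                 const std::vector<KernelActivity>& kernels) {
  std::unordered_set<uint64_t> cpu_ids;
  for (const auto& e : api_events) {
    cpu_ids.insert(e.correlation_id);
  }

  MatchReport report;
  for (const auto& k : kernels) {
    if (k.correlation_id == 0) {
      report.zero_id.push_back(k);
    } else if (cpu_ids.count(k.correlation_id) != 0) {
      report.matched.push_back(k);
    } else {
      report.orphaned.push_back(k);
    }
  }
  return report;
}
```

With the values from this report (CPU event hipExtLaunchKernel at 29170, GPU kernel ncclDevKernel_Generic at 0), the communication kernel lands in zero_id instead of matched, which is the behavior we see for every CCL kernel.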
Thanks in advance!
Operating System
CentOS Stream 9
CPU
AMD EPYC 7713
GPU
AMD Instinct MI300X
ROCm Version
6.1.0.60100-82
ROCm Component
roctracer
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response