Looking at the code in RocprofLogger.cpp, we are hardcoding stream = 0 for all events of type ROCPROFILER_BUFFER_TRACING_MEMORY_COPY.
https://github.com/pytorch/kineto/blob/main/libkineto/src/RocprofLogger.cpp#L727
We do this because rocprofiler_buffer_tracing_memory_copy_record_t does not provide a queue_id field. As a result, every AMD memcpy event is placed on stream 0, no matter which stream the copy was actually issued on.
We can hack around the issue in post-processing by using the correlation id to track down the CPU-side launch event and fetch the hipStream_t, then using kernel launch events to link hipStream_t with queue_id, but this is a roundabout way to get this info. Ideally rocprofiler_buffer_tracing_memory_copy_record_t would provide it directly.
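For reference, the post-processing workaround described above could look roughly like the sketch below. It is only an illustration: the event layout (dicts with `kind`, `correlation`, `hip_stream`, `queue_id`, `stream` keys) and the function name `reassign_memcpy_streams` are hypothetical and do not match Kineto's actual trace schema.

```python
from typing import Dict, List


def reassign_memcpy_streams(events: List[dict]) -> List[dict]:
    """Return a copy of `events` with memcpy events moved off stream 0.

    Assumed (hypothetical) event layout:
      {"kind": "runtime", "correlation": int, "hip_stream": int}  # CPU-side launch
      {"kind": "kernel",  "correlation": int, "queue_id": int}    # GPU kernel
      {"kind": "memcpy",  "correlation": int, "stream": 0}        # GPU memcpy
    """
    # Step 1: correlation id -> hipStream_t, taken from CPU-side launch events.
    corr_to_hip_stream: Dict[int, int] = {}
    for e in events:
        if e["kind"] == "runtime" and e.get("hip_stream") is not None:
            corr_to_hip_stream[e["correlation"]] = e["hip_stream"]

    # Step 2: hipStream_t -> queue_id, learned from kernel events whose
    # launch we can find via the correlation id.
    hip_stream_to_queue: Dict[int, int] = {}
    for e in events:
        if e["kind"] == "kernel":
            hip_stream = corr_to_hip_stream.get(e["correlation"])
            if hip_stream is not None:
                hip_stream_to_queue[hip_stream] = e["queue_id"]

    # Step 3: patch each memcpy event if both lookups succeed;
    # otherwise leave it on the hardcoded stream.
    fixed = []
    for e in events:
        if e["kind"] == "memcpy":
            hip_stream = corr_to_hip_stream.get(e["correlation"])
            queue_id = hip_stream_to_queue.get(hip_stream)
            if queue_id is not None:
                e = {**e, "stream": queue_id}
        fixed.append(e)
    return fixed
```

Note that this only works for streams that also launched at least one kernel during the trace; a memcpy on a stream with no kernel launches stays on stream 0, which is one more reason to prefer getting queue_id from the record itself.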