[Rocprofiler-sdk] All AMD memcpy events are being placed on stream 0 #1351

@ryanzhang22

Looking at the code in RocprofLogger.cpp, we are hardcoding stream = 0 for all events of type ROCPROFILER_BUFFER_TRACING_MEMORY_COPY.

https://github.com/pytorch/kineto/blob/main/libkineto/src/RocprofLogger.cpp#L727

We do this because rocprofiler_buffer_tracing_memory_copy_record_t does not provide a queue_id field. As a result, every AMD memcpy event is placed on stream 0, no matter which stream it was actually issued on.

We can work around the issue in post-processing: use the correlation id to find the CPU-side launch event and fetch its hipStream_t, then use kernel launch events to link that hipStream_t to a queue_id. This is a roundabout way to get the info, though. Ideally rocprofiler_buffer_tracing_memory_copy_record_t would provide it directly.
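
For reference, here is a rough sketch of what that post-processing workaround could look like. It is not Kineto or rocprofiler-sdk code. The event layout (plain dicts with `kind`, `correlation_id`, `hip_stream`, `queue_id`, and `stream` keys) and the function name `reassign_memcpy_streams` are made up for illustration, and it assumes each hipStream_t maps to a single queue_id for the duration of the trace.

```python
from typing import Any, Dict, List

# Hypothetical event shape, NOT the real Kineto/rocprofiler record layout:
#   runtime events: {"kind": "runtime", "correlation_id": int, "hip_stream": int}
#   kernel events:  {"kind": "kernel",  "correlation_id": int, "queue_id": int}
#   memcpy events:  {"kind": "memcpy",  "correlation_id": int, "stream": 0}
Event = Dict[str, Any]


def reassign_memcpy_streams(events: List[Event]) -> List[Event]:
    """Return copies of events with memcpy streams recovered where possible."""
    # Step 1: correlation id -> hipStream_t, taken from CPU-side launch events.
    stream_by_corr = {
        e["correlation_id"]: e["hip_stream"]
        for e in events
        if e["kind"] == "runtime" and "hip_stream" in e
    }

    # Step 2: hipStream_t -> queue_id, learned from kernel launches, which
    # carry a queue_id and share a correlation id with their launch event.
    # Assumption: one queue_id per hipStream_t (later kernels overwrite).
    queue_by_stream: Dict[int, int] = {}
    for e in events:
        if e["kind"] != "kernel":
            continue
        hip_stream = stream_by_corr.get(e["correlation_id"])
        if hip_stream is not None:
            queue_by_stream[hip_stream] = e["queue_id"]

    # Step 3: memcpy -> correlation id -> hipStream_t -> queue_id.
    fixed = []
    for e in events:
        e = dict(e)  # copy so the input list is left untouched
        if e["kind"] == "memcpy":
            hip_stream = stream_by_corr.get(e["correlation_id"])
            queue_id = queue_by_stream.get(hip_stream)
            if queue_id is not None:
                e["stream"] = queue_id
            # Otherwise the event keeps the hardcoded stream 0.
        fixed.append(e)
    return fixed
```

Note that in this sketch a memcpy issued on a stream that never launched a kernel cannot be resolved and stays on stream 0, which is part of why a queue_id on the record itself would be preferable.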
