
common: verbose: asynchronous verbose mode for execution time tracking #3055

Open
Wants to merge 3 commits into base: main

@avmanerikar (Contributor) commented Apr 9, 2025

Description

This PR proposes a proof of concept (PoC) for an asynchronous verbose mode that tracks kernel execution times accurately and without blocking, keeping synchronization latency to a minimum. In the existing verbose mode, retrieving kernel timings adds significant overhead for two reasons: the GPU kernel execution must be synchronized, and the timing is tracked on the host.
The asynchronous mode removes this synchronization overhead by using event callbacks to query execution timings.
The prototype targets the OpenCL GPU runtime, which exposes kernel execution statistics for profiling.
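The difference between the blocking and callback-based approaches can be illustrated with a small, self-contained C++ simulation. It does not use OpenCL or oneDNN: `SimEvent`, `SimQueue`, and `run_async_verbose_sketch` are hypothetical stand-ins for a device event, an in-order queue, and the verbose hook. The host submits every kernel and registers a completion callback without waiting on any individual event; each callback computes the duration from device-side start and end timestamps. In real OpenCL the rough analogues are `clSetEventCallback` with `CL_COMPLETE` and `clGetEventProfilingInfo` with `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` (which requires a queue created with profiling enabled), and callbacks run on a runtime thread rather than inline as they do in this sketch.

```cpp
// Hypothetical simulation of callback-based kernel timing (not oneDNN or OpenCL code).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a device event carrying profiling timestamps in nanoseconds.
class SimEvent {
public:
    using Callback = std::function<void(uint64_t start_ns, uint64_t end_ns)>;

    // Register a completion callback; fires immediately if already complete.
    void set_callback(Callback cb) {
        if (complete_) {
            cb(start_ns_, end_ns_);
        } else {
            callbacks_.push_back(std::move(cb));
        }
    }

    // Called by the simulated device when the kernel finishes.
    void complete(uint64_t start_ns, uint64_t end_ns) {
        assert(end_ns >= start_ns);
        start_ns_ = start_ns;
        end_ns_ = end_ns;
        complete_ = true;
        for (auto &cb : callbacks_) cb(start_ns_, end_ns_);
        callbacks_.clear();
    }

private:
    bool complete_ = false;
    uint64_t start_ns_ = 0, end_ns_ = 0;
    std::vector<Callback> callbacks_;
};

// Stand-in for an in-order queue: submit() returns immediately, and finish()
// "executes" the kernels in order, stamping device-side start/end times.
class SimQueue {
public:
    SimEvent &submit(uint64_t duration_ns) {
        pending_.emplace_back(std::make_unique<SimEvent>(), duration_ns);
        return *pending_.back().first;
    }

    void finish() {
        for (auto &p : pending_) {
            p.first->complete(clock_ns_, clock_ns_ + p.second);
            clock_ns_ += p.second;
        }
        pending_.clear();  // events are released here
    }

private:
    uint64_t clock_ns_ = 0;
    std::vector<std::pair<std::unique_ptr<SimEvent>, uint64_t>> pending_;
};

// Async-verbose sketch: one callback per kernel, no per-kernel wait, and a
// single synchronization point at the end. Returns the verbose lines produced.
std::vector<std::string> run_async_verbose_sketch(const std::vector<uint64_t> &durations_ns) {
    SimQueue queue;
    std::vector<std::string> log;
    for (std::size_t i = 0; i < durations_ns.size(); ++i) {
        SimEvent &ev = queue.submit(durations_ns[i]);
        std::string name = "kernel_" + std::to_string(i);
        ev.set_callback([&log, name](uint64_t s, uint64_t e) {
            // Duration comes from device timestamps, not host-side clocks.
            log.push_back(name + ",exec_ns=" + std::to_string(e - s));
        });
        // A synchronous verbose mode would block here with a per-kernel wait.
    }
    queue.finish();  // callbacks fire as each simulated kernel completes
    return log;
}
```

Calling `run_async_verbose_sketch({1000, 2500, 400})` yields one line per kernel, starting with `kernel_0,exec_ns=1000`, after the single `finish()` call. Because the queue owns the events and releases them in `finish()`, callers of this sketch must not hold on to the references returned by `submit()` afterwards.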

The mode is enabled at run time by setting DNNL_ASYNC_VERBOSE=1:

DNNL_VERBOSE=profile_exec DNNL_ASYNC_VERBOSE=1 ./examples/primitives-matmul-cpp gpu
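A run-time switch like this is usually read once from the environment and cached. The sketch below is hypothetical: the helper names `parse_async_verbose` and `async_verbose_enabled`, and the rule that only the exact value `1` enables the mode, are assumptions for illustration and are not taken from the PR.

```cpp
// Hypothetical helpers (not the oneDNN implementation) for gating the mode.
#include <cstdlib>
#include <cstring>

// Parsing rule kept separate from getenv so it can be tested directly.
// Assumption: only the exact string "1" turns the mode on.
bool parse_async_verbose(const char *value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads DNNL_ASYNC_VERBOSE on first use and caches the result for the process.
bool async_verbose_enabled() {
    static const bool enabled = parse_async_verbose(std::getenv("DNNL_ASYNC_VERBOSE"));
    return enabled;
}
```

Caching the result in a function-local `static` means the environment is queried only once; under this design, changing `DNNL_ASYNC_VERBOSE` later in the process has no effect.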

Related RFC: [link]

Addresses MFDNN-13603.

Checklist

  • Have you published an RFC for the new feature?
  • Was the RFC approved?
  • Have you added relevant tests?

@avmanerikar requested review from a team as code owners April 9, 2025 17:50
@github-actions bot added the documentation, platform:gpu-generic, component:api, and component:build labels Apr 9, 2025
@avmanerikar marked this pull request as draft April 9, 2025 17:52
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from 25b0638 to bf1e8d1 April 9, 2025 18:00
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from bf1e8d1 to 625eec4 April 28, 2025 17:45
@github-actions bot added the platform:gpu-intel and component:common labels Apr 28, 2025
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from 625eec4 to e69f76d May 12, 2025 17:26
@github-actions bot removed the platform:gpu-generic label May 12, 2025
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch 2 times, most recently from c834a20 to dc4f76d May 27, 2025 17:09
@avmanerikar changed the title from "[WIP] common: verbose: asynchronous verbose mode for execution time tracking" to "common: verbose: asynchronous verbose mode for execution time tracking" May 27, 2025
@avmanerikar marked this pull request as ready for review May 27, 2025 19:18
@avmanerikar requested a review from a team as a code owner May 27, 2025 19:18

        return status::success;
    } else {
        cl_int err = clWaitForEvents(1, &async_tracked_event_);
Contributor

Isn't this call synchronous? We enqueue a kernel, record an event, and execution blocks here until the kernel finishes. Am I missing something?

Contributor (Author)

Yes. This was a fallback for failure cases, in which the verbose information is instead printed after the default stream.wait() calls. The implementation has been updated to avoid this repetition.
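The fallback described in this reply can be sketched as control flow: the verbose hook first tries to register an asynchronous completion callback and, only if that fails, falls back to a blocking wait before printing, so each kernel is reported exactly once. `VerboseHooks`, `TimingPath`, and `record_exec_time` are hypothetical names; in OpenCL terms the two paths would roughly correspond to `clSetEventCallback` returning something other than `CL_SUCCESS` and a subsequent `clWaitForEvents`, but the sketch does not call either.

```cpp
// Hypothetical control-flow sketch of the async path with a synchronous fallback.
#include <functional>

enum class TimingPath { async_callback, sync_fallback };

// Hooks a runtime integration would supply (all hypothetical).
struct VerboseHooks {
    // Returns true if the completion callback was registered successfully.
    std::function<bool(std::function<void(double)>)> try_register_callback;
    // Blocks until the kernel finishes; returns its execution time (assumed ms).
    std::function<double()> blocking_wait;
    // Prints one verbose line for the given execution time.
    std::function<void(double)> report;
};

TimingPath record_exec_time(const VerboseHooks &h) {
    if (h.try_register_callback(h.report)) {
        // report() runs later, when the kernel completes; no wait on this path.
        return TimingPath::async_callback;
    }
    // Fallback: synchronous, equivalent in effect to the default stream wait.
    h.report(h.blocking_wait());
    return TimingPath::sync_fallback;
}
```

Keeping the blocking wait only on the failure path is what makes the normal path non-blocking; the reviewer's concern applies only when callback registration fails.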

@avmanerikar marked this pull request as draft May 29, 2025 20:17
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from dc4f76d to 8051a73 June 4, 2025 18:42
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from 8051a73 to ad7be6d June 16, 2025 17:47
@avmanerikar marked this pull request as ready for review June 16, 2025 17:52
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from ad7be6d to c9a7d45 June 25, 2025 18:32
@github-actions bot removed the documentation label Jun 25, 2025
@github-actions bot removed the component:api and component:build labels Jun 25, 2025
@avmanerikar force-pushed the amanerik/main/async-verbose-mode branch from c9a7d45 to 4513647 July 2, 2025 00:10
Labels
component:common, platform:gpu-intel
3 participants