-
Notifications
You must be signed in to change notification settings - Fork 91
Speed up oneapi_memcpy2d tests by avoiding redundant setup #1159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up oneapi_memcpy2d tests by avoiding redundant setup #1159
Conversation
Due to the way the Catch2 framework interacts with `for_all_combinations` in the SYCL-CTS, we were inadvertently re-running setup code (allocation, fill, deallocation) for each <T, src_ptr_type, dest_ptr_type> combination in the oneapi_memcpy2d tests. Add a warning to the documentation of `for_all_combinations` to highlight this perfomance pitfall, and wrap the test code in an outer SECTION to avoid re-executing expensive setup operations on every test case re-execution. On my local machine with an Intel GPU, the runtime of the test changes from 14s to 2.7s. When running on an internal simulator it takes the test from more than 20 minutes (I killed it after that) to 9 minutes. Disclaimer: I used a generative AI model to adjust the phrasing of the comments, but I reviewed every line of output.
|
@bader @tomdeakin @keryell Can I get a review on this PR please :) ? |
bader
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am affiliated with Intel as you, so my vote doesn't count.
That's the reason I don't approve Intel contributions to SYCL-CTS, except tests for Intel extensions and non-CTS related areas like CI.
At the same time, I review all Intel PRs and comment only when I have something to say.
This PR looks good to me. 👍
|
LGTM thanks! |
Thanks! Honestly I was not aware that's how the proces works here. If you can link to a summary or send me a message about it, I'd appreciate it a lot :) |
|
Some rules are covered by the documentation. See https://github.com/KhronosGroup/SYCL-CTS/tree/main/docs#pull-requests. |
keryell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
WG approved to merge. |
Due to the way the Catch2 framework interacts with
for_all_combinationsin the SYCL-CTS, we were inadvertently re-running setup code (allocation, fill, deallocation)for each <T, src_ptr_type, dest_ptr_type> combination in the oneapi_memcpy2d tests.
Add a warning to the documentation of
for_all_combinationsto highlight this perfomance pitfall, and wrap the test code in an outer SECTION to avoid re-executing expensive setup operations on every test case re-execution.On my local machine with an Intel GPU, the runtime of the test changes from 14s to 2.7s.
When running on an internal simulator it takes the test from more than 20 minutes (I killed it after that) to 9 minutes.
Disclaimer: I used a generative AI model to adjust the phrasing of the comments, but I reviewed every line of output.