@Maetveis
Contributor

Due to the way the Catch2 framework interacts with for_all_combinations in the SYCL-CTS, we were inadvertently re-running setup code (allocation, fill, deallocation)
for each <T, src_ptr_type, dest_ptr_type> combination in the oneapi_memcpy2d tests.

Add a warning to the documentation of for_all_combinations to highlight this performance pitfall, and wrap the test code in an outer SECTION to avoid re-executing expensive setup operations on every test case re-execution.

On my local machine with an Intel GPU, the runtime of the test changes from 14s to 2.7s.
When running on an internal simulator it takes the test from more than 20 minutes (I killed it after that) to 9 minutes.

Disclaimer: I used a generative AI model to adjust the phrasing of the comments, but I reviewed every line of output.

@Maetveis Maetveis requested a review from a team as a code owner December 11, 2025 12:33
@Maetveis
Contributor Author

@bader @tomdeakin @keryell Can I get a review on this PR please :) ?

Contributor

@bader bader left a comment

I am affiliated with Intel, as you are, so my vote doesn't count.
That's the reason I don't approve Intel contributions to SYCL-CTS, except tests for Intel extensions and non-CTS related areas like CI.
At the same time, I review all Intel PRs and comment only when I have something to say.
This PR looks good to me. 👍

@TApplencourt
Contributor

LGTM thanks!

@Maetveis
Contributor Author

I am affiliated with Intel, as you are, so my vote doesn't count. That's the reason I don't approve Intel contributions to SYCL-CTS, except tests for Intel extensions and non-CTS related areas like CI. At the same time, I review all Intel PRs and comment only when I have something to say. This PR looks good to me. 👍

Thanks! Honestly, I was not aware that's how the process works here. If you can link to a summary or send me a message about it, I'd appreciate it a lot :)

@bader
Contributor

bader commented Dec 17, 2025

Some rules are covered by the documentation. See https://github.com/KhronosGroup/SYCL-CTS/tree/main/docs#pull-requests.

Member

@keryell keryell left a comment

Thanks!

@tomdeakin
Contributor

WG approved to merge.

@tomdeakin tomdeakin merged commit a680b5e into KhronosGroup:main Dec 18, 2025
9 checks passed
@Maetveis Maetveis deleted the speed_up_test_oneapi_memcpy2d branch December 19, 2025 08:21