Skip to content

Enable ocltst execution on Rock#3590

Draft
agunashe wants to merge 48 commits intomainfrom
users/agunashe/run_ocltst_rock
Draft

Enable ocltst execution on Rock#3590
agunashe wants to merge 48 commits intomainfrom
users/agunashe/run_ocltst_rock

Conversation

@agunashe
Copy link
Copy Markdown
Contributor

@agunashe agunashe commented Feb 24, 2026

Motivation

Enable ocltst execution on Rock builds on both linux and windows.

Technical Details

enable ocltst execution on linux and windows.

Setup to pick the correct runtime libs
export OCL_ICD_VENDORS=/opt/rocm/etc/OpenCL/vendors/
Linux -
setting LD_LIBRARY_PATH to find the libs
Windows -
choosing to copy the runtime/icd dlls to local folder instead of regsitry key update.
Registry key update requires admin permissions so using copy mechanism to pick the right dlls
else the system32 dlls will be loaded, leading to incorrect dlls loaded at runtime

Test Plan

All opencl tests executed and working

Test Result

All opencl tests executed and working

JIRA

AIRUNTIME-267

Submission Checklist

Comment thread build_tools/github_actions/fetch_test_configurations.py
@agunashe agunashe force-pushed the users/agunashe/run_ocltst_rock branch from 492ecc5 to e59134e Compare March 4, 2026 00:16
@agunashe agunashe added gfx1151 Issue/PR relates to gfx1151. test:rccl For pull requests, runs full tests for only rccl and other labeled projects. test:ocltst For pull requests, this label executes the ocltst and other labeled projects. and removed test:rccl For pull requests, runs full tests for only rccl and other labeled projects. labels Mar 4, 2026
@agunashe agunashe force-pushed the users/agunashe/run_ocltst_rock branch 2 times, most recently from 3af7c34 to f187874 Compare March 6, 2026 23:52
@agunashe agunashe requested a review from a team as a code owner March 6, 2026 23:52
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geomin12 can you review the test script?

@marbre marbre requested a review from geomin12 March 9, 2026 12:45
Comment thread build_tools/github_actions/fetch_test_configurations.py Outdated
Comment thread rocm-systems
Comment thread build_tools/github_actions/fetch_test_configurations.py Outdated
Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
env["OCL_ICD_VENDORS"] = f"{OCL_ICD_VENDORS}/"


def execute_tests(env):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these tests have a smoke test filter? something like this: https://github.com/ROCm/TheRock/blob/main/docs/development/test_filtering.md

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as of now, there is no smoke tests for this test.
this is already a shortened test. Maybe we can look at adding smoke tests in the future, needs more work identifying and trimming the tests

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes let's open tix for filtering, as we probably want to do expanded testing later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4192 - added to this issue

"ocltst": {
"job_name": "ocltst",
"fetch_artifact_args": "--tests",
"timeout_minutes": 120,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from this, it looks like tests takes a total of 8 hours.... that's a pretty insane amount of time, especially for a developer waiting on PR checks.

are there any filters we can use? we are getting teams to start using this format: https://github.com/ROCm/TheRock/blob/main/docs/development/test_filtering.md so we get fast turn around times with the machines we have

Copy link
Copy Markdown
Contributor Author

@agunashe agunashe Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduced it to 20 mins. that should be enough to verify

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually needed 30 mins of timeout

@agunashe agunashe marked this pull request as ready for review March 26, 2026 06:45
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed review notes: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3590.md (this was generated via some back and forth review of the PR, looking at logs, etc.). I expect this level of analysis from PR authors too, so a PR description of just "Technical Details: enable ocltst execution on linux and windows" is insufficient.

I've added one inline comment for the major architectural issue. I'll wait to comment on the smaller details until that is resolved.

Comment on lines +23 to +65
# copies the dlls to local ocltst path.
# to overwrite the registry entries
def copy_dlls_exe_path(ocltst_path):
if platform.system() == "Windows":
# hip and comgr dlls need to be copied to the same folder as exectuable
dlls_pattern = ["amdocl*.dll", "amd_comgr*.dll", "OpenCL.dll"]
dlls_to_copy = []
for pattern in dlls_pattern:
dlls_to_copy.extend(THEROCK_BIN_DIR.glob(pattern))
for dll in dlls_to_copy:
try:
shutil.copy(dll, ocltst_path)
logging.info(f"++ Copied: {dll} to {ocltst_path}")
except Exception as e:
logging.info(f"++ Error copying {dll}: {e}")


# returns ocltst path
def setup_env(env):
ROCM_PATH = Path(THEROCK_DIR)
env["ROCM_PATH"] = str(ROCM_PATH)
if not is_windows:
ROCK_LIB_PATH = Path(THEROCK_DIR) / "lib"
OCL_LIB = Path(ROCK_LIB_PATH) / "opencl"
LLVM_LIB = Path(ROCK_LIB_PATH) / "llvm" / "lib"
ROCM_SYSDEPS_LIB = Path(ROCK_LIB_PATH) / "rocm_sysdeps" / "lib"
OCL_ICD_VENDORS = Path(THEROCK_DIR) / "etc" / "OpenCL" / "vendors"
OCLTST_PATH = Path(THEROCK_DIR) / "share" / "opencl" / "ocltst"
LD_LIBRARY_PATH = os.getenv("LD_LIBRARY_PATH")
if LD_LIBRARY_PATH is not None:
LD_LIBRARY_PATH = Path(LD_LIBRARY_PATH)
env["LD_LIBRARY_PATH"] = (
f"{ROCK_LIB_PATH}:{OCL_LIB}:{LLVM_LIB}:{ROCM_SYSDEPS_LIB}:{LD_LIBRARY_PATH}:{OCLTST_PATH}"
)
env["OCL_ICD_VENDORS"] = f"{OCL_ICD_VENDORS}/"
else:

OCLTST_PATH = Path(THEROCK_DIR) / "tests" / "ocltst"
copy_dlls_exe_path(OCLTST_PATH)
OCL_DLL_FILE = Path(OCLTST_PATH) / "amdocl64.dll"
OCL_ICD_DLL = Path(THEROCK_BIN_DIR) / "OpenCL.dll"
env["OCL_ICD_FILENAMES"] = str(OCL_DLL_FILE)
return OCLTST_PATH
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can (and should) remove nearly all of this setup code from this test script if we fix https://github.com/ROCm/rocm-systems/blob/be8076461de7f44e21a0f273d94f5872c89c1e17/projects/clr/opencl/tests/ocltst/CMakeLists.txt#L7-L11 to install ocltst into a standard install location like bin/ instead of a new location like tests/.

See for example what hipblaslt does:

Or a more recent / simpler case like dnn-providers/hipblaslt-provider: https://github.com/ROCm/rocm-libraries/blob/94453c6cb85db167e96e3f463a12d808734674b5/dnn-providers/hipblaslt-provider/cmake/Tests.cmake#L180-L192

The guiding principles are this:

  • Tests should be runnable from a build directory (e.g. via ctest)
  • Component development should use the same testing workflow as CI systems
  • Scripts for CI like these in build_tools/github_actions/test_executable_scripts/ should be thin wrappers around test runners / executables that possibly add filtering and sharding options applicable to CI. They should NOT be structurally modifying the artifact directory in any way. We granted a narrow exception for that to hip-tests in Enable hip-tests on TheRock #2001 but if this is becoming a pattern we need to fix it and not dig the whole deeper

See more detailed analysis of the issue and recommendations to fixing it here: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3590.md#1-install-to-bin-to-eliminate-dll-copy-and-ld_library_path-workarounds

I ran local experiments with the artifacts produced by this PR here: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3590.md#local-experiment-install-to-bin-validation

  • Moving files to bin/, tests passed if I ran without setting OCL_ICD_FILENAMES (so using my system OpenCL ICD version 3652.0)
  • Moving files to bin/, tests failed with OCL_ICD_FILENAMES (using OpenCL ICD version 3628.0...?) I'm not sure why, given that tests passed on CI

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. I agree we should not have different installation paths for Linux vs windows.
However, changing the installation paths in clr could impact compute build.
So changes will be required in rocm-systems, compute could take a couple of weeks to get this merged again.
this coverage is important piece to validate Opencl and move away from legacy build systems.
Also earlier the linux paths were a requirement from packaging team to install the artifacts in a certain folder path.
We need similar requirements now with Rock which covers both linux and windows
I want to come back and clean up the paths/artifacts through a ticket/issue instead of blocking this PR.
Delaying this further risks test coverage gap in 7.13 too.

Looking at your experiments-
exp 1 - invalid, uses system32 opencl and comgr dlls for loading. Registry key takes precedence. Which is incorrect, need the dlls from the rock build to be loaded
exp 2 - partially valid, setting OCL_ICD_FILENAMES=$(pwd)/amdocl64.dll picks one dll from local folder but does not find all the dlls. So it fails
exp 3 - invalid again. same as exp1. use dlls from system32 instead of local build

To pick the right dlls to be loaded at runtime . Use Listdlls app to see the dlls loaded at runtime.
opt1 - We need to update registry key to point the amdocl64
or
opt2 - copy the required dlls to local folder of exe.

opt1 did not work, because we do not have any facility in Rock to update the registry key. Updating registry key requires admin permissions so could not set it through the test_ocltst.py.
choosing opt2 - With the interest of time, chose 2. this way local folder always gets precedence over system32 dlls

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, changing the installation paths in clr could impact compute build.

Impacts on internal builds are not generally a blocking concern for open source code, builds, or releases. The costs for living downstream are paid by downstream projects, not upstream.

I see the points about timelines, but so far I'm seeing more tech debt being accumulated than paid down (I tried to run hip-tests the other day and was completely unable to do so, facing similar issues as on here... these "scripts that only work on CI" need to be fixed)

Also earlier the linux paths were a requirement from packaging team to install the artifacts in a certain folder path.
We need similar requirements now with Rock which covers both linux and window

As far as I can tell, only rocgdb (also added recently) writes to a top level tests/ folder. All other ROCm subprojects use other paths. So this is already in "needs a requirement from packaging" territory. Using the already established layout seems like it would be the path of least resistance here to me, especially given that the tests/ path does not work (we're not going to ask users who download our packages to copy DLLs to run tests... they should be able to run the tests as easily as developers working on the projects and CI/CD systems can).

exp 2 - partially valid, setting OCL_ICD_FILENAMES=$(pwd)/amdocl64.dll picks one dll from local folder but does not find all the dlls. So it fails

Which am I missing? I have everything that this PR tries to copy and I don't see any missing DLLs when I run the various exes and dlls through https://github.com/lucasg/dependencies

λ ls D:\scratch\claude\artifacts\3590-win\bin
AMD.ROCM.Comgr.MANIFEST  clinfo.exe*   hipcc.exe*      oclperf.dll*     oclruntime.exclude
amd_comgr0702.dll*       cltrace.dll*  hipconfig.bat   oclperf.exclude  ocltst.exe*
amdocl64.dll*

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually looking at your exp setup
exp 1 - since you copy the test items to bin. runtime dlls picked are valid but system32 Opencl.dll is loaded
exp 2 - assuming items are copied from bin to test . setting OCL_ICD_FILENAMES forces to pick local OpenCL.dll. not sure what is causing this failure. Cant repro it locally.
exp 3 - running from tests folder without copying dlls to local folder picks the system32 dlls. So this is invalid.

Copy link
Copy Markdown
Contributor Author

@agunashe agunashe Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impacts on internal builds are not generally a blocking concern for open source code, builds, or releases. The costs for living downstream are paid by downstream projects, not upstream.

I agree with this if its a simple bug fix, upstream gets priority. But currently we are still holding the legacy/downstream systems because of the test coverage gap in Rock. Once we have enabled the tests on Rock, we can look at fixing the issues, atleast then we have a gate check that verifies our change. Right now the opencl changes are not tested in Rock, this causes issues to be caught later in the QA cycle.

I see the points about timelines, but so far I'm seeing more tech debt being accumulated than paid down (I tried to run hip-tests the other day and was completely unable to do so, facing similar issues as on here... these "scripts that only work on CI" need to be fixed)

Please file an issue assign to our team, we will work on it soon. It should work outside CI as well.

As far as I can tell, only rocgdb (also added recently) writes to a top level tests/ folder. All other ROCm subprojects use other paths. So this is already in "needs a requirement from packaging" territory. Using the already established layout seems like it would be the path of least resistance here to me, especially given that the tests/ path does not work (we're not going to ask users who download our packages to copy DLLs to run tests... they should be able to run the tests as easily as developers working on the projects and CI/CD systems can).

ocltst is a test binary. I am not sure if its right thing to copy this to bin folder. sorry need a proper requirement and design.

also exp2 should work, I am not able to repro it locally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can provide another LIMITED, TEMPORARY exception for these sorts of DLL copying workarounds in tests as part of adding ocltst coverage to TheRock but this needs to be fixed urgently as it impacts core developer workflows and shipping package structures.

We're still recovering from issues where framework (JAX) automated tests, QA processes, and packages were completely misaligned to the point where we were silently falling back to testing on CPU on CI, QA had their own undocumented test procedures, and the end result were user-facing packages that were fundamentally broken.

This has all the same smells to it. We ship these test files as part of releases. Users/developers/etc. can and will try to run the test files in those release archives. They will not work as-is. Users/developers/etc. will not think to find a script like this or copy DLLs / set environment variables themselves.

Tracking follow-ups in #4192.

@agunashe
Copy link
Copy Markdown
Contributor Author

Detailed review notes: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3590.md (this was generated via some back and forth review of the PR, looking at logs, etc.). I expect this level of analysis from PR authors too, so a PR description of just "Technical Details: enable ocltst execution on linux and windows" is insufficient.

I've added one inline comment for the major architectural issue. I'll wait to comment on the smaller details until that is resolved.

Added more information to the technical details.

@agunashe agunashe requested a review from ScottTodd March 26, 2026 19:44
Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
"-m",
"oclruntime.dll",
"-A",
"oclruntime.exclude",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any options for parallelization? can we add a comment on what ./ocltst does? it seems to be a custom script so it's unclear to us what is under the hood. is it ctest/gtest/pytest?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its a custom test with c/c++ calls no test framework used. Added 16 threads , think they run parallely. Need to check

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to have reduced the execution time to around 17 mins.

Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30 mins is long. any options for parallelization? having a GPU machine run 30 mins for quick/sanity checks is really long and costly

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have added 16 threads for execution. Lets see if the execution time comes down

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, 16 mins is better, but let's open an ocl test ticket to introduce filtering

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16 mins is okay now, but let's introduce an internal ticket for ocl test filtering

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4192 - can create a internal ticket for this as well

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add that GH issue to the code here, so we don't forget

Comment thread build_tools/github_actions/fetch_test_configurations.py Outdated
Copy link
Copy Markdown
Contributor

@amd-shiraz amd-shiraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are adding tests for new components as part of this PR? Before we do that, do we have a report on the below:

  1. are are introducing any flaky tests / timeouts on tests as part of this that will impact our ci production pipelines/nightlies ?
  2. how much resources these new tests need to successfully execute all steps ?
  3. are we adding any extra time as part of this ? if yes, how much ?

@agunashe
Copy link
Copy Markdown
Contributor Author

I believe we are adding tests for new components as part of this PR? Before we do that, do we have a report on the below:

  1. are are introducing any flaky tests / timeouts on tests as part of this that will impact our ci production pipelines/nightlies ?
  2. how much resources these new tests need to successfully execute all steps ?
  3. are we adding any extra time as part of this ? if yes, how much ?

This is a coverage gap in the Rock. We are porting it from existing build framework. this test has been from years, mostly stable. From the tests we observed around 30 mins to complete the tests. I have added parallel threads for execution. Hopefully it should take less time

@agunashe agunashe requested review from amd-shiraz and geomin12 March 30, 2026 15:24
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16 mins is okay now, but let's introduce an internal ticket for ocl test filtering

env["OCL_ICD_VENDORS"] = f"{OCL_ICD_VENDORS}/"


def execute_tests(env):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes let's open tix for filtering, as we probably want to do expanded testing later

# copies the dlls to local ocltst path.
# to overwrite the registry entries
def copy_dlls_exe_path(ocltst_path):
if platform.system() == "Windows":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use is_windows

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer to fix it in the clean up. We need this PR merged ASAP for ocl coverage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a 1 line change, in the code this PR is directly introducing, immediately prior in the code. Fix it please.

@geomin12 @amd-shiraz please don't approve PRs that so obviously are ignoring reviewer feedback.

Comment thread build_tools/github_actions/test_executable_scripts/test_ocltst.py
@agunashe agunashe requested a review from geomin12 March 31, 2026 04:47
Copy link
Copy Markdown
Contributor

@geomin12 geomin12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, let's use #4192 and get this cleaned up after this lands.

16 mins is long for "quick" and causes a waste of resources / developer time - noted in test_filtering.md.

soon, we'll start having global timeouts and if test times are not within the doc above, these tests will fail (or we may just disable the test if takes too long) - so please priotizing cleaning up!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add that GH issue to the code here, so we don't forget

@agunashe
Copy link
Copy Markdown
Contributor Author

agunashe commented Mar 31, 2026

I believe we are adding tests for new components as part of this PR? Before we do that, do we have a report on the below:

  1. are are introducing any flaky tests / timeouts on tests as part of this that will impact our ci production pipelines/nightlies ?
  2. how much resources these new tests need to successfully execute all steps ?
  3. are we adding any extra time as part of this ? if yes, how much ?

This is a coverage gap in the Rock. We are porting it from existing build framework. this test has been from years, mostly stable. From the tests we observed around 30 mins to complete the tests. I have added parallel threads for execution. Hopefully it should take less time

@amd-shiraz - if you are satisfied with this PR and the explanation. Can you please review/approve this so we can merge and close the gap please.

# copies the dlls to local ocltst path.
# to overwrite the registry entries
def copy_dlls_exe_path(ocltst_path):
if platform.system() == "Windows":
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a 1 line change, in the code this PR is directly introducing, immediately prior in the code. Fix it please.

@geomin12 @amd-shiraz please don't approve PRs that so obviously are ignoring reviewer feedback.

Comment on lines +42 to +43
ROCM_PATH = Path(THEROCK_DIR)
env["ROCM_PATH"] = str(ROCM_PATH)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This intermediate ROCM_PATH variable is doing nothing. THEROCK_DIR is already of type Path

Comment on lines +45 to +50
ROCK_LIB_PATH = Path(THEROCK_DIR) / "lib"
OCL_LIB = Path(ROCK_LIB_PATH) / "opencl"
LLVM_LIB = Path(ROCK_LIB_PATH) / "llvm" / "lib"
ROCM_SYSDEPS_LIB = Path(ROCK_LIB_PATH) / "rocm_sysdeps" / "lib"
OCL_ICD_VENDORS = Path(THEROCK_DIR) / "etc" / "OpenCL" / "vendors"
OCLTST_PATH = Path(THEROCK_DIR) / "share" / "opencl" / "ocltst"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are all already of type Path, no need to wrap in Path()

Comment on lines +448 to +451
if args.ocltst:
# ocltst binaries ship in the core-ocl test artifact (base already includes _lib/_dev).
if args.tests:
argv.append("core-ocl_test")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't quite follow existing patterns here.

There is already a code block below that handles _test artifacts:

        # Fetch _lib (always) and _test (when --tests) for each artifact.
        # Some projects have self-contained _test archives (just test
        # binaries), while others may also need executables or data from
        # _run. Add those explicitly above via argv.append("<name>_run").
        extra_artifact_patterns = [f"{a}_lib" for a in extra_artifacts]
        if args.tests:
            extra_artifact_patterns.extend([f"{a}_test" for a in extra_artifacts])

This can probably just do:

Suggested change
if args.ocltst:
# ocltst binaries ship in the core-ocl test artifact (base already includes _lib/_dev).
if args.tests:
argv.append("core-ocl_test")
if args.ocltst:
# ocltst binaries ship in the core-ocl test artifact (base already includes _lib/_dev).
extra_artifacts.append("core-ocl")

Comment on lines +23 to +65
# copies the dlls to local ocltst path.
# to overwrite the registry entries
def copy_dlls_exe_path(ocltst_path):
if platform.system() == "Windows":
# hip and comgr dlls need to be copied to the same folder as exectuable
dlls_pattern = ["amdocl*.dll", "amd_comgr*.dll", "OpenCL.dll"]
dlls_to_copy = []
for pattern in dlls_pattern:
dlls_to_copy.extend(THEROCK_BIN_DIR.glob(pattern))
for dll in dlls_to_copy:
try:
shutil.copy(dll, ocltst_path)
logging.info(f"++ Copied: {dll} to {ocltst_path}")
except Exception as e:
logging.info(f"++ Error copying {dll}: {e}")


# returns ocltst path
def setup_env(env):
ROCM_PATH = Path(THEROCK_DIR)
env["ROCM_PATH"] = str(ROCM_PATH)
if not is_windows:
ROCK_LIB_PATH = Path(THEROCK_DIR) / "lib"
OCL_LIB = Path(ROCK_LIB_PATH) / "opencl"
LLVM_LIB = Path(ROCK_LIB_PATH) / "llvm" / "lib"
ROCM_SYSDEPS_LIB = Path(ROCK_LIB_PATH) / "rocm_sysdeps" / "lib"
OCL_ICD_VENDORS = Path(THEROCK_DIR) / "etc" / "OpenCL" / "vendors"
OCLTST_PATH = Path(THEROCK_DIR) / "share" / "opencl" / "ocltst"
LD_LIBRARY_PATH = os.getenv("LD_LIBRARY_PATH")
if LD_LIBRARY_PATH is not None:
LD_LIBRARY_PATH = Path(LD_LIBRARY_PATH)
env["LD_LIBRARY_PATH"] = (
f"{ROCK_LIB_PATH}:{OCL_LIB}:{LLVM_LIB}:{ROCM_SYSDEPS_LIB}:{LD_LIBRARY_PATH}:{OCLTST_PATH}"
)
env["OCL_ICD_VENDORS"] = f"{OCL_ICD_VENDORS}/"
else:

OCLTST_PATH = Path(THEROCK_DIR) / "tests" / "ocltst"
copy_dlls_exe_path(OCLTST_PATH)
OCL_DLL_FILE = Path(OCLTST_PATH) / "amdocl64.dll"
OCL_ICD_DLL = Path(THEROCK_BIN_DIR) / "OpenCL.dll"
env["OCL_ICD_FILENAMES"] = str(OCL_DLL_FILE)
return OCLTST_PATH
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can provide another LIMITED, TEMPORARY exception for these sorts of DLL copying workarounds in tests as part of adding ocltst coverage to TheRock but this needs to be fixed urgently as it impacts core developer workflows and shipping package structures.

We're still recovering from issues where framework (JAX) automated tests, QA processes, and packages were completely misaligned to the point where we were silently falling back to testing on CPU on CI, QA had their own undocumented test procedures, and the end result were user-facing packages that were fundamentally broken.

This has all the same smells to it. We ship these test files as part of releases. Users/developers/etc. can and will try to run the test files in those release archives. They will not work as-is. Users/developers/etc. will not think to find a script like this or copy DLLs / set environment variables themselves.

Tracking follow-ups in #4192.

@agunashe agunashe marked this pull request as draft April 1, 2026 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gfx1151 Issue/PR relates to gfx1151. test:ocltst For pull requests, this label executes the ocltst and other labeled projects.

Projects

Status: TODO

Development

Successfully merging this pull request may close these issues.

5 participants