-
Notifications
You must be signed in to change notification settings - Fork 239
Enable ocltst execution on Rock #3590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
2ff3e81
cae277b
eb1a603
c54f377
647db32
34d2e08
3baa94f
1adb9cd
472a365
909661c
d569966
2729746
cefb9a5
6c418ef
b03d94f
e23f78e
07d20c7
17305e0
4bf2085
a196243
b19ab12
5d8b7e1
606754e
aacd766
3040331
73738fb
c9ca10e
c87a036
58033ca
bfe2dc1
119c18a
8a299aa
7df37de
0935826
9d48b9c
e3125af
fcd1a9f
e36bedf
4bd3017
acdafb8
6c369e4
d8f56a2
1f09167
ebcafd8
2026025
299b076
aef601b
3913d0c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @geomin12 can you review the test script?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 30 mins is long. any options for parallelization? having a GPU machine run 30 mins for quick/sanity checks is really long and costly
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i have added 16 threads for execution. Lets see if the execution time comes down
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. okay, 16 mins is better, but let's open an ocl test ticket to introduce filtering
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 16 mins is okay now, but let's introduce an internal ticket for ocl test filtering
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. #4192 - can create a internal ticket for this as well
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we add that GH issue to the code here, so we don't forget |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| import logging | ||
| import os | ||
| import shlex | ||
| import subprocess | ||
| from pathlib import Path | ||
| import sys | ||
| import platform | ||
| import shutil | ||
|
|
||
| logging.basicConfig(level=logging.INFO) | ||
| THEROCK_BIN_DIR_STR = os.getenv("THEROCK_BIN_DIR") | ||
| if THEROCK_BIN_DIR_STR is None: | ||
| logging.info( | ||
| "++ Error: env(THEROCK_BIN_DIR) is not set. Please set it before executing tests." | ||
| ) | ||
| sys.exit(1) | ||
| THEROCK_BIN_DIR = Path(THEROCK_BIN_DIR_STR).resolve() | ||
| THEROCK_DIR = Path(THEROCK_BIN_DIR).parent | ||
| env = os.environ.copy() | ||
| is_windows = platform.system() == "Windows" | ||
|
|
||
|
|
||
| # copies the dlls to local ocltst path. | ||
| # to overwrite the registry entries | ||
| def copy_dlls_exe_path(ocltst_path): | ||
| if platform.system() == "Windows": | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would prefer to fix it in the clean up. We need this PR merged ASAP for ocl coverage.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a 1 line change, in the code this PR is directly introducing, immediately prior in the code. Fix it please. @geomin12 @amd-shiraz please don't approve PRs that so obviously are ignoring reviewer feedback. |
||
| # hip and comgr dlls need to be copied to the same folder as exectuable | ||
| dlls_pattern = ["amdocl*.dll", "amd_comgr*.dll", "OpenCL.dll"] | ||
| dlls_to_copy = [] | ||
| for pattern in dlls_pattern: | ||
| dlls_to_copy.extend(THEROCK_BIN_DIR.glob(pattern)) | ||
| for dll in dlls_to_copy: | ||
| try: | ||
| shutil.copy(dll, ocltst_path) | ||
| logging.info(f"++ Copied: {dll} to {ocltst_path}") | ||
| except Exception as e: | ||
| logging.info(f"++ Error copying {dll}: {e}") | ||
|
|
||
|
|
||
| # returns ocltst path | ||
| def setup_env(env): | ||
| ROCM_PATH = Path(THEROCK_DIR) | ||
| env["ROCM_PATH"] = str(ROCM_PATH) | ||
|
Comment on lines
+42
to
+43
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This intermediate |
||
| if not is_windows: | ||
| ROCK_LIB_PATH = Path(THEROCK_DIR) / "lib" | ||
| OCL_LIB = Path(ROCK_LIB_PATH) / "opencl" | ||
| LLVM_LIB = Path(ROCK_LIB_PATH) / "llvm" / "lib" | ||
| ROCM_SYSDEPS_LIB = Path(ROCK_LIB_PATH) / "rocm_sysdeps" / "lib" | ||
| OCL_ICD_VENDORS = Path(THEROCK_DIR) / "etc" / "OpenCL" / "vendors" | ||
| OCLTST_PATH = Path(THEROCK_DIR) / "share" / "opencl" / "ocltst" | ||
|
Comment on lines
+45
to
+50
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These variables are all already of type |
||
| LD_LIBRARY_PATH = os.getenv("LD_LIBRARY_PATH") | ||
| if LD_LIBRARY_PATH is not None: | ||
| LD_LIBRARY_PATH = Path(LD_LIBRARY_PATH) | ||
| env["LD_LIBRARY_PATH"] = ( | ||
| f"{ROCK_LIB_PATH}:{OCL_LIB}:{LLVM_LIB}:{ROCM_SYSDEPS_LIB}:{LD_LIBRARY_PATH}:{OCLTST_PATH}" | ||
| ) | ||
| env["OCL_ICD_VENDORS"] = f"{OCL_ICD_VENDORS}/" | ||
| logging.info(f"++ Setting LD_LIBRARY_PATH={env['LD_LIBRARY_PATH']}") | ||
| logging.info(f"++ Setting OCL_ICD_VENDORS={env['OCL_ICD_VENDORS']}") | ||
| else: | ||
|
|
||
| OCLTST_PATH = Path(THEROCK_DIR) / "tests" / "ocltst" | ||
| copy_dlls_exe_path(OCLTST_PATH) | ||
| OCL_DLL_FILE = Path(OCLTST_PATH) / "amdocl64.dll" | ||
| OCL_ICD_DLL = Path(THEROCK_BIN_DIR) / "OpenCL.dll" | ||
| env["OCL_ICD_FILENAMES"] = str(OCL_DLL_FILE) | ||
| logging.info(f"++ Setting OCL_ICD_FILENAMES={env['OCL_ICD_FILENAMES']}") | ||
| return OCLTST_PATH | ||
|
Comment on lines
+23
to
+68
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe we can (and should) remove nearly all of this setup code from this test script if we fix https://github.com/ROCm/rocm-systems/blob/be8076461de7f44e21a0f273d94f5872c89c1e17/projects/clr/opencl/tests/ocltst/CMakeLists.txt#L7-L11 to install ocltst into a standard install location like See for example what hipblaslt does:
Or a more recent / simpler case like dnn-providers/hipblaslt-provider: https://github.com/ROCm/rocm-libraries/blob/94453c6cb85db167e96e3f463a12d808734674b5/dnn-providers/hipblaslt-provider/cmake/Tests.cmake#L180-L192 The guiding principles are this:
See more detailed analysis of the issue and recommendations to fixing it here: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3590.md#1-install-to-bin-to-eliminate-dll-copy-and-ld_library_path-workarounds I ran local experiments with the artifacts produced by this PR here: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3590.md#local-experiment-install-to-bin-validation
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for your comment. I agree we should not have different installation paths for Linux vs windows. Looking at your experiments- To pick the right dlls to be loaded at runtime . Use Listdlls app to see the dlls loaded at runtime. opt1 did not work, because we do not have any facility in Rock to update the registry key. Updating registry key requires admin permissions so could not set it through the test_ocltst.py.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Impacts on internal builds are not generally a blocking concern for open source code, builds, or releases. The costs for living downstream are paid by downstream projects, not upstream. I see the points about timelines, but so far I'm seeing more tech debt being accumulated than paid down (I tried to run hip-tests the other day and was completely unable to do so, facing similar issues as on here... these "scripts that only work on CI" need to be fixed)
As far as I can tell, only rocgdb (also added recently) writes to a top level
Which am I missing? I have everything that this PR tries to copy and I don't see any missing DLLs when I run the various exes and dlls through https://github.com/lucasg/dependencies
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually looking at your exp setup
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree with this if its a simple bug fix, upstream gets priority. But currently we are still holding the legacy/downstream systems because of the test coverage gap in Rock. Once we have enabled the tests on Rock, we can look at fixing the issues, atleast then we have a gate check that verifies our change. Right now the opencl changes are not tested in Rock, this causes issues to be caught later in the QA cycle.
Please file an issue assign to our team, we will work on it soon. It should work outside CI as well.
ocltst is a test binary. I am not sure if its right thing to copy this to bin folder. sorry need a proper requirement and design. also exp2 should work, I am not able to repro it locally.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can provide another LIMITED, TEMPORARY exception for these sorts of DLL copying workarounds in tests as part of adding ocltst coverage to TheRock but this needs to be fixed urgently as it impacts core developer workflows and shipping package structures. We're still recovering from issues where framework (JAX) automated tests, QA processes, and packages were completely misaligned to the point where we were silently falling back to testing on CPU on CI, QA had their own undocumented test procedures, and the end result were user-facing packages that were fundamentally broken. This has all the same smells to it. We ship these test files as part of releases. Users/developers/etc. can and will try to run the test files in those release archives. They will not work as-is. Users/developers/etc. will not think to find a script like this or copy DLLs / set environment variables themselves. Tracking follow-ups in #4192. |
||
|
|
||
|
|
||
| def execute_tests(env): | ||
| OCLTST_PATH = setup_env(env) | ||
| OCLTST = Path(OCLTST_PATH) / "ocltst" | ||
|
agunashe marked this conversation as resolved.
|
||
| module = "liboclruntime.so" if not is_windows else "oclruntime.dll" | ||
| # command to execute ocltst tests | ||
| cmd = [ | ||
| f"{OCLTST}", | ||
| "-m", # module to test | ||
| f"{module}", | ||
| "-s", # threads to spawn/use | ||
| f"16", | ||
| "-A", # exclude tests in the file | ||
| "oclruntime.exclude", # perf related tests skipped | ||
| ] | ||
| logging.info(f"++ Exec [{OCLTST_PATH}]$ {shlex.join(cmd)}") | ||
| subprocess.run(cmd, cwd=OCLTST_PATH, check=True, env=env, shell=False) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| execute_tests(env) | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -38,6 +38,7 @@ | |||||||||||||||
| [--rocprofiler-sdk | --no-rocprofiler-sdk ] | ||||||||||||||||
| [--rocprofiler-systems | --no-rocprofiler-systems] | ||||||||||||||||
| [--rocrtst | --no-rocrtst] | ||||||||||||||||
| [--ocltst | --no-ocltst] | ||||||||||||||||
| [--rocwmma | --no-rocwmma] | ||||||||||||||||
| [--libhipcxx | --no-libhipcxx] | ||||||||||||||||
| [--tests | --no-tests] | ||||||||||||||||
|
|
@@ -360,6 +361,7 @@ def retrieve_artifacts_by_run_id(args): | |||||||||||||||
| args.rocprofiler_sdk, | ||||||||||||||||
| args.rocprofiler_systems, | ||||||||||||||||
| args.rocrtst, | ||||||||||||||||
| args.ocltst, | ||||||||||||||||
| args.rocwmma, | ||||||||||||||||
| args.libhipcxx, | ||||||||||||||||
| ] | ||||||||||||||||
|
|
@@ -443,6 +445,10 @@ def retrieve_artifacts_by_run_id(args): | |||||||||||||||
| # rocrtst depends on sysdeps-hwloc (which depends on sysdeps-libpciaccess) | ||||||||||||||||
| extra_artifacts.append("sysdeps-hwloc") | ||||||||||||||||
| extra_artifacts.append("sysdeps-libpciaccess") | ||||||||||||||||
| if args.ocltst: | ||||||||||||||||
| # ocltst binaries ship in the core-ocl test artifact (base already includes _lib/_dev). | ||||||||||||||||
| if args.tests: | ||||||||||||||||
| argv.append("core-ocl_test") | ||||||||||||||||
|
Comment on lines
+448
to
+451
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't quite follow existing patterns here. There is already a code block below that handles # Fetch _lib (always) and _test (when --tests) for each artifact.
# Some projects have self-contained _test archives (just test
# binaries), while others may also need executables or data from
# _run. Add those explicitly above via argv.append("<name>_run").
extra_artifact_patterns = [f"{a}_lib" for a in extra_artifacts]
if args.tests:
extra_artifact_patterns.extend([f"{a}_test" for a in extra_artifacts])This can probably just do:
Suggested change
|
||||||||||||||||
| if args.rocwmma: | ||||||||||||||||
| extra_artifacts.append("rocwmma") | ||||||||||||||||
| argv.append("rocwmma_dev") | ||||||||||||||||
|
|
@@ -781,6 +787,13 @@ def main(argv): | |||||||||||||||
| action=argparse.BooleanOptionalAction, | ||||||||||||||||
| ) | ||||||||||||||||
|
|
||||||||||||||||
| artifacts_group.add_argument( | ||||||||||||||||
| "--ocltst", | ||||||||||||||||
| default=False, | ||||||||||||||||
| help="Include ocltst from core-ocl", | ||||||||||||||||
| action=argparse.BooleanOptionalAction, | ||||||||||||||||
| ) | ||||||||||||||||
|
|
||||||||||||||||
| artifacts_group.add_argument( | ||||||||||||||||
| "--rocwmma", | ||||||||||||||||
| default=False, | ||||||||||||||||
|
|
||||||||||||||||
Uh oh!
There was an error while loading. Please reload this page.