forked from intel/llvm
-
Notifications
You must be signed in to change notification settings - Fork 3
[SYCL][Graph] Support for native-command #383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
9836c3c
to
5a5aef2
Compare
EwanC
commented
Feb 3, 2025
764aebc
to
4e62db9
Compare
reble
reviewed
Feb 6, 2025
sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc
Outdated
Show resolved
Hide resolved
c07c67a
to
f499268
Compare
b934d72
to
c085dc2
Compare
a710e4c
to
d9628c0
Compare
01d273a
to
c29c1c4
Compare
Adds limited support for device globals in runtime-compiled SYCL code. The application interacts with the globals via three new methods on `kernel_bundle`: ```c++ bool ext_oneapi_has_device_global(const std::string &name); void *ext_oneapi_get_device_global_address(const std::string &name, const device &dev); // return a USM pointer suitable for queue::memcpy etc. size_t ext_oneapi_get_device_global_size(const std::string &name); ``` This PR uses the same trick as intel#16316, i.e. prepending a kernel-bundle-specific prefix to the names of device globals to make them distinguishable for the program manager. Limitations: - Device globals inside a namespace are unsupported due to insufficient name mangling. - Device globals with the `device_image_scope` property cannot be read/written from the host, because the runtime currently cannot expose USM pointers for them. A workaround is using explicit kernels to read/write the global's value into a USM buffer. --------- Signed-off-by: Julian Oppermann <[email protected]>
c29c1c4
to
ac21cad
Compare
This is one patch in a series intended to re-organize `add_sycl_rt_library` helper function to make it shorter and simpler. Dropped unnecessary `target_include_directories` calls: - `LIB_NAME` is a link-only target, no need to set any include directories for it - SYCL RT should not be using `unified-runtime/source` or `unified-runtime/source/common` includes
intel#17446) The description of the `build` shorthand uses `ctxt`, despite there being no such argument associated. This commit fixes this by using the devices from the `sourceBundle` argument. Signed-off-by: Larsen, Steffen <[email protected]>
ac21cad
to
3661bfe
Compare
Also remove a cfi exclusive KNOWN_FAILURE as it couldn't be reproduced across a few platforms. closes oneapi-src/unified-runtime#2647
…l#17430) Cherry-pick of: llvm/llvm-project#128513 Fixes: intel#16903 ------ This diagnostic is disabled for device compilation as float128 is not supported on the device side. Other diagnostics are already covering the cases where float128 is actually used in the kernel code, and it's already tested for in the existing test. This is expanding on the patch 318bff6 that handled this for cuda compilation.
Latest distros don't allow global pip installations.
3661bfe
to
41687a3
Compare
Consolidated all common (i.e. non-conditional) include directories settings for SYCL RT build in a single place.
…tel#17297) Changes: Command_list_manager no longer synchronize its calls, instead the responsibility to ensure exclusivity belongs to the caller. To add synchronization I implemented the mechanism similar to rust lock as suggested in intel#17061 (comment).
Move cts_exclude_filter from `devops` dir to `sycl` and check it out from testing branch.
The SYCL spec mandates that kernel IDs for kernels generated through RTC shouldn't be accessible from the user application. This PR makes it so we filter out the output of `get_kernel_ids` so that we behave according to the spec. Removes some tests that were relying on kernel_ids being accessible from the user application, updates `test_lifetime` to check on debug prints. Refactors `kernel_compiler_sycl_jit_lt.cpp` to remove unnecessary device capability checks and use a common `sycl::queue`.
Add support for timeline semaphores on linux and windows for CUDA backend and preliminary support in L0 backend. Update vulkan_interop/sampled_images.cpp to remove redundant wait when using semaphores.
…ntel#17366) Caller might set CMAKE_MSVC_RUNTIME_LIBRARY to MultiThreaded to do static linking of MSVC runtime. However, UMF is not yet ready for static linking. Override the CMAKE_MSVC_RUNTIME_LIBRARY to fix lnking failures.
Bump UMF to v0.11.0-dev4. From now on disjoint_pool is part of libumf, instead of being a separate library. --------- Signed-off-by: Lukasz Dorau <[email protected]>
`urCommandBufferEnqueueExp` was changed in intel#16984 to `urEnqueueCommandBufferExp` but added back accidentally to the v2 L0 adapter in intel#17297
Support [sycl_ext_codeplay_enqueue_native_command](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) with SYCL-Graph. Introduces `interop_handle::ext_codeplay_get_native_graph<backend>()` to give the user access to the native graph object which native commands can be appended to. To use CUDA as an example, code using `ext_codeplay_enqueue_native_command` eagerly can be updated from: ```cpp CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) { auto NativeStream = IH.get_native_queue<cuda>(); myNativeLibraryCall(NativeStream); } ``` To ```cpp CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) { if (IH.ext_codeplay_has_graph()) { auto NativeGraph = IH.ext_codeplay_get_native_graph<cuda>(); auto NativeStream = IH.get_native_queue<cuda>(); // Start capture stream calls into graph cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr, nullptr, 0, CU_STREAM_CAPTURE_MODE_GLOBAL); myNativeLibraryCall(NativeStream); // Stop capturing stream calls into graph cuStreamEndCapture(NativeStream, &NativeGraph); } else { auto NativeStream = IH.get_native_queue<cuda>(); myNativeLibraryCall(NativeStream ); } } ``` Example of how this integration could work in GROMACS https://gitlab.com/gromacs/gromacs/-/merge_requests/4954
41687a3
to
d82bb30
Compare
Closing this, upstream PR has now merged intel#16871 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Support sycl_ext_codeplay_enqueue_native_command with SYCL-Graph. Requires oneapi-src/unified-runtime#2656
Introduces
interop_handle::ext_oneapi_get_native_graph<backend>()
to give the user access to the native graph object which native commands can be appended to.To use CUDA as an example, code using
ext_codeplay_enqueue_native_command
eagerly can be updated from:To
Example of how this integration could work in GROMACS https://gitlab.com/gromacs/gromacs/-/merge_requests/4954