Skip to content

Commit a2353e5

Browse files
committed
[SYCL][Graph] Support for native-command
Support [sycl_ext_codeplay_enqueue_native_command](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) with SYCL-Graph. Introduces `interop_handle::ext_codeplay_get_native_graph<backend>()` to give the user access to the native graph object which native commands can be appended to. To use CUDA as an example, code using `ext_codeplay_enqueue_native_command` eagerly can be updated from: ```cpp CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) { auto NativeStream = IH.get_native_queue<cuda>(); myNativeLibraryCall(NativeStream); } ``` To ```cpp CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) { if (IH.ext_codeplay_has_graph()) { auto NativeGraph = IH.ext_codeplay_get_native_graph<cuda>(); auto NativeStream = IH.get_native_queue<cuda>(); // Start capture stream calls into graph cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr, nullptr, 0, CU_STREAM_CAPTURE_MODE_GLOBAL); myNativeLibraryCall(NativeStream); // Stop capturing stream calls into graph cuStreamEndCapture(NativeStream, &NativeGraph); } else { auto NativeStream = IH.get_native_queue<cuda>(); myNativeLibraryCall(NativeStream ); } } ``` Example of how this integration could work in GROMACS https://gitlab.com/gromacs/gromacs/-/merge_requests/4954
1 parent db7eac4 commit a2353e5

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

57 files changed

+2352
-50
lines changed

sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc

+115-11
Original file line numberDiff line numberDiff line change
@@ -173,13 +173,99 @@ dependencies are satisfied.
173173
The SYCL command described above completes once all of the native asynchronous
174174
tasks it contains have completed.
175175

176+
TODO - reword
176177
The call to `interopCallable` must not submit any synchronous tasks to the
177-
native backend object, and it must not block waiting for any tasks to complete.
178-
The call also must not add tasks to backend objects that underly any other
179-
queue, aside from the queue that is associated with this handler. If it does
180-
any of these things, the behavior is undefined.
178+
native backend object. The call must also not block
179+
waiting for any tasks to complete. Additionally, the call must not add tasks to
180+
backend objects that underly any other queue, aside from the queue that is
181+
associated with this handler. If it does any of these things, the behavior is undefined.
181182

182-
== Example
183+
=== SYCL-Graph Interaction
184+
185+
This section defines the interaction with the
186+
link:../experimental/sycl_ext_oneapi_graph.asciidoc[sycl_ext_oneapi_graph]
187+
extension.
188+
189+
The `interopCallable` object will be invoked during `command_graph::finalize()`
190+
when the backend object for the graph is available to give to the user as a
191+
handle. The user then may add nodes using native APIs to the backend graph
192+
object queried with `interop_handle::ext_codeplay_get_native_graph()`. The
193+
runtime will schedule the dependencies of the user added nodes such
194+
that they respect the graph node edges.
195+
196+
==== Interop Handle Class Modifications
197+
198+
```c++
199+
using graph = ext::oneapi::experimental::command_graph<
200+
ext::oneapi::experimental::graph_state::executable>;
201+
202+
class interop_handle {
203+
bool ext_codeplay_has_graph() const;
204+
205+
template <backend Backend>
206+
backend_return_t<Backend, graph> ext_codeplay_get_native_graph() const;
207+
208+
};
209+
```
210+
211+
Table {counter: tableNumber}. Native types for
212+
`template <backend Backend, class T> backend_return_t<Backend, T>` for `T` as
213+
`command_graph<graph_state::executable`.
214+
215+
[cols="2a,a"]
216+
|===
217+
|Backend|Native graph type
218+
219+
| `backend::opencl`
220+
| `cl_command_buffer_khr`
221+
222+
| `backend::ext_oneapi_level_zero`
223+
| `ze_command_list_handle_t`
224+
225+
| `backend::ext_oneapi_cuda`
226+
| `CUGraph`
227+
228+
| `backend::ext_oneapi_hip`
229+
| `hipGraph_t`
230+
231+
|===
232+
233+
==== New Interop Handle Member Functions
234+
235+
Table {counter: tableNumber}. Additional member functions of the `sycl::interop_handle` class.
236+
[cols="2a,a"]
237+
|===
238+
|Member function|Description
239+
240+
|
241+
[source,c++]
242+
----
243+
bool interop_handle::ext_codeplay_has_graph() const;
244+
----
245+
246+
| Query if the `interop_handle` object has a native graph object available.
247+
248+
|
249+
[source,c++]
250+
----
251+
template <backend Backend>
252+
backend_return_t<Backend, graph>
253+
interop_handle::ext_codeplay_get_native_graph() const;
254+
----
255+
256+
| Return the native graph object associated with the `interop_handle`.
257+
258+
Exceptions:
259+
260+
* Throws with error code `invalid` if there is no native graph object
261+
associated with the interop handle.
262+
263+
|===
264+
265+
266+
== Examples
267+
268+
=== HIP Native Task
183269

184270
This example demonstrates how to use this extension to enqueue asynchronous
185271
native tasks on the HIP backend.
@@ -207,11 +293,29 @@ q.submit([&](sycl::handler &cgh) {
207293
q.wait();
208294
```
209295

210-
== Issues
211-
212-
=== sycl_ext_oneapi_graph
296+
=== CUDA Stream Record Native Task.
213297

214-
`ext_codeplay_enqueue_native_command`
215-
cannot be used in graph nodes. A synchronous exception will be thrown with error
216-
code `invalid` if a user tries to add them to a graph.
298+
This example demonstrates how to use this extension to add stream recorded
299+
native nodes to a SYCL-Graph object on the CUDA backend.
217300

301+
```
302+
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
303+
if (IH.ext_codeplay_has_graph()) {
304+
auto NativeGraph = IH.ext_codeplay_get_native_graph<cuda>();
305+
auto NativeStream = IH.get_native_queue<cuda>();
306+
307+
// Start capture stream calls into graph
308+
cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr,
309+
nullptr, 0,
310+
CU_STREAM_CAPTURE_MODE_GLOBAL);
311+
312+
myNativeLibraryCall(NativeStream);
313+
314+
// Stop capturing stream calls into graph
315+
cuStreamEndCapture(NativeStream, &NativeGraph);
316+
} else {
317+
auto NativeStream = IH.get_native_queue<cuda>();
318+
myNativeLibraryCall(NativeStream );
319+
}
320+
}
321+
```

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

+2-6
Original file line numberDiff line numberDiff line change
@@ -2119,13 +2119,9 @@ extensions.
21192119

21202120
==== sycl_ext_codeplay_enqueue_native_command
21212121

2122-
`ext_codeplay_enqueue_native_command`, defined in
2122+
`ext_codeplay_enqueue_native_command` commands, defined in
21232123
link:../experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc[sycl_ext_codeplay_enqueue_native_command]
2124-
cannot be used in graph nodes. A synchronous exception will be thrown with error
2125-
code `invalid` if a user tries to add them to a graph.
2126-
2127-
Removing this restriction is something we may look at for future revisions of
2128-
`sycl_ext_oneapi_graph`.
2124+
can be used in graph nodes. See the section on SYCL-Graph interaction.
21292125

21302126
==== sycl_ext_intel_queue_index
21312127

sycl/include/sycl/detail/backend_traits_cuda.hpp

+8
Original file line numberDiff line numberDiff line change
@@ -17,13 +17,15 @@
1717
#include <sycl/detail/backend_traits.hpp>
1818
#include <sycl/device.hpp>
1919
#include <sycl/event.hpp>
20+
#include <sycl/ext/oneapi/experimental/graph.hpp>
2021
#include <sycl/queue.hpp>
2122

2223
typedef int CUdevice;
2324
typedef struct CUctx_st *CUcontext;
2425
typedef struct CUstream_st *CUstream;
2526
typedef struct CUevent_st *CUevent;
2627
typedef struct CUmod_st *CUmodule;
28+
typedef struct CUgraph_st *CUgraph;
2729

2830
// As defined in the CUDA 10.1 header file. This requires CUDA version > 3.2
2931
#if defined(_WIN64) || defined(__LP64__)
@@ -102,6 +104,12 @@ template <> struct BackendReturn<backend::ext_oneapi_cuda, queue> {
102104
using type = CUstream;
103105
};
104106

107+
using graph = ext::oneapi::experimental::command_graph<
108+
ext::oneapi::experimental::graph_state::executable>;
109+
template <> struct BackendReturn<backend::ext_oneapi_cuda, graph> {
110+
using type = CUgraph;
111+
};
112+
105113
} // namespace detail
106114
} // namespace _V1
107115
} // namespace sycl

sycl/include/sycl/detail/backend_traits_hip.hpp

+9
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
#include <sycl/detail/backend_traits.hpp>
1818
#include <sycl/device.hpp>
1919
#include <sycl/event.hpp>
20+
#include <sycl/ext/oneapi/experimental/graph.hpp>
2021
#include <sycl/queue.hpp>
2122

2223
typedef int HIPdevice;
@@ -25,6 +26,8 @@ typedef struct ihipStream_t *HIPstream;
2526
typedef struct ihipEvent_t *HIPevent;
2627
typedef struct ihipModule_t *HIPmodule;
2728
typedef void *HIPdeviceptr;
29+
typedef struct ihipGraph *HIPGraph;
30+
typedef struct hipGraphNode *HIPGraphNode;
2831

2932
namespace sycl {
3033
inline namespace _V1 {
@@ -96,6 +99,12 @@ template <> struct BackendReturn<backend::ext_oneapi_hip, queue> {
9699
using type = HIPstream;
97100
};
98101

102+
using graph = ext::oneapi::experimental::command_graph<
103+
ext::oneapi::experimental::graph_state::executable>;
104+
template <> struct BackendReturn<backend::ext_oneapi_hip, graph> {
105+
using type = HIPGraph;
106+
};
107+
99108
template <> struct InteropFeatureSupportMap<backend::ext_oneapi_hip> {
100109
static constexpr bool MakePlatform = false;
101110
static constexpr bool MakeDevice = true;

sycl/include/sycl/detail/backend_traits_level_zero.hpp

+14-7
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,14 @@
2020
#include <sycl/device.hpp> // for device
2121
#include <sycl/event.hpp> // for event
2222
#include <sycl/ext/oneapi/backend/level_zero_ownership.hpp> // for ownership
23-
#include <sycl/image.hpp> // for image
24-
#include <sycl/kernel.hpp> // for kernel
25-
#include <sycl/kernel_bundle.hpp> // for kernel_b...
26-
#include <sycl/kernel_bundle_enums.hpp> // for bundle_s...
27-
#include <sycl/platform.hpp> // for platform
28-
#include <sycl/property_list.hpp> // for property...
29-
#include <sycl/range.hpp> // for range
23+
#include <sycl/ext/oneapi/experimental/graph.hpp>
24+
#include <sycl/image.hpp> // for image
25+
#include <sycl/kernel.hpp> // for kernel
26+
#include <sycl/kernel_bundle.hpp> // for kernel_b...
27+
#include <sycl/kernel_bundle_enums.hpp> // for bundle_s...
28+
#include <sycl/platform.hpp> // for platform
29+
#include <sycl/property_list.hpp> // for property...
30+
#include <sycl/range.hpp> // for range
3031

3132
#include <variant> // for variant
3233
#include <vector> // for vector
@@ -207,6 +208,12 @@ template <> struct BackendReturn<backend::ext_oneapi_level_zero, kernel> {
207208
using type = ze_kernel_handle_t;
208209
};
209210

211+
using graph = ext::oneapi::experimental::command_graph<
212+
ext::oneapi::experimental::graph_state::executable>;
213+
template <> struct BackendReturn<backend::ext_oneapi_level_zero, graph> {
214+
using type = ze_command_list_handle_t;
215+
};
216+
210217
template <> struct InteropFeatureSupportMap<backend::ext_oneapi_level_zero> {
211218
static constexpr bool MakePlatform = true;
212219
static constexpr bool MakeDevice = true;

sycl/include/sycl/detail/backend_traits_opencl.hpp

+10-3
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,10 @@
2121
#include <sycl/detail/ur.hpp> // for assertion and ur handles
2222
#include <sycl/device.hpp> // for device
2323
#include <sycl/event.hpp> // for event
24-
#include <sycl/kernel.hpp> // for kernel
25-
#include <sycl/kernel_bundle_enums.hpp> // for bundle_state
26-
#include <sycl/platform.hpp> // for platform
24+
#include <sycl/ext/oneapi/experimental/graph.hpp>
25+
#include <sycl/kernel.hpp> // for kernel
26+
#include <sycl/kernel_bundle_enums.hpp> // for bundle_state
27+
#include <sycl/platform.hpp> // for platform
2728

2829
#include <vector> // for vector
2930

@@ -132,6 +133,12 @@ template <> struct BackendReturn<backend::opencl, kernel> {
132133
using type = cl_kernel;
133134
};
134135

136+
using graph = ext::oneapi::experimental::command_graph<
137+
ext::oneapi::experimental::graph_state::executable>;
138+
template <> struct BackendReturn<backend::opencl, graph> {
139+
using type = cl_command_buffer_khr;
140+
};
141+
135142
template <> struct InteropFeatureSupportMap<backend::opencl> {
136143
static constexpr bool MakePlatform = true;
137144
static constexpr bool MakeDevice = true;

sycl/include/sycl/ext/oneapi/experimental/graph.hpp

+2-1
Original file line numberDiff line numberDiff line change
@@ -114,7 +114,8 @@ enum class node_type {
114114
prefetch = 6,
115115
memadvise = 7,
116116
ext_oneapi_barrier = 8,
117-
host_task = 9
117+
host_task = 9,
118+
native_command = 10
118119
};
119120

120121
/// Class representing a node in the graph, returned by command_graph::add().

sycl/include/sycl/handler.hpp

-3
Original file line numberDiff line numberDiff line change
@@ -1934,9 +1934,6 @@ class __SYCL_EXPORT handler {
19341934
void(interop_handle)>::value>
19351935
ext_codeplay_enqueue_native_command([[maybe_unused]] FuncT &&Func) {
19361936
#ifndef __SYCL_DEVICE_ONLY__
1937-
throwIfGraphAssociated<
1938-
ext::oneapi::experimental::detail::UnsupportedGraphFeatures::
1939-
sycl_ext_codeplay_enqueue_native_command>();
19401937
ext_codeplay_enqueue_native_command_impl(Func);
19411938
#endif
19421939
}

sycl/include/sycl/interop_handle.hpp

+29-2
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
#include <sycl/detail/impl_utils.hpp> // for getSyclObjImpl
1717
#include <sycl/exception.hpp>
1818
#include <sycl/ext/oneapi/accessor_property_list.hpp> // for accessor_property_list
19+
#include <sycl/ext/oneapi/experimental/graph.hpp>
1920
#include <sycl/image.hpp> // for image
2021
#include <ur_api.h> // for ur_mem_handle_t, ur...
2122

@@ -49,6 +50,9 @@ class interop_handle {
4950
/// interop_handle.
5051
__SYCL_EXPORT backend get_backend() const noexcept;
5152

53+
/// Returns true if command-group is being added to a graph as a node
54+
__SYCL_EXPORT bool ext_codeplay_has_graph() const noexcept;
55+
5256
/// Receives a SYCL accessor that has been defined as a requirement for the
5357
/// command group, and returns the underlying OpenCL memory object that is
5458
/// used by the SYCL runtime. If the accessor passed as parameter is not part
@@ -134,6 +138,26 @@ class interop_handle {
134138
#endif
135139
}
136140

141+
using graph = ext::oneapi::experimental::command_graph<
142+
ext::oneapi::experimental::graph_state::executable>;
143+
template <backend Backend = backend::opencl>
144+
backend_return_t<Backend, graph> ext_codeplay_get_native_graph() const {
145+
#ifndef __SYCL_DEVICE_ONLY__
146+
// TODO: replace the exception thrown below with the SYCL 2020 exception
147+
// with the error code 'errc::backend_mismatch' when those new exceptions
148+
// are ready to be used.
149+
if (Backend != get_backend())
150+
throw exception(make_error_code(errc::invalid),
151+
"Incorrect backend argument was passed");
152+
153+
// C-style cast required to allow various native types
154+
return (backend_return_t<Backend, graph>)getNativeGraph();
155+
#else
156+
// we believe this won't be ever called on device side
157+
return 0;
158+
#endif
159+
}
160+
137161
/// Returns the SYCL application interoperability native backend object
138162
/// associated with the device associated with the SYCL queue that the host
139163
/// task was submitted to. The native backend object returned must be in
@@ -186,8 +210,9 @@ class interop_handle {
186210
interop_handle(std::vector<ReqToMem> MemObjs,
187211
const std::shared_ptr<detail::queue_impl> &Queue,
188212
const std::shared_ptr<detail::device_impl> &Device,
189-
const std::shared_ptr<detail::context_impl> &Context)
190-
: MQueue(Queue), MDevice(Device), MContext(Context),
213+
const std::shared_ptr<detail::context_impl> &Context,
214+
const ur_exp_command_buffer_handle_t &Graph)
215+
: MQueue(Queue), MDevice(Device), MContext(Context), MGraph(Graph),
191216
MMemObjs(std::move(MemObjs)) {}
192217

193218
template <backend Backend, typename DataT, int Dims>
@@ -211,10 +236,12 @@ class interop_handle {
211236
getNativeQueue(int32_t &NativeHandleDesc) const;
212237
__SYCL_EXPORT ur_native_handle_t getNativeDevice() const;
213238
__SYCL_EXPORT ur_native_handle_t getNativeContext() const;
239+
__SYCL_EXPORT ur_native_handle_t getNativeGraph() const;
214240

215241
std::shared_ptr<detail::queue_impl> MQueue;
216242
std::shared_ptr<detail::device_impl> MDevice;
217243
std::shared_ptr<detail::context_impl> MContext;
244+
ur_exp_command_buffer_handle_t MGraph;
218245

219246
std::vector<ReqToMem> MMemObjs;
220247
};

sycl/source/detail/graph_impl.cpp

+2
Original file line numberDiff line numberDiff line change
@@ -828,6 +828,8 @@ ur_exp_command_buffer_sync_point_t exec_graph_impl::enqueueNode(
828828
std::shared_ptr<node_impl> Node) {
829829

830830
// Queue which will be used for allocation operations for accessors.
831+
// Will also be used in native commands to return to the user in
832+
// `interop_handler::get_native_queue()` calls
831833
auto AllocaQueue = std::make_shared<sycl::detail::queue_impl>(
832834
DeviceImpl, sycl::detail::getSyclObjImpl(Ctx), sycl::async_handler{},
833835
sycl::property_list{});

sycl/source/detail/graph_impl.hpp

+5
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,8 @@ inline node_type getNodeTypeFromCG(sycl::detail::CGType CGType) {
7171
return node_type::host_task;
7272
case sycl::detail::CGType::ExecCommandBuffer:
7373
return node_type::subgraph;
74+
case sycl::detail::CGType::EnqueueNativeCommand:
75+
return node_type::native_command;
7476
default:
7577
assert(false && "Invalid Graph Node Type");
7678
return node_type::empty;
@@ -704,6 +706,9 @@ class node_impl : public std::enable_shared_from_this<node_impl> {
704706
case sycl::detail::CGType::ExecCommandBuffer:
705707
Stream << "CGExecCommandBuffer \\n";
706708
break;
709+
case sycl::detail::CGType::EnqueueNativeCommand:
710+
Stream << "CGNativeCommand \\n";
711+
break;
707712
default:
708713
Stream << "Other \\n";
709714
break;

0 commit comments

Comments
 (0)