Commit a6c3f6d

Document CUDA graph capture mode
* Document CUDA graph capture mode

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2405
1 parent e513be1 commit a6c3f6d

1 file changed

Lines changed: 14 additions & 0 deletions

docs/user_guide/runtime.rst

@@ -1350,6 +1350,20 @@ ensure that :func:`wp.capture_end <warp.capture_end>` is called regardless of ex

     wp.capture_launch(capture.graph)

+CUDA graph capture also accepts a ``capture_mode`` argument, which controls how strictly CUDA rejects capture-unsafe
+runtime API calls while capture is active. Warp defaults to ``wp.CaptureMode.THREAD_LOCAL``, matching its historical
+behavior. When composing with libraries that may perform lazy CUDA runtime calls during capture, such as context or
+allocator initialization, use ``wp.CaptureMode.RELAXED``:
+
+.. code:: python
+
+    with wp.ScopedCapture(device="cuda", capture_mode=wp.CaptureMode.RELAXED) as capture:
+        # record launches
+        for i in range(100):
+            wp.launch(kernel=compute1, inputs=[a, b], device="cuda")
+
+The ``capture_mode`` argument applies only to CUDA graph capture and is ignored for CPU graph recording.
+
 Note that only launch calls are recorded in the graph; any Python executed outside of the kernel code will not be recorded.
 Typically it is only beneficial to use CUDA graphs when the graph will be reused or launched multiple times, as
 there is a graph-creation overhead.
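
The reuse guidance in the last context lines can be made concrete with a small break-even calculation. This is a minimal sketch under a simple assumed cost model (a one-time capture cost plus a per-replay cost, compared with a per-run eager cost). The function name and all timings are hypothetical and are not measured Warp figures.

```python
import math


def break_even_replays(capture_cost_us, eager_cost_us, replay_cost_us):
    """Smallest number of graph launches for which capturing pays off.

    All inputs are hypothetical timings in microseconds:
      capture_cost_us: one-time cost of capturing and instantiating the graph
      eager_cost_us:   cost of running the workload once without a graph
      replay_cost_us:  cost of one wp.capture_launch() replay of the graph
    Returns None when a replay is not cheaper than eager execution.
    """
    saving_per_replay = eager_cost_us - replay_cost_us
    if saving_per_replay <= 0:
        return None  # the graph never amortizes its creation overhead
    # Need n with capture + n * replay < n * eager, i.e. n > capture / saving.
    return math.floor(capture_cost_us / saving_per_replay) + 1


# Illustrative numbers only, not measurements.
n = break_even_replays(capture_cost_us=5000.0, eager_cost_us=400.0, replay_cost_us=50.0)
print(n)  # -> 15
```

With these assumed numbers the graph only pays for itself from the 15th launch onward. A graph that is launched once or twice would be slower than eager execution under this model.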
