Commit c4a7ea8
Torch examples (#459)
* Fix torch examples + add copilot info
* Update samples repo
1 parent bc994e4 commit c4a7ea8

File tree

5 files changed (+118, −19 lines)

.github/copilot-instructions.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+Act as a senior C++ and Python developer with extensive experience in low-level graphics libraries such as Vulkan, Direct3D 12, and CUDA, and in implementing native Python extensions.
+
+Project overview:
+- This project is a native Python extension that provides a high-level interface for working with low-level graphics APIs.
+- The majority of the native side provides a more Python-friendly wrapper around the slang-rhi project (in #external/slang-rhi).
+- The Python extension exposes this code using nanobind bindings.
+- The project also contains a predominantly Python system that allows the user to 'call' a Slang function on the GPU with Python function call syntax.
+
+Directory structure:
+- The majority of the native C++ code is in #src/sgl.
+- The Python bindings are in the directory #src/slangpy_ext.
+- The Python code is in #slangpy.
+- The Python tests are in #slangpy/tests.
+- The C++ tests are in #tests.
+
+Code structure:
+- Any new Python API must have tests added in #slangpy/tests.
+- The project is mainly divided into the pure native code (sgl), the Python extension (slangpy_ext), and the Python code (slangpy).
+- The C++ code is responsible for the low-level graphics API interactions, and most types directly map to a slang-rhi counterpart, e.g. Device wraps the slang-rhi rhi::IDevice type.
+
+Building:
+- To build the project, run "cmake --build ./build --config Debug".
+
+Testing:
+- Python tests are in #slangpy/tests and C++ tests are in #tests.
+- The Python testing system uses pytest.
+- The C++ testing system uses doctest.
+- Always build before running tests.
+- To run all Python tests, run "pytest slangpy/tests".
+
+C++ code style:
+- Class names should start with a capital letter.
+- Function names are in snake_case.
+- Local variable names are in snake_case.
+- Member variables start with "m_" and are in snake_case.
+
+Python code style:
+- Class names should start with a capital letter.
+- Function names are in snake_case.
+- Local variable names are in snake_case.
+- Member variables start with "m_" and are in snake_case.
+- All arguments should have type annotations.
+
+Additional tools:
+- Once a task is complete, run "pre-commit run --all-files" to fix any formatting errors.
+- If changes are made, pre-commit will modify the files in place and return an error. Re-running the command should then succeed.
+
+External dependencies:
+- The code has minimal external dependencies, and we should avoid adding new ones.
+- pytest is used for Python testing.
+- doctest is used for C++ testing.
+- Existing Python libraries for the runtime are in #requirements.txt.
+- Python libraries for development (e.g. tests) are in #requirements-dev.txt.
+- The Slang shading language is used for writing shaders.
+- Most external C++ dependencies are in #external.

docs/src/autodiff/pytorch.rst

Lines changed: 56 additions & 17 deletions
@@ -1,26 +1,31 @@
 PyTorch
 =======

-Building on the previous `auto-diff <autodiff.html>`_ example, the switch to PyTorch and its auto-grad capabilities is trivial.
+Building on the previous `auto-diff <autodiff.html>`_ example, the switch to PyTorch and its auto-grad capabilities is straightforward.

 Initialization
 --------------

-The critical line that changes is loading the module:
+To use SlangPy with PyTorch, you first need to create a device configured for PyTorch integration:

 .. code-block:: python

-    # Load torch wrapped module.
-    module = spy.TorchModule.load_from_file(device, "example.slang")
+    import slangpy as spy
+    import torch

-Here, rather than simply write ``spy.Module.load_from_file``, we write ``spy.TorchModule.load_from_file``. From here, all structures or functions utilizing the module will support PyTorch tensors and be injected into PyTorch's auto-grad graph.
+    # Create a device configured for PyTorch integration
+    # CUDA backend is recommended for best performance
+    device = spy.create_torch_device(type=spy.DeviceType.cuda)

-In future SlangPy versions we intend to remove the need for wrapping altogether, instead auto-detecting the need for auto-grad support at the point of call.
+    # Load module using the standard Module type
+    module = spy.Module.load_from_file(device, "example.slang")
+
+SlangPy automatically detects when PyTorch tensors are used and integrates them into PyTorch's auto-grad graph. No special module types are needed - you can use the standard ``spy.Module`` type as documented in `First Functions <../basics/firstfunctions.html>`_.

 Creating a tensor
 -----------------

-Now, rather than use a SlangPy ``Tensor``, we create a ``torch.Tensor`` tensor to store the inputs:
+Now, rather than use a SlangPy ``Tensor``, we create a ``torch.Tensor`` to store the inputs:

 .. code-block:: python

@@ -30,16 +35,16 @@ Now, rather than use a SlangPy ``Tensor``, we create a ``torch.Tensor`` tensor t
 Note:

 - We set ``requires_grad=True`` to tell PyTorch to track the gradients of this tensor.
-- We set ``device='cuda'`` to ensure the tensor is on the GPU.
+- We set ``device='cuda'`` to ensure the tensor is on the GPU and matches our device configuration.

 Running the kernel
 ------------------

-Calling the function is pretty much unchanged, however calculation of gradients is now done via PyTorch:
+Calling the function is unchanged from the standard SlangPy API, but calculation of gradients is now done via PyTorch:

 .. code-block:: python

-    # Evaluate the polynomial. Result will now default to a torch tensor.
+    # Evaluate the polynomial. Result will automatically be a torch tensor.
     # Expecting result = 2x^2 + 8x - 1
     result = module.polynomial(a=2, b=8, c=-1, x=x)
     print(result)
@@ -49,20 +54,54 @@ Calling the function is pretty much unchanged, however calculation of gradients
     result.backward(torch.ones_like(result))
     print(x.grad)

-This works because the wrapped PyTorch module automatically wrapped the call to `polynomial` in a custom autograd function. As a result, the call to `result.backwards` automatically called `module.polynomial.bwds`.
+This works because SlangPy automatically detects PyTorch tensors and wraps the call to `polynomial` in a custom autograd function. As a result, the call to `result.backward` automatically invokes `module.polynomial.bwds` to compute gradients.
+
+Device Backend Selection
+------------------------
+
+SlangPy supports multiple backend types for PyTorch integration:
+
+**CUDA Backend (Recommended)**
+
+The CUDA backend provides the best performance by directly sharing the CUDA context with PyTorch:
+
+.. code-block:: python
+
+    device = spy.create_torch_device(type=spy.DeviceType.cuda)
+
+This approach avoids expensive context switching and memory copies, making it ideal for performance-critical applications.
+
+**Graphics Backends (D3D12, Vulkan)**
+
+For applications that need access to graphics features (such as rasterization), you can use D3D12 or Vulkan backends:
+
+.. code-block:: python
+
+    # D3D12 backend (Windows only)
+    device = spy.create_torch_device(type=spy.DeviceType.d3d12)
+
+    # Vulkan backend (cross-platform)
+    device = spy.create_torch_device(type=spy.DeviceType.vulkan)
+
+These backends use CUDA interop with shared memory and semaphores to synchronize between SlangPy and PyTorch. While functional, this approach has higher overhead due to hardware context switching and memory copies.

 A word on performance
 ---------------------

-This example showed a very basic use of PyTorch's auto-grad capabilities. However in practice, the switch from a CUDA PyTorch context to a D3D or Vulkan context has an overhead. Typically, very simple logic will be faster in PyTorch. However as functions become more complex, the benefits of writing them as simple scalar processes that are vectorized by SlangPy and wrapped in PyTorch quickly become apparent.
+The choice of backend significantly impacts performance:

-Additionally, we intend to add a pure CUDA backend to SlangPy in the future, which will allow for seamless switching between PyTorch and SlangPy contexts.
+- **CUDA Backend**: Provides the best performance for compute-focused workloads. Very simple operations may still be faster in pure PyTorch, but as functions become more complex, the benefits of SlangPy's vectorization and GPU optimization become apparent.
+
+- **Graphics Backends (D3D12/Vulkan)**: Useful when graphics features are required, but expect substantially worse performance due to context switching overhead. Consider whether the graphics features are truly necessary for your use case.

 Summary
 -------

-That's it! You can now use PyTorch tensors with SlangPy, and take advantage of PyTorch's auto-grad capabilities. This example covered:
+PyTorch integration with SlangPy is seamless and automatic. This example covered:
+
+- Device creation using `create_torch_device` with support for CUDA, D3D12, and Vulkan backends
+- Automatic detection of PyTorch tensors - no special module types required
+- Use of PyTorch's `.backward()` process to track an auto-grad graph and backpropagate gradients
+- Performance considerations when choosing between CUDA and graphics backends

-- Initialization with a `TorchModule` to enable PyTorch support
-- Use of PyTorch's `.backward` process to track an auto-grad graph and back propagate gradients.
-- Performance considerations when wrapping Slang code with PyTorch.
+The CUDA backend is recommended for best performance, while graphics backends provide access to additional GPU features at the cost of some performance overhead.
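As a sanity check on the numbers the updated docs expect, and independent of any GPU or PyTorch install, the example polynomial and its analytic derivative can be computed directly in plain Python. This is an illustrative sketch, not part of the commit; `backward()` in the docs' example should fill `x.grad` with exactly this derivative:

```python
def polynomial(a: float, b: float, c: float, x: float) -> float:
    # f(x) = a*x^2 + b*x + c, matching the docs' "2x^2 + 8x - 1" example
    return a * x * x + b * x + c


def polynomial_grad(a: float, b: float, x: float) -> float:
    # Analytic derivative df/dx = 2*a*x + b; with a=2, b=8 this is 4x + 8,
    # which is the value result.backward() should place in x.grad.
    return 2 * a * x + b


xs = [0.0, 1.0, 2.0]
print([polynomial(2, 8, -1, x) for x in xs])   # [-1.0, 9.0, 23.0]
print([polynomial_grad(2, 8, x) for x in xs])  # [8.0, 12.0, 16.0]
```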

samples

slangpy/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 # pyright: reportUnusedImport=false
 # isort: skip_file
-from .core.utils import create_device
+from .core.utils import create_device, create_torch_device
 import runpy
 import pathlib

slangpy/core/utils.py

Lines changed: 5 additions & 0 deletions
@@ -94,8 +94,13 @@ def create_torch_device(
     # Import and init torch
     import torch

+    # Ensure torch cuda is initialized
     torch.cuda.init()

+    # These lines ensure that torch has set a default context
+    torch.cuda.current_device()
+    torch.cuda.current_stream()
+
     # Use current device if not specified
     if torch_device is None:
         torch_device = torch.cuda.current_device()
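The three `torch.cuda` calls added above matter because, per the commit's own comments, `torch.cuda.init()` alone does not guarantee torch has bound its default CUDA context, which the SlangPy device needs to share. A defensive sketch of the same idea (the helper name is hypothetical, and it degrades gracefully when torch or CUDA is absent):

```python
import importlib.util


def ensure_torch_cuda_context() -> bool:
    """Force creation of torch's default CUDA context, mirroring the
    logic added to create_torch_device. Returns True if a context is
    now bound, False if torch or CUDA is unusable on this machine."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch not installed
    import torch

    if not torch.cuda.is_available():
        return False  # no CUDA runtime/driver
    torch.cuda.init()
    # Querying the current device and stream forces torch to bind a
    # default context, which torch.cuda.init() alone may not do.
    torch.cuda.current_device()
    torch.cuda.current_stream()
    return True


print(ensure_torch_cuda_context())
```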
