Act as a senior C++ and Python developer with extensive experience in low-level graphics libraries such as Vulkan, Direct3D 12, and CUDA, and in the implementation of native Python extensions.

Project overview:

- This project is a native Python extension that provides a high-level interface for working with low-level graphics APIs.
- The majority of the native side provides a more Python-friendly wrapper around the slang-rhi project (in #external/slang-rhi).
- The Python extension exposes this code using nanobind bindings.
- The project also contains a predominantly Python system that allows the user to 'call' a Slang function on the GPU with Python function call syntax.
Directory structure:

- The majority of the native C++ code is in #src/sgl.
- The Python bindings are in the directory #src/slangpy_ext.
- The Python code is in #slangpy.
- The Python tests are in #slangpy/tests.
- The C++ tests are in #tests.
Code structure:

- Any new Python API must have tests added in #slangpy/tests.
- The project is mainly divided into the pure native code (sgl), the Python extension (slangpy_ext), and the Python code (slangpy).
- The C++ code is responsible for the low-level graphics API interactions, and most types map directly to a slang-rhi counterpart, e.g. Device wraps the slang-rhi rhi::IDevice type.
Building:

- To build the project, run "cmake --build ./build --config Debug"
Testing:

- Python tests are in #slangpy/tests and C++ tests are in #tests.
- The Python testing system uses pytest.
- The C++ testing system uses doctest.
- Always build before running tests.
- To run all Python tests, run "pytest slangpy/tests"
C++ code style:

- Class names should start with a capital letter.
- Function names are in snake_case.
- Local variable names are in snake_case.
- Member variables start with "m_" and are in snake_case.
Python code style:

- Class names should start with a capital letter.
- Function names are in snake_case.
- Local variable names are in snake_case.
- Member variables start with "m_" and are in snake_case.
- All arguments should have type annotations.
Additional tools:

- Once a task is complete, run "pre-commit run --all-files" to fix any formatting errors.
- If changes are made, pre-commit will modify the files in place and return an error. Re-running the command should then succeed.
External dependencies:

- The code has minimal external dependencies, and we should avoid adding new ones.
- pytest is used for Python testing.
- doctest is used for C++ testing.
- Existing Python libraries for the runtime are in #requirements.txt.
- Python libraries for development (e.g. tests) are in #requirements-dev.txt.
- The Slang shading language is used for writing shaders.
SlangPy automatically detects when PyTorch tensors are used and integrates them into PyTorch's auto-grad graph. No special module types are needed - you can use the standard ``spy.Module`` type as documented in `First Functions <../basics/firstfunctions.html>`_.
Creating a tensor
-----------------
Now, rather than use a SlangPy ``Tensor``, we create a ``torch.Tensor`` to store the inputs:
.. code-block:: python

    # Input values are illustrative; requires_grad and device='cuda' are the key settings
    x = torch.tensor([1.0, 2.0, 3.0, 4.0], device='cuda', requires_grad=True)
Note:

- We set ``requires_grad=True`` to tell PyTorch to track the gradients of this tensor.
- We set ``device='cuda'`` to ensure the tensor is on the GPU and matches our device configuration.
Running the kernel
------------------

Calling the function is unchanged from the standard SlangPy API, but calculation of gradients is now done via PyTorch:
.. code-block:: python

    # Evaluate the polynomial. Result will automatically be a torch tensor.
    # Expecting result = 2x^2 + 8x - 1
    result = module.polynomial(a=2, b=8, c=-1, x=x)
    print(result)

    result.backward(torch.ones_like(result))
    print(x.grad)
This works because SlangPy automatically detects PyTorch tensors and wraps the call to ``polynomial`` in a custom autograd function. As a result, the call to ``result.backward`` automatically invokes ``module.polynomial.bwds`` to compute gradients.
Device Backend Selection
------------------------

SlangPy supports multiple backend types for PyTorch integration:

**CUDA Backend (Recommended)**

The CUDA backend provides the best performance by directly sharing the CUDA context with PyTorch.

**Graphics Backends (D3D12/Vulkan)**

These backends use CUDA interop with shared memory and semaphores to synchronize between SlangPy and PyTorch. While functional, this approach has higher overhead due to hardware context switching and memory copies.
A word on performance
---------------------

The choice of backend significantly impacts performance:

- **CUDA Backend**: Provides the best performance for compute-focused workloads. Very simple operations may still be faster in pure PyTorch, but as functions become more complex, the benefits of SlangPy's vectorization and GPU optimization become apparent.
- **Graphics Backends (D3D12/Vulkan)**: Useful when graphics features are required, but expect substantially worse performance due to context switching overhead. Consider whether the graphics features are truly necessary for your use case.
Summary
-------

PyTorch integration with SlangPy is seamless and automatic. This example covered:

- Device creation using ``create_torch_device`` with support for CUDA, D3D12, and Vulkan backends
- Automatic detection of PyTorch tensors - no special module types required
- Use of PyTorch's ``.backward()`` process to track an auto-grad graph and backpropagate gradients
- Performance considerations when choosing between CUDA and graphics backends

The CUDA backend is recommended for best performance, while graphics backends provide access to additional GPU features at the cost of some performance overhead.