cuDNN frontend v1.10 is the preferred frontend for cuDNN backend 9.7.0 and later, as it adds support for Blackwell-specific features.
## New API
- cuDNN frontend v1.10 introduces two new operators, `block_scale_quantize` and `block_scale_dequantize`, to specify the scaling and de-scaling of low-precision datatypes supported from Blackwell GPUs onwards.
- `create_execution_plan(int64_t const engine_id, std::unordered_map<KnobType_t, int64_t> const &knobs)` allows creation of a custom execution plan with a hardcoded engine and knobs. A sample in `samples/cpp/misc/custom_plan.cpp` showcases how to work with different `Engine` and `Knobs`.
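The per-block scaling that `block_scale_quantize`/`block_scale_dequantize` describe can be pictured with a small self-contained sketch. This is not the library's implementation: it uses `int8_t` with a max-magnitude scale per block as a stand-in for the FP8/FP4 block-scaled formats, purely to show how one scale is shared across each block.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Per-block quantization: each block of `block_size` values shares one scale,
// chosen so the block's max magnitude maps to the top of the low-precision
// range (int8's 127 here, standing in for a block-scaled FP8/FP4 format).
struct BlockQuantized {
    std::vector<int8_t> values;
    std::vector<float>  scales;  // one scale per block
    std::size_t block_size;
};

BlockQuantized block_scale_quantize(const std::vector<float>& x, std::size_t block_size) {
    BlockQuantized out{{}, {}, block_size};
    for (std::size_t b = 0; b < x.size(); b += block_size) {
        std::size_t end = std::min(b + block_size, x.size());
        float max_abs = 0.f;
        for (std::size_t i = b; i < end; ++i) max_abs = std::max(max_abs, std::fabs(x[i]));
        float scale = (max_abs > 0.f) ? max_abs / 127.f : 1.f;
        out.scales.push_back(scale);
        for (std::size_t i = b; i < end; ++i)
            out.values.push_back(static_cast<int8_t>(std::lround(x[i] / scale)));
    }
    return out;
}

// De-scaling: multiply each quantized value by its block's scale.
std::vector<float> block_scale_dequantize(const BlockQuantized& q) {
    std::vector<float> out;
    for (std::size_t i = 0; i < q.values.size(); ++i)
        out.push_back(q.values[i] * q.scales[i / q.block_size]);
    return out;
}
```

A quantize/dequantize round trip recovers each value to within the block's quantization step, which is why the scale tensor must travel alongside the low-precision tensor.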
## Improvements
- Users can now query behavior notes of a particular execution plan
using `get_behavior_notes(std::vector<BehaviorNote_t> ¬es) const` and
`get_behavior_notes_for_plan_at_index(int64_t const index,
std::vector<BehaviorNote_t> ¬es) const` functions.
- SDPA operations now accept both a left and a right window size with respect to the diagonal. See Attention.md for more details.
- SDPA operations now accept a diagonal alignment for the attention score matrix, used to describe the window above. When `s_q != s_kv` and causal masking is enabled, this specifies whether the diagonal is top-left or bottom-right aligned.
- Bottom-right causal masking can now be enabled on the `sdpa_fp8` operation.
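The interaction of window sizes and diagonal alignment can be modeled with a few lines of code. This is an illustrative assumption about the mask geometry, not the library's implementation (Attention.md is authoritative): with bottom-right alignment the diagonal of the `s_q x s_kv` score matrix is shifted by `s_kv - s_q`, and a query row attends to key columns within `[diag - left_window, diag + right_window]`.

```cpp
#include <cstdint>

// Sketch of a sliding-window attention mask with a configurable diagonal.
// bottom_right_aligned shifts the diagonal so the last query row lines up
// with the last key column; a causal mask is right_window == 0.
bool is_attended(int64_t row, int64_t col, int64_t s_q, int64_t s_kv,
                 int64_t left_window, int64_t right_window,
                 bool bottom_right_aligned) {
    int64_t diag = bottom_right_aligned ? row + (s_kv - s_q) : row;
    return col >= diag - left_window && col <= diag + right_window;
}
```

Under this model, with `s_q = 2`, `s_kv = 4`, and a causal (right window 0) mask, bottom-right alignment lets query row 0 see key columns 0..2, while top-left alignment restricts it to column 0 only.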
## Bug fixes
- Fixed a regression in cuDNN frontend v1.9.0 where the softmax node would override user-set dims and strides for `softmax_stats` and `m_zinv`. This also affected the `sdpa_forward` and `sdpa_fp8_forward` nodes.
## New samples
- Added an example to showcase how native CUDA graphs can be constructed from the SDPA operation graph.
## `populate_cuda_graph`

The `populate_cuda_graph` function is a member function of the `Graph` class. It is used to populate a CUDA graph with the necessary data and operations.
### Parameters

- `handle`: A cuDNN handle.
- `uid_to_device_ptrs`: A map of tensor UIDs to device pointers.
- `workspace`: A pointer to the workspace memory.
- `cudnn_cuda_graph`: A pointer to the CUDA graph.

### Return Value

- An `error_t` object indicating the success or failure of the function.

## `update_cuda_graph`

The `update_cuda_graph` function is a member function of the `Graph` class. It is used to update a CUDA graph with the necessary data and operations.

### Parameters

- `handle`: A cuDNN handle.
- `uid_to_device_ptrs`: A map of tensor UIDs to device pointers.
- `workspace`: A pointer to the workspace memory.
- `cudnn_cuda_graph`: A pointer to the CUDA graph.

### Return Value

- An `error_t` object indicating the success or failure of the function.
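The intended call pattern, populate the CUDA graph once, then cheaply swap in new device pointers before each launch, can be sketched with stand-in types. All types below are stubs (the real functions additionally take a cuDNN handle, a workspace pointer, and a `cudaGraph_t`); only the populate-then-update flow is being illustrated.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-ins for the real cuDNN/CUDA types, to show the call pattern only.
using Uid = int64_t;
using DevPtr = void*;
struct CudaGraphStub {
    std::unordered_map<Uid, DevPtr> bindings;
    bool populated = false;
};

struct GraphApiSketch {
    // Mirrors populate_cuda_graph: builds the graph's nodes once and
    // records the initial tensor-UID-to-device-pointer bindings.
    void populate_cuda_graph(const std::unordered_map<Uid, DevPtr>& ptrs,
                             CudaGraphStub* g) {
        g->bindings = ptrs;
        g->populated = true;
    }
    // Mirrors update_cuda_graph: rebinds device pointers without rebuilding
    // the graph, which is the cheap per-iteration step.
    void update_cuda_graph(const std::unordered_map<Uid, DevPtr>& ptrs,
                           CudaGraphStub* g) {
        for (const auto& kv : ptrs) g->bindings[kv.first] = kv.second;
    }
};
```

The design point this models is that graph construction is expensive and done once, while pointer rebinding is cheap and done per launch.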
Here is an example of creating a custom execution plan with a hardcoded engine and knobs. Please see the corresponding C++ sample in `samples/cpp/misc/custom_plan.cpp`.
### Get engine count

```
inline error_t
get_engine_count(int64_t &count);
```

#### Parameters

- `count`: number of engines [out parameter]

#### Return Value

- An `error_t` object indicating the success or failure of the function.
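A self-contained sketch of how engine enumeration and `create_execution_plan(engine_id, knobs)` are meant to compose. Every type here is a stub and the knob names are hypothetical; the real, complete flow is in `samples/cpp/misc/custom_plan.cpp`.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical knob names for illustration only.
enum class KnobType_t { TILE_SIZE, SPLIT_K };

struct PlanStub {
    int64_t engine_id;
    std::unordered_map<KnobType_t, int64_t> knobs;
};

struct GraphPlanSketch {
    int64_t engine_count = 3;  // pretend three candidate engines exist

    // Mirrors get_engine_count(int64_t &count): out-parameter style.
    void get_engine_count(int64_t& count) { count = engine_count; }

    // Mirrors create_execution_plan(engine_id, knobs): pins the plan to one
    // engine with explicit knob values instead of using heuristics.
    PlanStub create_execution_plan(int64_t engine_id,
                                   const std::unordered_map<KnobType_t, int64_t>& knobs) {
        return {engine_id, knobs};
    }
};
```

The point of the API is control: instead of letting heuristics pick an engine, the caller enumerates engines, chooses one by id, and pins its knob values.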
The following is from `docs/dynamic-kernel-cache.md`:
# Dynamic Shapes and Kernel Cache

## Dynamic Shapes

Causes other APIs (such as the kernel cache) to treat the graph as a dynamic shape graph.

The API to achieve the above is:
```cpp
graph.set_dynamic_shape_enabled(true)
```

## Kernel Cache

The kernel cache significantly reduces plan build time by re-using a previously compiled kernel for a given execution plan. Kernel caching is enabled only for dynamic shape graphs.

If a graph's kernel cache attribute is set, the kernel cache will store the kernel which was compiled for the graph's execution plan.
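The reuse described above can be modeled with a toy cache. This is an assumption-laden sketch, not cuDNN's data structure: it keys "compiled kernels" by a shape-agnostic signature, so a second plan build for new shapes of the same dynamic graph skips the expensive compile.

```cpp
#include <string>
#include <unordered_map>

// Toy kernel cache: plans for a dynamic-shape graph share one compiled
// kernel keyed by a shape-agnostic signature.
struct KernelCacheSketch {
    std::unordered_map<std::string, std::string> compiled;  // signature -> kernel
    int compilations = 0;

    std::string get_or_compile(const std::string& signature) {
        auto it = compiled.find(signature);
        if (it != compiled.end()) return it->second;  // cache hit: no rebuild
        ++compilations;  // the expensive compile happens once per signature
        return compiled[signature] = "kernel(" + signature + ")";
    }
};
```

This is why kernel caching only applies to dynamic shape graphs: only there can one compiled kernel legitimately serve plans built for different tensor shapes.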