Commit 91b7532
# cudnn frontend v1.10 release notes (#126)

cudnn frontend v1.10 is the preferred cudnn frontend for cudnn backend 9.7.0 and later, as it adds Blackwell-specific features.

## New API

- cudnn frontend v1.10 introduces two new operators, `block_scale_quantize` and `block_scale_dequantize`, to specify the scaling and de-scaling of the low-precision datatypes supported from Blackwell GPUs onwards.
- `create_execution_plan(int64_t const engine_id, std::unordered_map<KnobType_t, int64_t> const &knobs)` allows creation of a custom execution plan with a hardcoded engine and knobs. A sample in `samples/cpp/misc/custom_plan.cpp` showcases how to work with different `Engine`s and `Knobs`.

## Improvements

- Users can now query the behavior notes of a particular execution plan using the `get_behavior_notes(std::vector<BehaviorNote_t> &notes) const` and `get_behavior_notes_for_plan_at_index(int64_t const index, std::vector<BehaviorNote_t> &notes) const` functions.
- SDPA operations now accept both a left and a right window size with respect to the diagonal. See Attention.md for more details.
- SDPA operations now accept a diagonal alignment for the attention score matrix, used to describe the window above. When `s_q != s_kv` and the causal mask is on, this specifies whether the diagonal is top-left or bottom-right aligned.
- Bottom-right causal masking can now be enabled on the sdpa_fp8 operation.

## Bug fixes

- Fixed a regression in cudnn frontend v1.9.0 where the softmax node would override user-set dims and strides for softmax_stats and m_zinv. This also affected the sdpa_forward and sdpa_fp8_forward nodes.

## New samples

- Added an example showcasing how native CUDA graphs can be constructed from the SDPA operation graph.
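The windowed-attention semantics above can be sketched as a plain predicate. This is an illustrative sketch only, not cuDNN's implementation; the function name, the inclusive window bounds, and the `s_kv - s_q` diagonal offset for bottom-right alignment are assumptions made for the example:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch (not the cuDNN implementation): decides whether the
// attention score at (row, col) is kept under a windowed mask. With
// bottom-right diagonal alignment the diagonal is shifted by (s_kv - s_q),
// so the last query row attends up to the last key column.
bool is_attended(int64_t row, int64_t col, int64_t s_q, int64_t s_kv,
                 int64_t left_window, int64_t right_window,
                 bool bottom_right) {
    // Column on which the diagonal sits for this query row.
    int64_t diag_offset = bottom_right ? (s_kv - s_q) : 0;
    int64_t center = row + diag_offset;
    // Keep columns inside [center - left_window, center + right_window].
    return col >= center - left_window && col <= center + right_window;
}
```

With `right_window = 0` this reduces to a causal mask; the `bottom_right` flag only matters when `s_q != s_kv`, matching the release note above.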
1 parent ee971b1 commit 91b7532

File tree: 112 files changed (+5276, −1444 lines)

CMakeLists.txt — 1 addition, 1 deletion

```diff
@@ -1,6 +1,6 @@
 cmake_minimum_required(VERSION 3.17)
 
-project(cudnn_frontend VERSION 1.9.0)
+project(cudnn_frontend VERSION 1.10.0)
 
 option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
 option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
```

README.FE.1.0.md — 24 additions, 12 deletions

````diff
@@ -121,6 +121,30 @@ This method guarantees that executing the graph using plans queried will succeed
 cudnn_frontend::error_t cudnn_frontend::graph::Graph::check_support(cudnnHandle_t h);
 ```
 
+### Querying Plan Properties (Optional)
+You can query the properties of the plan at a given index, or of the candidate plan using the following methods:
+
+```
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::get_behavior_notes_for_plan_at_index(int64_t const plan_index, std::vector<cudnn_frontend::BehaviorNote_t> &);
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::get_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> &);
+```
+
+The `notes` argument acts as the out parameter.
+
+### Filtering Plans (Optional)
+Users can filter plans on numerical, behavioral notes, or plans that do not provide desired functional correctness.
+
+```
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
+
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_workspace_greater_than(int64_t const workspace);
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_shared_mem_greater_than(int64_t const shared_memory);
+```
+
+
 ### Building the Execution Plan
 
 This function builds execution plans queried with `create_execution_plan(...)` API.
@@ -146,18 +170,6 @@ cudnn_frontend::Graph::build_plan_at_index(
     int64_t plan_index
 );
 ```
-### Filtering Plans (Optional)
-Users can filter plans on numerical, behavioral notes, or plans that do not provide desired functional correctness.
-
-```
-cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
-cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
-
-cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
-cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
-cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_workspace_greater_than(int64_t const workspace);
-cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_shared_mem_greater_than(int64_t const shared_memory);
-```
 
 ### Autotuning
````
benchmark/Dockerfile — 1 addition, 1 deletion

```diff
@@ -1,4 +1,4 @@
-FROM nvcr.io/nvidia/pytorch:24.07-py3
+FROM nvcr.io/nvidia/pytorch:24.12-py3
 
 RUN apt-get update && \
     wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
```

benchmark/benchmark_flash_attention.py — 1 addition, 0 deletions

```diff
@@ -565,6 +565,7 @@ def time_fwd(func, *args, **kwargs):
     is_inference=is_infer,
     attn_scale=attn_scale,
     use_causal_mask=is_causal,
+    use_padding_mask=False,
 )
 
 o_fwd.set_output(True).set_dim(o_gpu.size()).set_stride(
```
7 additions, 7 deletions

```diff
@@ -1,31 +1,31 @@
+# CUDA Graphs
 
-
-### `populate_cuda_graph`
+## `populate_cuda_graph`
 
 The `populate_cuda_graph` function is a member function of the `Graph` class. It is used to populate a CUDA graph with the necessary data and operations.
 
-#### Parameters
+### Parameters
 
 - `handle`: A cuDNN handle.
 - `uid_to_device_ptrs`: A map of tensor UIDs to device pointers.
 - `workspace`: A pointer to the workspace memory.
 - `cudnn_cuda_graph`: A pointer to the CUDA graph.
 
-#### Return Value
+### Return Value
 
 - An `error_t` object indicating the success or failure of the function.
 
-### `update_cuda_graph`
+## `update_cuda_graph`
 
 The `update_cuda_graph` function is a member function of the `Graph` class. It is used to update a CUDA graph with the necessary data and operations.
 
-#### Parameters
+### Parameters
 
 - `handle`: A cuDNN handle.
 - `uid_to_device_ptrs`: A map of tensor UIDs to device pointers.
 - `workspace`: A pointer to the workspace memory.
 - `cudnn_cuda_graph`: A pointer to the CUDA graph.
 
-#### Return Value
+### Return Value
 
 - An `error_t` object indicating the success or failure of the function.
```
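The reason for the populate/update split is that graph instantiation is expensive, while rebinding device pointers is cheap. A self-contained mock of that pattern — the types and function bodies are stand-ins for illustration, not the cuDNN or CUDA APIs:

```cpp
#include <cstdint>
#include <map>

// Stand-in for an instantiated CUDA graph; tracks how often each path runs.
struct MockCudaGraph {
    std::map<int64_t, void*> uid_to_ptr;  // tensor UID -> device pointer
    int instantiations = 0;               // expensive-path counter
    int pointer_updates = 0;              // cheap-path counter
};

// Expensive: builds the graph structure and captures the initial bindings.
void populate_cuda_graph(MockCudaGraph& g, const std::map<int64_t, void*>& bindings) {
    g.uid_to_ptr = bindings;
    ++g.instantiations;
}

// Cheap: rebinds device pointers without re-instantiating the graph.
void update_cuda_graph(MockCudaGraph& g, const std::map<int64_t, void*>& bindings) {
    for (const auto& kv : bindings) g.uid_to_ptr[kv.first] = kv.second;
    ++g.pointer_updates;
}
```

The intended call sequence mirrors the docs above: `populate_cuda_graph` once, then `update_cuda_graph` per iteration as tensor addresses change.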

docs/custom-execution-plan.md — new file, 39 additions

````diff
@@ -0,0 +1,39 @@
+Here is an example of creating a custom execution plan with hardcoded engine and knobs. Please see the corresponding C++ sample in `samples/cpp/misc/custom_plan.cpp`.
+
+### Get engine count
+```
+inline error_t
+get_engine_count(int64_t &count);
+```
+#### Parameters
+
+- `count`: number of engines [out parameter]
+
+#### Return Value
+- An `error_t` object indicating the success or failure of the function.
+
+### Get knobs supported by an engine
+```
+inline error_t
+get_knobs_for_engine(int64_t const engine, std::vector<Knob> &);
+```
+#### Parameters
+
+- `engine`: engine index
+- `knobs`: list of knobs [out parameter]
+
+#### Return Value
+- An `error_t` object indicating the success or failure of the function.
+
+### Create a plan with particular engine and knobs
+```
+error_t
+create_execution_plan(int64_t const engine_id, std::unordered_map<KnobType_t, int64_t> const &knobs);
+```
+#### Parameters
+
+- `engine_id`: engine index
+- `knobs`: knobs
+
+#### Return Value
+- An `error_t` object indicating the success or failure of the function.
````
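Before calling `create_execution_plan(engine_id, knobs)`, a caller typically picks one value per knob from the range the engine reports. A hypothetical sketch of that selection step — `KnobRange` and its field names are assumptions for the example, not the real cudnn_frontend `Knob` class:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical knob descriptor; the real cudnn_frontend Knob class differs.
struct KnobRange {
    int64_t min_value;
    int64_t max_value;
    int64_t stride;
};

// Enumerate the values a knob may take: min, min+stride, ..., up to max.
std::vector<int64_t> knob_choices(const KnobRange& k) {
    std::vector<int64_t> out;
    for (int64_t v = k.min_value; v <= k.max_value; v += k.stride) out.push_back(v);
    return out;
}
```

Each chosen value would then go into the `std::unordered_map<KnobType_t, int64_t>` that `create_execution_plan` accepts.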

docs/dynamic_kernel_cache.md renamed to docs/dynamic-kernel-cache.md — 4 additions, 6 deletions

````diff
@@ -1,16 +1,15 @@
-## Table of Contents
-1. [Dynamic Shapes APIs](#Dynamic-Shapes)
-2. [Kernel Cache APIs](#Kernel-Cache)
+# Dynamic Shapes and Kernel Cache
+
+## Dynamic Shapes
 
-### Dynamic Shapes
 Causes other APIs (such as the kernel cache) to treat the graph as a dynamic shape graph.
 
 The API to achieve the above is:
 ```cpp
 graph.set_dynamic_shape_enabled(true)
 ```
 
-### Kernel Cache
+## Kernel Cache
 The kernel cache significantly reduces plan build time by re-using a previously compiled kernel for a given execution plan. Kernel caching is enabled only for dynamic shape graphs.
 
 If a graph's kernel cache attribute is set, the kernel cache will store the kernel which was compiled for the graph's execution plan.
@@ -25,4 +24,3 @@ The API to set a dynamic shape graph's kernel cache is:
 ```cpp
 graph.set_kernel_cache(kernel_cache)
 ```
-
````
