Conversation

@Anerudhan (Collaborator) commented Jul 17, 2025

cudnn frontend v1.13 release notes

cudnn frontend v1.13 is the preferred cudnn frontend version for [cudnn version 9.11.0](https://docs.nvidia.com/deeplearning/cudnn/backend/latest/release-notes.html#cudnn-9-11-0) and above.

New API

Introduces a device descriptor, which allows device-less (ahead-of-time) compilation of a cudnn graph for a target GPU. See the newly added [sample](samples/cpp/misc/deviceless_aot_compilation.cpp) and documentation.
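
A minimal Python sketch of the intended flow. The device-descriptor entry point and build keyword below are assumptions, not the actual API (the authoritative usage is the linked C++ sample); only the `pygraph` scaffolding is established:

```python
import cudnn

# Assumption: a descriptor naming the target GPU (e.g., by SM architecture)
# so the graph can be compiled without that device being present. The real
# spelling lives in samples/cpp/misc/deviceless_aot_compilation.cpp.
device_desc = cudnn.create_device_descriptor(sm_version=90)  # hypothetical

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)
# ... declare tensors and operations here ...

# Assumption: pass the descriptor at build time so plans are compiled
# ahead of time for the described GPU rather than the current one.
graph.build([cudnn.heur_mode.A], device_descriptor=device_desc)  # hypothetical kwarg
```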

Improvements

SDPA

  • Introduced `generate_stats` as an alias for `is_inference`. `generate_stats` now controls whether the softmax stats tensor is produced; `is_inference` is deprecated (see the sketch after this list).

  • Improved support checks for left and right diagonal bands in conjunction with the diagonal alignment.

  • Improved error handling for large head dimensions (d > 128) in SDPA bprop.
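
A minimal sketch of the renamed knob in the Python bindings, assuming `generate_stats=True` requests the stats tensor (i.e., the inverse polarity of the deprecated `is_inference=True`); tensor shapes are illustrative:

```python
import cudnn

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

b, h, s, d = 4, 8, 1024, 64
q = graph.tensor(name="Q", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])
k = graph.tensor(name="K", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])
v = graph.tensor(name="V", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])

# Previously: is_inference=False to get the softmax stats needed by bprop.
# v1.13 (assumed polarity): generate_stats=True requests the same stats dump.
o, stats = graph.sdpa(
    name="sdpa",
    q=q, k=k, v=v,
    generate_stats=True,  # replaces the deprecated is_inference=False
    attn_scale=1.0 / (d ** 0.5),
    use_causal_mask=True,
)
o.set_output(True)
stats.set_output(True)
```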

Normalizations

  • Added support for fused LayerNorm with ReLU, plus a sample for [Layernorm with relu bitmask dump](samples/cpp/norm/layernorm_bitmask_relu.cpp); a hedged sketch follows below.
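
A minimal Python sketch of the pattern, assuming the existing `layernorm` and `relu` pybinds compose into the fused pattern described above; the bitmask-dump variant is only shown in the linked C++ sample, so no Python spelling is assumed for it:

```python
import cudnn

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

n, c = 16, 1024
x     = graph.tensor(name="X",     dim=[n, c, 1, 1], stride=[c, 1, 1, 1])
scale = graph.tensor(name="scale", dim=[1, c, 1, 1], stride=[c, 1, 1, 1],
                     data_type=cudnn.data_type.FLOAT)
bias  = graph.tensor(name="bias",  dim=[1, c, 1, 1], stride=[c, 1, 1, 1],
                     data_type=cudnn.data_type.FLOAT)
epsilon = graph.tensor(name="epsilon", dim=[1, 1, 1, 1], stride=[1, 1, 1, 1],
                       data_type=cudnn.data_type.FLOAT, is_pass_by_value=True)

# LayerNorm in training mode, followed by a pointwise ReLU that v1.13 can
# fuse into the same kernel per the release note above.
ln_out, mean, inv_var = graph.layernorm(
    name="LN",
    norm_forward_phase=cudnn.norm_forward_phase.TRAINING,
    input=x, scale=scale, bias=bias, epsilon=epsilon,
)
relu_out = graph.relu(name="relu", input=ln_out)
relu_out.set_output(True)
```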

Others

  • Published improved SDPA training benchmarks for fp8 and fp16/bf16 graph patterns.

  • Enabled int4 weight-only quantization for matmul. See the [example](samples/cpp/int4_woq_matmul.cpp); a combined sketch of the low-precision and reduction items below follows this list.

  • Allowed block scale dequantize (required for low-precision matmul) to take a 2-D scale factor.

  • Allowed reductions to accept `deterministic` as an attribute.

  • Added pybinds for block scale dequantize.
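
A combined Python sketch of the three items above. The `block_scale_dequantize` spelling, the int4 enum, and the `deterministic` keyword are assumptions inferred from the notes; the linked int4 WOQ example is the authoritative reference:

```python
import cudnn

graph = cudnn.pygraph(compute_data_type=cudnn.data_type.FLOAT)

m, k, n, block = 128, 4096, 4096, 32
a = graph.tensor(name="A", dim=[1, m, k], stride=[m * k, k, 1],
                 data_type=cudnn.data_type.HALF)
# int4 weights plus a 2-D (per-block) scale factor, as enabled above;
# the leading 1 is just the batch dimension.
w_q = graph.tensor(name="W_q", dim=[1, k, n], stride=[k * n, n, 1],
                   data_type=cudnn.data_type.INT4)  # assumed enum spelling
scale = graph.tensor(name="scale", dim=[1, k // block, n],
                     stride=[(k // block) * n, n, 1],
                     data_type=cudnn.data_type.FLOAT)

# Assumed pybind spelling for the newly exposed dequantize op.
w = graph.block_scale_dequantize(name="dq", input=w_q, scale=scale,
                                 block_size=block)
c = graph.matmul(name="matmul", A=a, B=w)

# Assumed keyword for the new reduction attribute.
row_sum = graph.reduction(input=c, mode=cudnn.reduction_mode.ADD,
                          deterministic=True)
row_sum.set_output(True).set_dim([1, m, 1])
```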

Bug Fixes

  • Fixed the sliding window `attn_score_modifier` function, allowing it to set a true negative infinity.

@Anerudhan merged commit 9793df5 into main on Jul 17, 2025
1 check passed