cudnn frontend v1.12.0 release notes
cudnn frontend v1.12 is the preferred cudnn frontend version for cudnn version 9.9.0 and above.
cudnn frontend v1.12 is the minimum version required to work with cuda 13.0 and above.
Updated the dlpack version, and raised the minimum required cmake version to 3.18.
New API
- Allows compilation and loading of cudnn frontend with the cudnn-jit packages.
- Introduces the Adaptive Layernorm (fprop and bprop) operations in cudnn:
```cpp
std::array<std::shared_ptr<Tensor_attributes>, 3>
adalayernorm(std::shared_ptr<Tensor_attributes>& input,
             std::shared_ptr<Tensor_attributes>& scale,
             std::shared_ptr<Tensor_attributes>& bias,
             AdaLayernorm_attributes attributes);

std::array<std::shared_ptr<Tensor_attributes>, 3>
adalayernorm_backward(std::shared_ptr<Tensor_attributes> dy,
                      std::shared_ptr<Tensor_attributes> x,
                      std::shared_ptr<Tensor_attributes> scale,
                      AdaLayernorm_backward_attributes options);
```
Please refer to the samples for usage.
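Below is a minimal sketch of how the forward call might be wired into a graph. It is modeled on the existing layernorm samples: `AdaLayernorm_attributes` comes from the signature above, while the graph-building calls, the `set_forward_phase` setter, and the per-sample `{b, 1, d}` scale/bias shapes are assumptions carried over from the layernorm pattern rather than confirmed adalayernorm API.

```cpp
// Sketch only: builds an adaptive layernorm fprop node following the
// conventions of the layernorm samples. Setter names on
// AdaLayernorm_attributes and the per-sample scale/bias shapes are
// assumptions by analogy with Layernorm_attributes, not confirmed API.
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

fe::graph::Graph build_adalayernorm_fprop(int64_t b, int64_t s, int64_t d) {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::HALF)
        .set_intermediate_data_type(fe::DataType_t::FLOAT)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("X")
                              .set_dim({b, s, d})
                              .set_stride({s * d, d, 1}));
    // Adaptive layernorm modulates with per-sample scale/bias (e.g. produced
    // from a conditioning signal); a {b, 1, d} shape is assumed here.
    auto scale = graph.tensor(fe::graph::Tensor_attributes()
                                  .set_name("scale")
                                  .set_dim({b, 1, d})
                                  .set_stride({d, d, 1}));
    auto bias = graph.tensor(fe::graph::Tensor_attributes()
                                 .set_name("bias")
                                 .set_dim({b, 1, d})
                                 .set_stride({d, d, 1}));

    auto options = fe::graph::AdaLayernorm_attributes()
                       .set_forward_phase(fe::NormFwdPhase_t::TRAINING);

    // The 3-element result matches the signature above; the names follow the
    // layernorm convention of {Y, mean, inv_variance}.
    auto [Y, mean, inv_variance] = graph.adalayernorm(X, scale, bias, options);
    Y->set_output(true);
    mean->set_output(true);
    inv_variance->set_output(true);
    return graph;
}
```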
- The cudnn frontend python API introduces two decorator functions, cudnn.jit and cudnn.graph, for simpler graph creation in python. Refer to the matmul sample for usage.
Improvements
SDPA
- Allows large embedding dimension (d > 128) for fprop across Ampere, Hopper, and Blackwell architectures for bf16/fp16 (see the sketch after this list).
- Added better validation checks for sliding window attention for cudnn version 9.9.0 and below.
- Sliding window attention now supports cases where s_q > s_kv.
- The sdpa_fp8 operation now pads with negative infinity during masking rather than with a large negative value, which improves the numerical stability of SDPA with the fp8 data type.
- Paged attention now supports page tables in a packed format.
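To make the large-head-dimension change concrete, here is a minimal sketch of an inference-mode SDPA fprop graph with d = 256 in bf16, following the conventions of the existing SDPA samples. The float overload of `set_attn_scale` and the exact setter spellings are taken from those samples as best recalled; verify against them rather than treating this as confirmed API.

```cpp
// Sketch only: inference-mode SDPA fprop with an embedding dimension above
// 128 (d = 256), which v1.12 permits on Ampere/Hopper/Blackwell for
// bf16/fp16. Setter spellings follow the SDPA samples; verify against them.
#include <cmath>
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

fe::graph::Graph build_sdpa_fprop(int64_t b, int64_t h, int64_t s_q, int64_t s_kv) {
    int64_t const d = 256;  // > 128 is now accepted for fprop
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::BFLOAT16)
        .set_intermediate_data_type(fe::DataType_t::FLOAT)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    auto make_tensor = [&](char const* name, int64_t s) {
        return graph.tensor(fe::graph::Tensor_attributes()
                                .set_name(name)
                                .set_dim({b, h, s, d})
                                .set_stride({h * s * d, s * d, d, 1}));
    };
    auto Q = make_tensor("Q", s_q);
    auto K = make_tensor("K", s_kv);
    auto V = make_tensor("V", s_kv);

    auto options = fe::graph::SDPA_attributes()
                       .set_name("sdpa")
                       .set_is_inference(true)
                       .set_attn_scale(1.0f / std::sqrt(static_cast<float>(d)));

    auto [O, stats] = graph.sdpa(Q, K, V, options);
    (void)stats;  // stats is only produced when is_inference is false
    O->set_output(true);
    return graph;
}
```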
Normalizations
- Allows a zero-centered scale in layer norm. Refer to the layer norm sample for usage.
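A zero-centered scale conventionally means the stored scale tensor holds γ − 1, so the effective multiplier is 1 + γ, which is convenient when the scale parameter is zero-initialized. The sketch below shows where such an option would sit in a layernorm graph; the `set_zero_centered_scale(true)` setter name is hypothetical and used for illustration only, while the surrounding calls follow the existing layernorm samples.

```cpp
// Sketch only: layernorm fprop with a zero-centered scale. The
// set_zero_centered_scale(true) setter is a HYPOTHETICAL name used for
// illustration; consult the sample referenced above for the real API.
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

fe::graph::Graph build_layernorm_fprop(int64_t b, int64_t s, int64_t d) {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::HALF)
        .set_intermediate_data_type(fe::DataType_t::FLOAT)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("X")
                              .set_dim({b, s, d})
                              .set_stride({s * d, d, 1}));
    auto scale = graph.tensor(fe::graph::Tensor_attributes()
                                  .set_name("scale")  // holds gamma - 1 when zero-centered
                                  .set_dim({1, 1, d})
                                  .set_stride({d, d, 1}));
    auto bias = graph.tensor(fe::graph::Tensor_attributes()
                                 .set_name("bias")
                                 .set_dim({1, 1, d})
                                 .set_stride({d, d, 1}));

    auto options = fe::graph::Layernorm_attributes()
                       .set_forward_phase(fe::NormFwdPhase_t::TRAINING)
                       .set_zero_centered_scale(true);  // hypothetical setter

    auto [Y, mean, inv_variance] = graph.layernorm(X, scale, bias, options);
    Y->set_output(true);
    mean->set_output(true);
    inv_variance->set_output(true);
    return graph;
}
```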
Others
- cudnn frontend now supports serialization of the dynamic kernel cache.
Bug Fixes
- Fixed the dlopen of cudart.so to look for the versioned binary name.
- SDPA bprop now fails correctly when called on Blackwell with an embedding dimension (d) > 128.