
v1.12.0 release


@Anerudhan Anerudhan released this 19 May 20:59
· 15 commits to main since this release
666996f

cudnn frontend v1.12 release notes

cudnn frontend v1.12 is the preferred cudnn frontend version for cudnn version 9.9.0 and above.

cudnn frontend v1.12 is the minimum cudnn frontend version required to work with cuda 13.0 and above.

Updated the dlpack version and raised the CMake minimum required version to 3.18.

New API

  • Allows compilation and loading of cudnn frontend with cudnn-jit packages.

  • Introduces the Adaptive Layernorm (fprop and bprop) operation in cudnn.

std::array<std::shared_ptr<Tensor_attributes>, 3>
adalayernorm(std::shared_ptr<Tensor_attributes>& input,
             std::shared_ptr<Tensor_attributes>& scale,
             std::shared_ptr<Tensor_attributes>& bias,
             AdaLayernorm_attributes attributes);

std::array<std::shared_ptr<Tensor_attributes>, 3>
adalayernorm_backward(std::shared_ptr<Tensor_attributes> dy,
                      std::shared_ptr<Tensor_attributes> x,
                      std::shared_ptr<Tensor_attributes> scale,
                      AdaLayernorm_backward_attributes options);

Please refer to samples for usage.
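For readers unfamiliar with the operation, the sketch below illustrates in pure Python what adaptive layer norm computes: a standard layer normalization whose scale and bias are per-sample tensor inputs (e.g. derived from a conditioning signal) rather than fixed learned parameters. The function name, the `eps` default, and the row-at-a-time shape are illustrative assumptions, not the cudnn API.

```python
import math

def adalayernorm_ref(x, scale, bias, eps=1e-5):
    """Reference adaptive layer norm over one row: normalize x to
    zero mean / unit variance, then apply scale and bias that are
    supplied as inputs (not fixed learned parameters)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    return [s * h + b for h, s, b in zip(x_hat, scale, bias)]

# With unit scale and zero bias the result is plain normalization:
y = adalayernorm_ref([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [0.0] * 4)
```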

  • The cudnn frontend python API introduces two decorator functions, cudnn.jit and cudnn.graph, for simpler graph creation in Python. Refer to the matmul sample for usage.

Improvements

SDPA

  • Allows a large embedding dimension (d > 128) for fprop across Ampere, Hopper, and Blackwell architectures for bf16/fp16.

  • Added better validation checks for sliding window attention for cudnn version 9.9.0 and below.

  • Sliding window attention now supports cases where s_q > s_kv.

  • The sdpa_fp8 operation now correctly pads with negative infinity in the masking operation rather than a large negative value, improving the numerical stability of SDPA with the fp8 data type.

  • Paged attention now supports page tables in a packed format.
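The masking change above can be illustrated with a plain softmax: padding with negative infinity zeroes out the masked position exactly, while any finite "large negative" value still leaks probability mass to it. This matters especially for fp8, whose narrow representable range forces a finite mask value to sit comparatively close to the real scores. The snippet is an illustrative float64 sketch, not the cudnn kernel.

```python
import math

def softmax(scores):
    # Standard max-subtracted softmax for numerical stability.
    m = max(s for s in scores if s != -math.inf)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Masking the last position with -inf: its probability is exactly 0,
# since exp(-inf) == 0.
p_inf = softmax([2.0, 1.0, -math.inf])

# Masking with a finite negative value: a small but strictly positive
# probability leaks to the masked position. In fp8, where the most
# negative representable score is not very negative, this leakage is
# far larger than in this float64 illustration.
p_finite = softmax([2.0, 1.0, -20.0])
```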

Normalizations

  • Allows zero-centered scale in layer norm. Refer to this sample for usage.
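As a rough sketch of what zero-centered scale means (assuming, as in other normalization implementations that offer this option, that the stored scale is an offset from one, so the effective multiplier is (1 + gamma) and gamma = 0 yields identity scaling):

```python
import math

def layernorm_zero_centered(x, gamma, beta, eps=1e-5):
    """Layer norm with a zero-centered scale: the effective
    multiplier is (1 + gamma), so a gamma of all zeros behaves like
    plain normalization. Interpretation assumed; see the sample for
    the actual cudnn usage."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(1.0 + g) * (v - mean) * inv_std + b
            for v, g, b in zip(x, gamma, beta)]

# gamma == 0 everywhere reduces to plain layer norm (zero mean):
y_zc = layernorm_zero_centered([1.0, 2.0, 3.0, 4.0], [0.0] * 4, [0.0] * 4)
```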

Others

  • cudnn frontend now supports serialization of dynamic kernel cache.

Bug Fixes

  • Fixed the dlopen of cudart.so to look for the versioned binary name.

  • Correctly fails when SDPA bprop is called on Blackwell with embedding dimension (d) > 128.