cudnn frontend v1.12.0 release notes
cudnn frontend v1.12 is the preferred cudnn frontend version for cudnn version 9.9.0 and above.
cudnn frontend v1.12 is the minimum version required to work with cuda 13.0 and above.
Updated the dlpack version, and raised the minimum required cmake version to 3.18.
New API
- Allows compilation and loading of cudnn frontend with the cudnn-jit packages.
- Introduces the Adaptive Layernorm (fprop and bprop) operations in cudnn:
```cpp
std::array<std::shared_ptr<Tensor_attributes>, 3>
adalayernorm(std::shared_ptr<Tensor_attributes>& input,
             std::shared_ptr<Tensor_attributes>& scale,
             std::shared_ptr<Tensor_attributes>& bias,
             AdaLayernorm_attributes attributes);

std::array<std::shared_ptr<Tensor_attributes>, 3>
adalayernorm_backward(std::shared_ptr<Tensor_attributes> dy,
                      std::shared_ptr<Tensor_attributes> x,
                      std::shared_ptr<Tensor_attributes> scale,
                      AdaLayernorm_backward_attributes options);
```
Please refer to the samples for usage.
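Below is a minimal sketch of how the forward call might be wired into a graph. It is modeled on the existing layernorm samples: `AdaLayernorm_attributes` comes from the signature above, while the graph-building calls, the `set_forward_phase` setter, and the per-sample `{b, 1, d}` scale/bias shapes are assumptions carried over from the layernorm pattern rather than confirmed adalayernorm API.

```cpp
// Sketch only: builds an adaptive layernorm fprop node following the
// conventions of the layernorm samples. Setter names on
// AdaLayernorm_attributes and the per-sample scale/bias shapes are
// assumptions by analogy with Layernorm_attributes, not confirmed API.
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

fe::graph::Graph build_adalayernorm_fprop(int64_t b, int64_t s, int64_t d) {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::HALF)
        .set_intermediate_data_type(fe::DataType_t::FLOAT)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("X")
                              .set_dim({b, s, d})
                              .set_stride({s * d, d, 1}));
    // Adaptive layernorm modulates with per-sample scale/bias (e.g. produced
    // from a conditioning signal); a {b, 1, d} shape is assumed here.
    auto scale = graph.tensor(fe::graph::Tensor_attributes()
                                  .set_name("scale")
                                  .set_dim({b, 1, d})
                                  .set_stride({d, d, 1}));
    auto bias = graph.tensor(fe::graph::Tensor_attributes()
                                 .set_name("bias")
                                 .set_dim({b, 1, d})
                                 .set_stride({d, d, 1}));

    auto options = fe::graph::AdaLayernorm_attributes()
                       .set_forward_phase(fe::NormFwdPhase_t::TRAINING);

    // The 3-element result matches the signature above; the names follow the
    // layernorm convention of {Y, mean, inv_variance}.
    auto [Y, mean, inv_variance] = graph.adalayernorm(X, scale, bias, options);
    Y->set_output(true);
    mean->set_output(true);
    inv_variance->set_output(true);
    return graph;
}
```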
- The cudnn frontend python API introduces two decorator functions, cudnn.jit and cudnn.graph, for simpler graph creation in python. Refer to the matmul sample for usage.
Improvements
SDPA
- Allows large embedding dimension (d > 128) for fprop across Ampere, Hopper, and Blackwell architectures for bf16/fp16 (see the sketch after this list).
- Added better validation checks for sliding window attention for cudnn version 9.9.0 and below.
- Sliding window attention now supports cases where s_q > s_kv.
- The sdpa_fp8 operation now pads with negative infinity during masking rather than with a large negative value, which improves the numerical stability of SDPA with the fp8 data type.
- Paged attention now supports page tables in a packed format.
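To make the large-head-dimension change concrete, here is a minimal sketch of an inference-mode SDPA fprop graph with d = 256 in bf16, following the conventions of the existing SDPA samples. The float overload of `set_attn_scale` and the exact setter spellings are taken from those samples as best recalled; verify against them rather than treating this as confirmed API.

```cpp
// Sketch only: inference-mode SDPA fprop with an embedding dimension above
// 128 (d = 256), which v1.12 permits on Ampere/Hopper/Blackwell for
// bf16/fp16. Setter spellings follow the SDPA samples; verify against them.
#include <cmath>
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

fe::graph::Graph build_sdpa_fprop(int64_t b, int64_t h, int64_t s_q, int64_t s_kv) {
    int64_t const d = 256;  // > 128 is now accepted for fprop
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::BFLOAT16)
        .set_intermediate_data_type(fe::DataType_t::FLOAT)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    auto make_tensor = [&](char const* name, int64_t s) {
        return graph.tensor(fe::graph::Tensor_attributes()
                                .set_name(name)
                                .set_dim({b, h, s, d})
                                .set_stride({h * s * d, s * d, d, 1}));
    };
    auto Q = make_tensor("Q", s_q);
    auto K = make_tensor("K", s_kv);
    auto V = make_tensor("V", s_kv);

    auto options = fe::graph::SDPA_attributes()
                       .set_name("sdpa")
                       .set_is_inference(true)
                       .set_attn_scale(1.0f / std::sqrt(static_cast<float>(d)));

    auto [O, stats] = graph.sdpa(Q, K, V, options);
    (void)stats;  // stats is only produced when is_inference is false
    O->set_output(true);
    return graph;
}
```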
Normalizations
- Allows a zero-centered scale in layer norm. Refer to the layer norm sample for usage.
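A zero-centered scale conventionally means the stored scale tensor holds γ − 1, so the effective multiplier is 1 + γ, which is convenient when the scale parameter is zero-initialized. The sketch below shows where such an option would sit in a layernorm graph; the `set_zero_centered_scale(true)` setter name is hypothetical and used for illustration only, while the surrounding calls follow the existing layernorm samples.

```cpp
// Sketch only: layernorm fprop with a zero-centered scale. The
// set_zero_centered_scale(true) setter is a HYPOTHETICAL name used for
// illustration; consult the sample referenced above for the real API.
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

fe::graph::Graph build_layernorm_fprop(int64_t b, int64_t s, int64_t d) {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::HALF)
        .set_intermediate_data_type(fe::DataType_t::FLOAT)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("X")
                              .set_dim({b, s, d})
                              .set_stride({s * d, d, 1}));
    auto scale = graph.tensor(fe::graph::Tensor_attributes()
                                  .set_name("scale")  // holds gamma - 1 when zero-centered
                                  .set_dim({1, 1, d})
                                  .set_stride({d, d, 1}));
    auto bias = graph.tensor(fe::graph::Tensor_attributes()
                                 .set_name("bias")
                                 .set_dim({1, 1, d})
                                 .set_stride({d, d, 1}));

    auto options = fe::graph::Layernorm_attributes()
                       .set_forward_phase(fe::NormFwdPhase_t::TRAINING)
                       .set_zero_centered_scale(true);  // hypothetical setter

    auto [Y, mean, inv_variance] = graph.layernorm(X, scale, bias, options);
    Y->set_output(true);
    mean->set_output(true);
    inv_variance->set_output(true);
    return graph;
}
```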
Others
- cudnn frontend now supports serialization of the dynamic kernel cache.
Bug Fixes
- Fixed the dlopen of cudart.so to look for the versioned binary name.
- SDPA bprop now fails correctly when called on Blackwell with an embedding dimension (d) > 128.