cuDNN Frontend v1.16.0 #179
cuDNN Frontend v1.16.0 is the recommended version for cuDNN 9.15.0 and later releases.
This release introduces open-source implementations of commonly requested fused kernels for select architectures (Blackwell). These experimental kernels may require additional dependencies such as CuteDSL. The initial release includes:
Additional dependencies can be installed optionally using pip install nvidia-cudnn-frontend[cutedsl]. Usage examples and detailed documentation are available in the test/python/fe_api directory. Please submit issue reports for additional kernel requests or bug reports.
Block Mask Support: Starting with cuDNN 9.14.0, SDPA attributes now support block masks to exclude tiles that do not require computation. Refer to the sample implementation for usage details.
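To illustrate the idea behind block masks (not the cuDNN frontend API itself, which is shown in the linked sample), here is a minimal pure-Python sketch of computing which key/value tiles a causal mask actually requires; the tile size and layout are assumptions for illustration:

```python
def causal_block_mask(s_q: int, s_kv: int, tile: int = 128) -> list[list[bool]]:
    """Build a (num_q_tiles x num_kv_tiles) grid of booleans where True means
    the tile contains at least one unmasked (query >= key) position under a
    causal mask, i.e. the tile needs computation."""
    n_q = (s_q + tile - 1) // tile
    n_kv = (s_kv + tile - 1) // tile
    mask = []
    for qi in range(n_q):
        row = []
        for ki in range(n_kv):
            q_last = min((qi + 1) * tile, s_q) - 1  # last query index in this tile
            k_first = ki * tile                     # first key index in this tile
            # The tile is needed if its latest query can attend to its earliest key.
            row.append(q_last >= k_first)
        mask.append(row)
    return mask

mask = causal_block_mask(256, 256, tile=128)
# → [[True, False], [True, True]] — the upper-right tile is fully masked and skipped.
```

Excluding the False tiles is what saves work: for long sequences under a causal mask, roughly half the tiles need no computation at all.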
Bug Fix: Resolved an invalid memory access (IMA) issue in SDPA backward propagation (fixed in cuDNN backend version 9.15.1 and later) that occurred when s_kv is not a multiple of 128, the padding mask is disabled, and operations are performed in CUDA graph replay mode.
CUDA Graph Compatibility: Added BehaviorNote_t::CUDNN_BEHAVIOR_NOTE_CUBLASLT_DEPENDENCY as a behavior note. This enables filtering of engine configurations (execution plans) that use cuBLAS as a backend, available starting with cuDNN version 9.15.0.
Block Scale Quantization: Added Python bindings for block scale quantize operations (#173). Refer to the sample implementation for usage details.
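As background on what block scale quantization computes (a numerical sketch only; the actual Python binding and its signature are in the #173 sample), each block of values shares a single scale chosen so the block's maximum magnitude maps to the int8 range:

```python
def block_scale_quantize(values, block_size=4):
    """Quantize floats to int8 with one scale per block of block_size values.
    Returns (quantized ints, per-block scales). Illustrative sketch only."""
    ints, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block) or 1.0  # avoid divide-by-zero on all-zero blocks
        scale = amax / 127.0                      # map the block's amax to the int8 limit
        scales.append(scale)
        ints.extend(round(v / scale) for v in block)
    return ints, scales

def block_scale_dequantize(ints, scales, block_size=4):
    """Reconstruct approximate floats by applying each block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(ints)]

vals = [0.5, -1.0, 0.25, 2.0, 10.0, -20.0, 5.0, 0.0]
q, s = block_scale_quantize(vals)
deq = block_scale_dequantize(q, s)
```

Per-block scales keep quantization error proportional to each block's own range, which is why outliers in one block do not degrade precision elsewhere.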
Dependency Optimization: PyTorch is no longer a required dependency for cuDNN Frontend (#177).
Tensor Alignment: Enhanced tensor descriptor API to accept alignment as an attribute (#153).
Plan Generation Control: Updated the cudnnGetPlan API to accept an optional maximum plan count parameter, enabling users to limit the number of plans built and autotuned.
Benchmark Fixes: Updated benchmark/sdpa_benchmark_training/benchmark_single_sdpa.py to use correct parameter names and fixed FLOPS calculations for accurate performance measurements.
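For context on the FLOPS accounting the benchmark performs, the commonly used SDPA forward estimate counts the two batched matmuls (Q @ K^T and P @ V) and conventionally ignores the softmax; this sketch is not the benchmark script's exact code:

```python
def sdpa_forward_flops(batch: int, heads: int, s_q: int, s_kv: int, head_dim: int) -> int:
    """Standard SDPA forward FLOP estimate: two batched matmuls,
    each costing 2 * batch * heads * s_q * s_kv * head_dim FLOPs."""
    return 4 * batch * heads * s_q * s_kv * head_dim

# e.g. one layer, 8 heads, 2048-token sequence, head_dim 128:
sdpa_forward_flops(1, 8, 2048, 2048, 128)  # → 17179869184 (~17.2 GFLOPs)
```

Using the wrong sequence or head-dim parameter in this formula scales the reported TFLOPS linearly, which is why the parameter-name fix matters for accurate measurements.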
#153 - Tensor descriptor alignment support
#173 - Block scale quantize Python bindings
#177 - PyTorch dependency removal