cudnn frontend v1.13.0 #150
Merged
cudnn frontend v1.13 release notes
cudnn frontend v1.13 is the preferred cudnn frontend version for cudnn version 9.11.0 and above.
New API
Introduces a device descriptor, which allows device-less compilation of a cudnn graph for a target GPU. See the newly added sample and documentation.
Improvements
SDPA
Introduced generate_stats as an alias to is_inference. generate_stats will be used to control the stat tensor dump. Usage of is_inference is now deprecated.
Improved support checks for left and right diagonal bands in conjunction with the diagonal alignment.
Improved error handling for large head dimension (d > 128) in sdpa bprop.
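To make the interaction between diagonal bands and diagonal alignment concrete, here is a minimal pure-Python sketch of how such a mask could be built. It does not call the cudnn frontend API. The function name `band_mask`, its parameters, and the inclusive/exclusive convention for the bounds are assumptions for illustration; the library's exact convention may differ.

```python
def band_mask(s_q, s_kv, left_bound=None, right_bound=None, bottom_right=False):
    """Return an s_q x s_kv boolean mask; True means the position may be attended to.

    Assumed convention (illustrative only): for query row i with diagonal
    column d(i), key column j is kept when
        d(i) - left_bound < j <= d(i) + right_bound.
    A bound of None leaves that side unbounded.
    """
    # Top-left alignment puts the diagonal at j == i. Bottom-right alignment
    # shifts it so the last query row lines up with the last key column.
    shift = (s_kv - s_q) if bottom_right else 0
    mask = []
    for i in range(s_q):
        d = i + shift
        row = []
        for j in range(s_kv):
            ok_left = left_bound is None or j > d - left_bound
            ok_right = right_bound is None or j <= d + right_bound
            row.append(ok_left and ok_right)
        mask.append(row)
    return mask


# Causal masking (right bound 0) with the two alignments, for s_q=2, s_kv=4.
causal_top_left = band_mask(2, 4, right_bound=0)
causal_bottom_right = band_mask(2, 4, right_bound=0, bottom_right=True)

# A sliding window: left bound 2, right bound 0, square attention matrix.
windowed = band_mask(3, 3, left_bound=2, right_bound=0)
```

When s_q differs from s_kv, the two alignments give different masks for the same bounds, so a support check has to consider the bounds and the alignment together.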
Normalizations
Others
Published improved SDPA training benchmarks for fp8 and fp16/bf16 graph patterns.
Enable int4 weight-only quantization for matmul. See example.
Allow block scale dequantize (required for low precision matmul) to take 2-D scale factor.
Allow reductions to accept deterministic as an attribute.
Added pybinds for block scale dequantize.
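As a rough illustration of what a 2-D scale factor means for block scale dequantize, the following pure-Python sketch applies one scale per rectangular tile of a matrix. It does not use the cudnn frontend API. The function name `block_scale_dequantize`, the tile shape, and the layout of the scale tensor are assumptions made for this example.

```python
def block_scale_dequantize(q, scales, block_rows, block_cols):
    """Dequantize a 2-D matrix `q` using one scale per block_rows x block_cols tile.

    scales[bi][bj] is the scale for the tile covering rows
    [bi*block_rows, (bi+1)*block_rows) and columns
    [bj*block_cols, (bj+1)*block_cols). This layout is an illustrative
    assumption, not the library's documented layout.
    """
    out = []
    for i, row in enumerate(q):
        bi = i // block_rows  # tile row index for this matrix row
        out.append([v * scales[bi][j // block_cols] for j, v in enumerate(row)])
    return out


# 4x4 quantized values with 2x2 tiles, so the scale factor is a 2x2 (2-D) tensor.
q = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [-1, -2, -3, -4],
    [-5, -6, -7, -8],
]
scales = [
    [0.5, 2.0],
    [1.0, 0.25],
]
dequantized = block_scale_dequantize(q, scales, block_rows=2, block_cols=2)
```

With a 1-D scale factor, the scale would vary along only one axis of the matrix. The 2-D form lets it vary along both rows and columns.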
Bug Fixes