cudnn frontend v1.15.0 #174
Merged
cudnn frontend v1.15 release notes
cudnn frontend v1.15 is the preferred cudnn frontend version for cuDNN version 9.13.1 and above.
New API
cudnn.Graph API that enables interoperability between torch.tensors and the cudnn frontend API. Sample code for performing a matmul with bias addition:
All notebooks under samples/python have been updated to showcase the flexibility of this API.
Graph now includes a warmup method that triggers kernel loading by performing a fake graph capture. This improves the startup time for running the initial kernel in the actual run and prevents deadlocks when used with other modules (e.g., NCCL).
Improvements
SDPA
set_score_max and set_score_sum_exp to allow the kernel to output max attention score and sum of exponents.
s_q==1 and s_kv==1.)
Matmul
COMPLEX_FP32 and COMPLEX_FP64 datatypes. (Requires cuDNN v9.14.0 or later.)
Normalizations
fe::HeurMode_t::A over fe::HeurMode_t::FALLBACK.
Others
swish function now accepts a swish_beta parameter.
Samples
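For context on the swish_beta parameter mentioned above: swish with a beta parameter is conventionally defined as x * sigmoid(beta * x). The pure-Python sketch below only illustrates that formula. It does not call the cudnn frontend, and it assumes (not confirmed by these notes) that swish_beta plays the role of beta in this definition. The name swish_reference is hypothetical.

```python
import math


def swish_reference(x: float, beta: float = 1.0) -> float:
    """Reference swish: x * sigmoid(beta * x).

    Illustrative only; `beta` is assumed to correspond to swish_beta.
    """
    z = beta * x
    # Numerically stable sigmoid: pick the branch that keeps exp() from overflowing.
    if z >= 0:
        sig = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        sig = e / (1.0 + e)
    return x * sig
```

With beta = 1 this reduces to the familiar SiLU activation, and with beta = 0 the sigmoid term is a constant 0.5, so the function becomes x / 2.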
Bug Fixes
Benchmarks
Issues Resolved