## cudnn frontend v1.11 release notes
cuDNN frontend v1.11 is the preferred frontend version for cuDNN 9.8.0 and above. With cuDNN frontend v1.11, the minimum supported cuDNN version is 9.0.0.
## New API
- cuDNN frontend v1.11 adds a flexible score modifier to the Python SDPA API. Samples showcasing soft capping of the attention scores and an arrow mask are available in the [cudnn_frontend/test/python/test_flexible_sdpa.py](https://github.com/NVIDIA/cuDNN-frontend/blob/main/cudnn_frontend/test/python/test_flexible_sdpa.py) file.
A sample usage of the score modifier is shown below:
```python
score_mod = partial(
    custom_mask,
    mod_tensor=mod_tensor,
    neg_inf=neg_inf_tensor,
    seq_len_q=seq_len_q,
    seq_len_kv=seq_len_kv,
)
```
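The score modifier is an ordinary Python callable whose extra arguments are bound ahead of time with `functools.partial`, as in the snippet above. A minimal sketch of that binding pattern, using a hypothetical soft-cap modifier on plain floats rather than cuDNN graph tensors:

```python
import math
from functools import partial

# Hypothetical modifier (not part of the cuDNN API): soft-caps a raw
# attention score with tanh so no single score dominates the softmax.
def soft_cap_mod(score, cap):
    return cap * math.tanh(score / cap)

# Bind the extra argument up front; the resulting callable takes only
# the score, mirroring how score_mod is constructed in the sample above.
score_mod = partial(soft_cap_mod, cap=30.0)

print(score_mod(100.0))  # saturates just below the cap of 30.0
```

In the real API the callable receives graph tensors, so any arithmetic inside it must be expressed with graph operations rather than `math` functions; the `partial` pattern is the same.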
- The Concatenate operation merges two or more tensors into one along the specified axis. The user may also specify an in-place merge.
```cpp
std::shared_ptr<Tensor_attributes>
concatenate(std::vector<std::shared_ptr<Tensor_attributes>>, Concatenate_attributes);
```
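For intuition, the axis semantics follow ordinary tensor concatenation: all dimensions except the concatenation axis must agree, and the output's extent along that axis is the sum of the inputs' extents. A pure-Python sketch of those semantics on 2-D nested lists (an illustration only, not the cuDNN API, which operates on `Tensor_attributes`):

```python
def concat2d(tensors, axis):
    # Concatenate 2-D nested lists along axis 0 (rows) or axis 1 (columns).
    if axis == 0:
        # Stack rows: the column counts of all inputs must agree.
        return [row for t in tensors for row in t]
    # axis == 1: append columns row by row; the row counts must agree.
    return [sum((t[i] for t in tensors), []) for i in range(len(tensors[0]))]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(concat2d([a, b], axis=0))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(concat2d([a, b], axis=1))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```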
- Pip wheels compatible with the Windows x86_64 architecture are now available on [PyPI](https://pypi.org/project/nvidia-cudnn-frontend/).
- The SDPA paged attention API now supports a ragged Q tensor when used with cuDNN version 9.7.0 and above.
## Improvements
- Users can now pass the CMake flag `-DCMAKE_CXX_FLAGS="-DNV_CUDNN_FRONTEND_DISABLE_LOGGING"` to disable logging in the cuDNN frontend.
- Added a new sample showcasing native CUDA graph creation from cuDNN for the SDPA backprop operation. Also fixed a bug in the `update_cuda_graph` API when updating a CUDA graph for the SDPA backprop operation.
## Bug Fixes
- Fixed memory leak in the test harness for some legacy tests that use ragged tensors.
- Fixed a bug in the benchmarking script, introduced when the `use_padding_mask` attribute was made mandatory for the SDPA operation, that prevented the SDPA cuDNN operation from being executed.
- Updated the paged attention sample to avoid an illegal memory access when the tensor dimensions in the sample are changed.
- Updated the DgradDReluBNBwdWeight sample to perform the correct operation for the dgrad + drelu fusion.