## New API
### cudnn Flex Attention
`SDPA_attributes` and `SDPA_bprop_attributes` now accept a score_mod functor through the `set_score_mod` and `set_score_mod_bprop` APIs. The functor programs a custom chain of pointwise operations that operates on the attention score matrix. Common functors such as causal mask, sliding window mask, and soft capping have been added to the headers as references. More usage examples have been added to the samples for [fprop](fp16_fwd_with_flexible_graphs.cpp) and [bprop](fp16_bwd_with_flexible_graphs.cpp).
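A minimal sketch of what a score_mod functor might look like, assuming the functor receives the graph and the attention score tensor and returns the modified tensor. The exact functor signature, the scalar-constant helper, and the soft-cap value are assumptions; the linked samples are authoritative.

```cpp
#include <cudnn_frontend.h>
#include <memory>

namespace fe = cudnn_frontend;
using tensor_ptr = std::shared_ptr<fe::graph::Tensor_attributes>;

// Assumed functor shape: (graph, score) -> modified score.
// This one applies tanh soft capping: score -> cap * tanh(score / cap).
tensor_ptr soft_cap_score_mod(std::shared_ptr<fe::graph::Graph> graph, tensor_ptr score) {
    auto cap    = graph->tensor(30.0f);  // assumed scalar-constant helper and cap value
    auto scaled = graph->pointwise(score, cap,
                      fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::DIV));
    auto capped = graph->pointwise(scaled,
                      fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::TANH_FWD));
    return graph->pointwise(capped, cap,
                      fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::MUL));
}

// Attach the modifier to the SDPA node's attributes before building the graph.
fe::graph::SDPA_attributes make_sdpa_attributes() {
    return fe::graph::SDPA_attributes()
               .set_name("sdpa_with_soft_cap")
               .set_score_mod(soft_cap_score_mod);
}
```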
### Improvements
- Added support for THD format and sliding window mask.
- Added support for THD format and bottom-right causal mask.
- Added new parameters `set_max_total_seq_len_q`/`set_max_total_seq_len_kv` on the SDPA bprop node. These help reduce the workspace size required when running with THD format (see the sketch after this list).
- Allowed creation of serialized JSON for dgrad, wgrad, and resample operations.
- Added more diagnostic messages for when the compiled version of cudnn does not match the runtime version of cudnn.
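As a rough sketch of the workspace-related setters mentioned above, assuming the bprop node is configured through an `SDPA_backward_attributes` object and that chained setters work as shown; the values are illustrative.

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// Sketch: bound the THD workspace by declaring the packed total sequence lengths.
auto sdpa_bprop_options = fe::graph::SDPA_backward_attributes()
                              .set_name("sdpa_bprop_thd")
                              .set_max_total_seq_len_q(4096)    // illustrative sum of per-batch Q lengths
                              .set_max_total_seq_len_kv(4096);  // illustrative sum of per-batch K/V lengths
```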
### Bug fixes
- Fixed an issue where log messages contained unparseable data at the end.
- Fixed an issue where building the Python pip wheel would hang.
- Fixed native CUDA graph creation for SDPA with ALiBi masks.
### New samples
- Added a new sample for Layernorm with dynamic shapes, showcasing how the kernel cache reduces plan build time.
cuDNN layout support for variable sequence length includes (but is not limited to):
- Valid tokens are not packed together\
  `Q = a0abbb00bb000000`\
  Ragged offset is insufficient to represent this. This case is NOT supported.

### cudnn Flex Attention API

SDPA and SDPA_backward ops now accept the functors `set_score_mod` and `set_score_mod_bprop`, which allow modification of the attention score matrix. These functors can be used to program a sub-graph of pointwise operations that is then applied as the score modifier. Note that using a custom score modifier is mutually exclusive with the ready-made masking options.