Hi there
We were benchmarking natten3d for our needs and noticed that turning on temporal causality makes it slower than running full attention (SDPA on the cuDNN backend).
Is this something you have noticed too? The inputs look like this (a rough sketch of the comparison follows the list):
- input shape: [1, t, h, w]
- num heads: 24
- head dim: 128
- for height/width we're trying 32 and 64
- for t (temporal) we're trying 30 and 60
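
For reference, here is a minimal sketch of the two paths being compared, not our exact harness. It assumes NATTEN's `na3d` functional entrypoint with a heads-last `[B, T, H, W, heads, head_dim]` layout and per-axis `is_causal`, and uses a placeholder `kernel_size`; names/arguments may not match 0.21.1 exactly. The SDPA baseline pins the cuDNN backend via `torch.nn.attention.sdpa_kernel`.

```python
# Rough sketch of the comparison (not the exact harness).
# Assumptions: NATTEN's `na3d` functional entrypoint, heads-last
# [B, T, H, W, heads, head_dim] layout, per-axis `is_causal`, and a
# placeholder kernel_size -- names/arguments may differ in 0.21.1.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

from natten.functional import na3d  # assumed import path

B, T, H, W = 1, 30, 32, 32
heads, head_dim = 24, 128
dev, dt = "cuda", torch.bfloat16

# Heads-last layout for NATTEN.
q = torch.randn(B, T, H, W, heads, head_dim, device=dev, dtype=dt)
k, v = torch.randn_like(q), torch.randn_like(q)

# Baseline: full attention via SDPA, pinned to the cuDNN backend.
# SDPA expects [B, heads, seq, head_dim], so flatten (T, H, W) into one axis.
q_s, k_s, v_s = (x.flatten(1, 3).transpose(1, 2) for x in (q, k, v))
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out_full = F.scaled_dot_product_attention(q_s, k_s, v_s)

# NATTEN 3D: neighborhood attention, causal only along the temporal axis.
out_na = na3d(
    q, k, v,
    kernel_size=(7, 7, 7),           # placeholder; our actual config differs
    is_causal=(True, False, False),  # causal on t only
)
```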
| Prefix | GPUs | Median (ms) | time (t) | height (h) | width (w) |
|---|---|---|---|---|---|
| attention_ | 1 | 8.582 | 30 | 32 | 32 |
| natten_ima | 1 | 17.338 | 30 | 32 | 32 |
| attention_ | 1 | 33.817 | 60 | 32 | 32 |
| natten_ima | 1 | 61.646 | 60 | 32 | 32 |
| attention_ | 1 | 134.908 | 30 | 64 | 64 |
| natten_ima | 1 | 261.328 | 30 | 64 | 64 |
| attention_ | 1 | 548.231 | 60 | 64 | 64 |
| natten_ima | 1 | 965.665 | 60 | 64 | 64 |
Other information:
- Blackwell GB200
- bf16 dtype
- NATTEN 0.21.1
- GPU driver 580.126.09
- PyTorch 2.9.0+cu130
- CUDA 13.0
- cuDNN 9.13.00
- NCCL 2.27.7
- Triton 3.5.0
- Python 3.12.3