[CUDA] GroupQueryAttention with XQA and Quantized KV Cache Support #8549
| Job | Run time |
|---|---|
| | 12m 2s |
| | 10m 3s |
| | 8m 32s |
| | 17m 15s |
| | 7m 7s |
| | 9m 42s |
| | 7m 18s |
| | 8m 6s |
| | 8m 21s |
| | 9m 2s |
| | 1h 37m 28s |