Commit 7fe2ae4
sycl : port multi-column MMVQ from CUDA backend (#21845)
mmvq:
Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL.
Read weights once per dispatch instead of once per column.
Covers all standard quant types + reorder paths for Q4_0, Q8_0,
Q3_K, Q4_K, Q5_K, Q6_K. IQ types (except IQ4_XS) excluded due to
incompatible vec_dot signatures.
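
A minimal host-side sketch of the "read weights once per dispatch" idea, assuming plain float weights in place of quantized blocks. The function name `mul_mat_vec_multi` and the memory layout are hypothetical and are not taken from the SYCL or CUDA sources. It only illustrates why one accumulator per destination column lets each weight be loaded a single time:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: float weights stand in for quantized blocks, and all
// names here are hypothetical rather than identifiers from ggml-sycl.
// W is row-major [nrows x ncols_x]. X holds ncols_dst input columns, each of
// length ncols_x. The result stores one output column after another.
template <int ncols_dst>
std::vector<float> mul_mat_vec_multi(const std::vector<float> & W,
                                     const std::vector<float> & X,
                                     std::size_t nrows, std::size_t ncols_x) {
    std::vector<float> dst(ncols_dst * nrows, 0.0f);
    for (std::size_t row = 0; row < nrows; ++row) {
        float acc[ncols_dst] = {};                  // one accumulator per output column
        for (std::size_t k = 0; k < ncols_x; ++k) {
            const float w = W[row * ncols_x + k];   // each weight is read once...
            for (int j = 0; j < ncols_dst; ++j) {   // ...and reused for every column
                acc[j] += w * X[j * ncols_x + k];
            }
        }
        for (int j = 0; j < ncols_dst; ++j) {
            dst[j * nrows + row] = acc[j];
        }
    }
    return dst;
}
```

A per-column version would run the outer two loops once for each of the `ncols_dst` columns and so reload every weight `ncols_dst` times. Making the column count a template parameter is one way to keep the accumulator array in registers, which is an assumption about the design here rather than a statement about the actual kernels.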
ggml-sycl:
The weight reorder was only bootstrapped on single-token mat-vec
(ne[1] == 1). Speculative / MTP verify issues only multi-column mat-vec,
so it never triggered the reorder and ran on the slower non-reorder
kernel. Bootstrap it on small multi-column batches (ne[1] <= 8) too.
1 parent 7c158fb commit 7fe2ae4
2 files changed
Lines changed: 1095 additions & 27 deletions
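
The change to the reorder bootstrap condition described in the commit message can be sketched as a pair of predicates. The function and constant names are hypothetical; only the `ne[1] == 1` and `ne[1] <= 8` conditions come from the message, and the real check in ggml-sycl may involve further criteria:

```cpp
#include <cstdint>

// Hypothetical names for illustration; the threshold of 8 is the value
// stated in the commit message.
constexpr std::int64_t k_reorder_max_cols = 8;

// ne1 is the column (token) count of the mat-vec batch, i.e. ne[1].
// Before: only single-token mat-vec bootstrapped the weight reorder.
constexpr bool bootstraps_reorder_before(std::int64_t ne1) {
    return ne1 == 1;
}

// After: small multi-column batches bootstrap it as well.
constexpr bool bootstraps_reorder_after(std::int64_t ne1) {
    return ne1 <= k_reorder_max_cols;
}
```

Under the old predicate, a verify pass with, say, 4 columns never qualifies, which matches the commit message's account of speculative / MTP verify staying on the non-reorder kernel.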