Commit 7fe2ae4

sycl : port multi-column MMVQ from CUDA backend (#21845)
mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types, plus the reorder paths for Q4_0, Q8_0, Q3_K, Q4_K, Q5_K and Q6_K. IQ types (except IQ4_XS) are excluded because their vec_dot signatures are incompatible.

ggml-sycl: The weight reorder was only bootstrapped on single-token mat-vec (ne[1] == 1). Speculative / MTP verify issues only multi-column mat-vec, so it never triggered the reorder and ran on the slower non-reorder kernel. Bootstrap it on small multi-column batches (ne[1] <= 8) too.
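The "read weights once per dispatch instead of once per column" idea can be illustrated with a small CPU sketch in C++. This is not the SYCL kernel from this commit: weights are plain floats rather than quantized blocks, and `mat_vec_multi` and `mat_vec_switch_ncols` are hypothetical names. The dispatcher only loosely mirrors the `_switch_ncols` kernels mentioned in the ggml-sycl.cpp change, turning a runtime column count (1 to 8) into a template argument.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical CPU illustration of the ncols_dst idea (not the SYCL kernel):
// computes Y[:, c] = W * X[:, c] for NCOLS columns, reading each weight once.
// W is row-major (nrows x k); X and Y store their columns back to back.
template <std::size_t NCOLS>
void mat_vec_multi(const std::vector<float>& W, std::size_t nrows, std::size_t k,
                   const std::vector<float>& X, std::vector<float>& Y) {
    assert(W.size() >= nrows * k);
    assert(X.size() >= NCOLS * k);
    assert(Y.size() >= NCOLS * nrows);
    for (std::size_t r = 0; r < nrows; ++r) {
        std::array<float, NCOLS> acc{};  // one accumulator per destination column
        for (std::size_t i = 0; i < k; ++i) {
            const float w = W[r * k + i];  // the weight is loaded once...
            for (std::size_t c = 0; c < NCOLS; ++c) {
                acc[c] += w * X[c * k + i];  // ...and reused for every column
            }
        }
        for (std::size_t c = 0; c < NCOLS; ++c) {
            Y[c * nrows + r] = acc[c];
        }
    }
}

// Hypothetical dispatcher: maps the runtime column count onto a
// compile-time template argument, one instantiation per supported count.
void mat_vec_switch_ncols(int ncols_dst, const std::vector<float>& W, std::size_t nrows,
                          std::size_t k, const std::vector<float>& X, std::vector<float>& Y) {
    switch (ncols_dst) {
        case 1: mat_vec_multi<1>(W, nrows, k, X, Y); break;
        case 2: mat_vec_multi<2>(W, nrows, k, X, Y); break;
        case 3: mat_vec_multi<3>(W, nrows, k, X, Y); break;
        case 4: mat_vec_multi<4>(W, nrows, k, X, Y); break;
        case 5: mat_vec_multi<5>(W, nrows, k, X, Y); break;
        case 6: mat_vec_multi<6>(W, nrows, k, X, Y); break;
        case 7: mat_vec_multi<7>(W, nrows, k, X, Y); break;
        case 8: mat_vec_multi<8>(W, nrows, k, X, Y); break;
        default: throw std::invalid_argument("ncols_dst must be in [1, 8]");
    }
}
```

With ncols_dst = 4, for example, the sketch streams the weight matrix once rather than four times; that memory-traffic saving is what the commit targets for speculative / MTP verify passes.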
1 parent 7c158fb commit 7fe2ae4

2 files changed

Lines changed: 1095 additions & 27 deletions


ggml/src/ggml-sycl/ggml-sycl.cpp

Lines changed: 3 additions & 1 deletion
@@ -3971,7 +3971,9 @@ static bool should_reorder_tensor(ggml_backend_sycl_context& ctx, const ggml_ten
     return !g_ggml_sycl_disable_optimize && //allow optimize, controlled by $GGML_SYCL_DISABLE_OPT
         ctx.opt_feature.reorder && //allow this device due to good perf, skip the devices with bad perf.
         dst->op == GGML_OP_MUL_MAT && //limit to some supported cases of Q4_0, to do for more cases.
-        dst->src[1]->ne[1]==1 && dst->src[1]->ne[2]==1 && dst->src[1]->ne[3]==1;
+        // ne[1] <= 8 so multi-column decode (spec / MTP verify) also bootstraps the reorder;
+        // all reorderable types have a _switch_ncols kernel.
+        dst->src[1]->ne[1] <= 8 && dst->src[1]->ne[2]==1 && dst->src[1]->ne[3]==1;
 }
 
 static void opt_for_reorder(ggml_backend_sycl_context * ctx, const ggml_tensor * src0, const ggml_tensor * /* src1 */,
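To make the changed condition concrete, here is a hedged, self-contained restatement of the predicate in `should_reorder_tensor`. `TensorShape`, `Op`, `ReorderInputs` and `should_reorder_sketch` are hypothetical stand-ins for the ggml context and tensor fields; only the boolean logic comes from the diff, with the `ne[1]` bound relaxed from `== 1` to `<= 8`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the fields that should_reorder_tensor reads.
struct TensorShape {
    std::array<std::int64_t, 4> ne;  // ggml-style sizes; ne[1] counts src1 columns (tokens)
};

enum class Op { MulMat, Other };

struct ReorderInputs {
    bool disable_optimize;   // plays the role of g_ggml_sycl_disable_optimize
    bool device_reorder_ok;  // plays the role of ctx.opt_feature.reorder
    Op op;                   // plays the role of dst->op
    TensorShape src1;        // plays the role of dst->src[1]
};

// Upper bound on src1 columns that still bootstraps the reorder (value from the diff).
constexpr std::int64_t kMaxReorderCols = 8;

bool should_reorder_sketch(const ReorderInputs& in) {
    assert(in.src1.ne[1] >= 1);  // precondition assumed for this sketch
    return !in.disable_optimize &&
           in.device_reorder_ok &&
           in.op == Op::MulMat &&
           in.src1.ne[1] <= kMaxReorderCols &&  // previously: ne[1] == 1
           in.src1.ne[2] == 1 &&
           in.src1.ne[3] == 1;
}
```

Under this condition a verify pass over, say, 4 draft tokens (ne[1] == 4) now qualifies, while ne[1] == 9 or a batched src1 with ne[2] > 1 does not trigger the reorder through this check.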
