- In the ggml_compute_forward_mul_mat() function in ggml.c, ggml_qgemm_lut() is executed first, which I think is an accumulation operation.
- Subsequently, in the ggml_compute_forward_mul_mat_one_chunk() function, ggml_vec_dot_i2_i8_s() is executed, which performs a multiply-accumulate operation for ternary and 8-bit data.
My understanding is that the former has already computed the matrix multiplication of the ternary and 8-bit data through table lookup and accumulation. So why is a second multiply-accumulate operation performed in ggml_vec_dot_i2_i8_s()?
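
To make the premise of the question concrete, here is a toy sketch of the two ways I understand a ternary-by-int8 dot product can be computed. This is not the actual ggml code: the names dot_direct(), build_lut() and dot_lut() are hypothetical, the weights are stored unpacked as one int8_t per value, and the pair-wise table layout is only an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Direct path: one multiply-accumulate per (ternary weight, int8 activation) pair. */
int32_t dot_direct(const int8_t *w, const int8_t *x, int n) {
    int32_t sum = 0;
    for (int i = 0; i < n; i++) {
        sum += (int32_t)w[i] * (int32_t)x[i];
    }
    return sum;
}

/* LUT path, step 1: for every pair of activations (x[2g], x[2g+1]),
 * precompute the 9 partial sums a*x[2g] + b*x[2g+1] for a, b in {-1, 0, +1}.
 * lut must have room for (n / 2) * 9 entries. */
void build_lut(const int8_t *x, int n, int16_t *lut) {
    assert(n % 2 == 0);
    for (int g = 0; g < n / 2; g++) {
        for (int a = -1; a <= 1; a++) {
            for (int b = -1; b <= 1; b++) {
                lut[g * 9 + (a + 1) * 3 + (b + 1)] =
                    (int16_t)(a * x[2 * g] + b * x[2 * g + 1]);
            }
        }
    }
}

/* LUT path, step 2: each weight pair selects one table entry, so the
 * inner loop is lookup plus accumulation, with no multiply by a weight. */
int32_t dot_lut(const int8_t *w, const int16_t *lut, int n) {
    assert(n % 2 == 0);
    int32_t sum = 0;
    for (int g = 0; g < n / 2; g++) {
        int idx = (w[2 * g] + 1) * 3 + (w[2 * g + 1] + 1);
        sum += lut[g * 9 + idx];
    }
    return sum;
}
```

In this sketch both paths return the same value for the same inputs, and the table built by build_lut() could be reused for every weight row that multiplies the same activations. That is why I would expect only one of the two paths to be needed for a given tensor.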