Commit c3ed240

Fix MoE WMMA kernel on V100 (huggingface#3282)
1 parent fd8448d commit c3ed240

1 file changed

Lines changed: 1 addition & 0 deletions

candle-kernels/src/moe/moe_wmma.cu

@@ -181,6 +181,7 @@ __global__ void moe_gemm_grouped_kernel(
 
 // Accumulate into c_frag (which persists across k_base iterations)
 mma_sync(c_frag, a_frag, b_frag, c_frag);
+__syncthreads(); // Fix shared memory mismatch on V100
 } // end k_base loop (we have a fully-accumulated c_frag for this m_base tile)
 
 // Store the accumulated c_frag to C_sh (shared) once per warp
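
The added barrier sits at the end of the k_base loop body, after the tiles for that iteration have been consumed. A minimal sketch of that general pattern follows. It is not the candle kernel: the names tiled_row_sum, TILE, and run_tiled_row_sum_demo are hypothetical, and it uses plain float sums instead of WMMA fragments. The assumption being illustrated is that a shared-memory tile is reloaded on every iteration, so all threads must finish reading it before any thread overwrites it.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 128;  // hypothetical tile width; also used as the block size

// Each block sums one row of length K, staging TILE elements at a time in
// shared memory. Every thread reads the whole tile, so the tile must not be
// overwritten while any thread is still reading it.
__global__ void tiled_row_sum(const float* in, float* out, int K) {
    __shared__ float tile[TILE];
    float acc = 0.0f;  // persists across k_base iterations

    for (int k_base = 0; k_base < K; k_base += TILE) {
        int k = k_base + threadIdx.x;
        tile[threadIdx.x] = (k < K) ? in[blockIdx.x * K + k] : 0.0f;
        __syncthreads();  // tile is fully written before anyone reads it

        for (int i = 0; i < TILE; ++i) {
            acc += tile[i];
        }
        __syncthreads();  // all reads finish before the next iteration overwrites the tile
    }

    if (threadIdx.x == 0) {
        out[blockIdx.x] = acc;
    }
}

// Host helper: returns the number of rows that differ from a CPU reference,
// or -1 if a CUDA call fails.
int run_tiled_row_sum_demo() {
    const int rows = 4;
    const int K = 1000;
    std::vector<float> h_in(rows * K);
    std::vector<float> h_out(rows, 0.0f);
    for (int i = 0; i < rows * K; ++i) {
        h_in[i] = static_cast<float>(i % 7);  // small integers, so float sums are exact
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, h_in.size() * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, rows * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);

    tiled_row_sum<<<rows, TILE>>>(d_in, d_out, K);

    // This copy waits for the kernel and also surfaces any launch error.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, rows * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int r = 0; r < rows; ++r) {
        float ref = 0.0f;
        for (int k = 0; k < K; ++k) ref += h_in[r * K + k];
        if (h_out[r] != ref) ++mismatches;
    }
    return mismatches;
}
```

Calling run_tiled_row_sum_demo() from a host main should report zero mismatches. Removing the second __syncthreads() leaves a race in which a fast warp can start writing the next tile while a slower warp is still reading the current one. Whether that race shows up in practice depends on the hardware and scheduling.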
