
Commit 11d80de

Tianyu Liang authored and facebook-github-bot committed
FP4 Triton kernel bug fix (pytorch#4181)
Summary:
X-link: facebookresearch/FBGEMM#1259

Fix loop iteration index calculation bug in Triton kernel.

Reviewed By: q10, jiawenliu64, jianyuh

Differential Revision: D75269590
1 parent b83a755 commit 11d80de


1 file changed (+1, -1 lines)

fbgemm_gpu/experimental/gemm/triton_gemm/fp4_quantize.py

Lines changed: 1 addition & 1 deletion
@@ -290,7 +290,7 @@ def _kernel_quantize_mx4_unpack(
         # Update offsets so we work on the next block.
         input_offset += GROUP_LOAD * GROUP_SIZE
         exp_offset += GROUP_LOAD
-        output_offset += GROUP_LOAD * GROUP_SIZE
+        output_offset += GROUP_LOAD * GROUP_SIZE // 2
 
 
 def triton_quantize_mx4_unpack(
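The `// 2` is presumably needed because the packed FP4 output stores two 4-bit codes per byte, so a group of `GROUP_SIZE` input elements occupies only `GROUP_SIZE // 2` output bytes, while the input offset still advances one element per value. The sketch below is a pure-Python model of that bookkeeping, not FBGEMM code. The helpers `pack_fp4_pair`, `output_bytes`, `simulate_offsets`, and `write_end` are hypothetical. The low-nibble-first packing order is assumed, and the loop assumes the group count divides evenly by `GROUP_LOAD`. The model compares where the last write ends under the old and new output strides.

```python
# Illustrative model only: names mirror the diff (GROUP_LOAD, GROUP_SIZE),
# but these helpers are hypothetical and are not part of FBGEMM.


def pack_fp4_pair(lo: int, hi: int) -> int:
    """Pack two 4-bit codes into one byte, low nibble first (assumed order)."""
    assert 0 <= lo < 16 and 0 <= hi < 16
    return (hi << 4) | lo


def output_bytes(num_groups: int, group_size: int) -> int:
    """Size of the packed FP4 output buffer: two codes share one byte."""
    return num_groups * group_size // 2


def simulate_offsets(num_groups: int, group_size: int, group_load: int, fixed: bool = True):
    """Return the (input, exp, output) offsets seen at the start of each iteration.

    Assumes num_groups is a multiple of group_load (no tail handling).
    """
    input_offset = exp_offset = output_offset = 0
    visited = []
    for _ in range(num_groups // group_load):
        visited.append((input_offset, exp_offset, output_offset))
        input_offset += group_load * group_size  # input elements consumed
        exp_offset += group_load                 # one shared exponent per group
        if fixed:
            output_offset += group_load * group_size // 2  # packed bytes written
        else:
            output_offset += group_load * group_size       # pre-fix stride
    return visited


def write_end(visited, group_size: int, group_load: int) -> int:
    """One past the last output byte written by the final iteration."""
    last_output = visited[-1][2]
    return last_output + group_load * group_size // 2
```

With 8 groups of 32 elements loaded 2 at a time, the fixed stride finishes exactly at the end of the 128-byte packed buffer. The pre-fix stride skips every other 32-byte chunk and writes past the end:

```python
fixed = simulate_offsets(8, 32, 2, fixed=True)
buggy = simulate_offsets(8, 32, 2, fixed=False)
print(write_end(fixed, 32, 2), output_bytes(8, 32))  # 128 128
print(write_end(buggy, 32, 2), output_bytes(8, 32))  # 224 128
```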
