Commit e653778

Tianyu Liang authored and facebook-github-bot committed
FP4 Triton kernel bug fix (#4181)
Summary:
Pull Request resolved: #4181
X-link: facebookresearch/FBGEMM#1259

Fix loop iteration index calculation bug in triton kernel

Reviewed By: q10, jiawenliu64, jianyuh
Differential Revision: D75269590
fbshipit-source-id: c40bfeef4c487d9cdc0f0b1ba58071027a822961
1 parent 9ba0bf5 commit e653778

1 file changed: +1 -1 lines changed

fbgemm_gpu/experimental/gemm/triton_gemm/fp4_quantize.py

Lines changed: 1 addition & 1 deletion
@@ -290,7 +290,7 @@ def _kernel_quantize_mx4_unpack(
         # Update offsets so we work on the next block.
         input_offset += GROUP_LOAD * GROUP_SIZE
         exp_offset += GROUP_LOAD
-        output_offset += GROUP_LOAD * GROUP_SIZE
+        output_offset += GROUP_LOAD * GROUP_SIZE // 2
 
 
 def triton_quantize_mx4_unpack(
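
The effect of the one-line change can be sketched in plain Python, without Triton. The sketch assumes the kernel packs two 4-bit values into each output byte, which the added `// 2` suggests but this diff alone does not confirm. The helper `packed_offsets` and its `halve_output` flag are hypothetical names used only for illustration; only `GROUP_LOAD` and `GROUP_SIZE` come from the kernel.

```python
def packed_offsets(num_iters, group_load, group_size, halve_output=True):
    """Return (input_offset, exp_offset, output_offset) at the start of each loop iteration.

    Hypothetical helper mirroring the three offset updates in the diff.
    halve_output=True models the fixed update; False models the old one.
    """
    input_offset = exp_offset = output_offset = 0
    offsets = []
    for _ in range(num_iters):
        offsets.append((input_offset, exp_offset, output_offset))
        # Each iteration consumes group_load groups of group_size input elements.
        input_offset += group_load * group_size
        # One shared exponent is written per group.
        exp_offset += group_load
        # Assumption: two 4-bit values share one output byte, so the output
        # pointer should advance by half the number of input elements.
        step = group_load * group_size
        output_offset += step // 2 if halve_output else step
    return offsets


fixed = packed_offsets(3, group_load=4, group_size=32)
buggy = packed_offsets(3, group_load=4, group_size=32, halve_output=False)

for i, (f, b) in enumerate(zip(fixed, buggy)):
    print(f"iter {i}: fixed output_offset={f[2]}, old output_offset={b[2]}")
```

Under that packing assumption, the old update moves the output pointer twice as far as the bytes actually written on every iteration after the first, so later blocks would land at the wrong positions in the output buffer.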
