Skip to content

Commit 59b42ab

Browse files
Fix nullptr-with-zero-offset ubsan issue in bf16 downscale path (#543)
When c_downscale < DLP_F32 (bf16 output) and k <= KC, the temporary float buffer for C was not allocated, leaving c_use_jc as NULL. Later pointer arithmetic c_use_ic = c_use_jc + offset triggered UBSan's 'applying zero offset to null pointer'. Always assign c_use_jc = c + jc so the arithmetic is well-defined; the value is unused when c_downscale < DLP_F32 (k <= KC path writes directly to buf_downscale). Co-authored-by: Zhiyi Zhang <zhiyizhang@meta.com>
1 parent 3a43dd1 commit 59b42ab

3 files changed

Lines changed: 20 additions & 16 deletions

File tree

classic/frame/bf16bf16f32/dlp_gemm_bf16.c

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -385,11 +385,12 @@ DLP_GEMM_5LOOP_AVX512BF16(bfloat16, bfloat16, float, bf16bf16f32of32)
385385
&n_sub_updated);
386386
}
387387

388-
if (c_downscale == DLP_F32) {
389-
c_use_jc = c + jc;
390-
}
388+
// This is for c_downscale == DLP_F32. When c_downscale < DLP_F32,
389+
// this value will not be used.
390+
c_use_jc = c + jc;
391+
391392
// Temp accumulaton buffer for C allocation.
392-
else if (c_downscale < DLP_F32) {
393+
if (c_downscale < DLP_F32) {
393394
// Buffer memory is only required if output needs to be
394395
// persisted across iterations of the pc/KC loop.
395396
// It was observed that the locks used while checking out
@@ -1067,11 +1068,12 @@ DLP_GEMM_5LOOP_F32_FALLBACK(bfloat16, bfloat16, float, bf16bf16f32of32)
10671068
&n_sub_updated);
10681069
}
10691070

1070-
if (c_downscale == DLP_F32) {
1071-
c_use_jc = c + jc;
1072-
}
1071+
// This is for c_downscale == DLP_F32. When c_downscale < DLP_F32,
1072+
// this value will not be used.
1073+
c_use_jc = c + jc;
1074+
10731075
// Temp accumulaton buffer for C allocation.
1074-
else if (c_downscale < DLP_F32) {
1076+
if (c_downscale < DLP_F32) {
10751077
// Buffer memory is only required if output needs to be
10761078
// persisted across iterations of the pc/KC loop.
10771079
// It was observed that the locks used while checking out

classic/frame/bf16bf16f32/dlp_gemm_bf16s4.c

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -178,11 +178,12 @@ DLP_GEMM_5LOOP_UNIFIED(bfloat16, int8_t, float, float, bf16s4f32of32, const)
178178
dlp_gemm_get_packb_strides(lcntx, &rs_b_use, &cs_b_use);
179179
}
180180

181-
if (c_downscale == DLP_F32) {
182-
c_use_jc = c + jc;
183-
}
181+
// This is for c_downscale == DLP_F32. When c_downscale < DLP_F32,
182+
// this value will not be used.
183+
c_use_jc = c + jc;
184+
184185
// Temp accumulaton buffer for C allocation.
185-
else if (c_downscale < DLP_F32) {
186+
if (c_downscale < DLP_F32) {
186187
// Buffer memory is only required if output needs to be
187188
// persisted across iterations of the pc/KC loop.
188189
// It was observed that the locks used while checking out

classic/frame/bf16bf16f32/dlp_gemm_bf16u4.c

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -184,11 +184,12 @@ DLP_GEMM_5LOOP_UNIFIED(bfloat16, uint8_t, float, float, bf16u4f32of32, const)
184184
dlp_gemm_get_packb_strides(lcntx, &rs_b_use, &cs_b_use);
185185
}
186186

187-
if (c_downscale == DLP_F32) {
188-
c_use_jc = c + jc;
189-
}
187+
// This is for c_downscale == DLP_F32. When c_downscale < DLP_F32,
188+
// this value will not be used.
189+
c_use_jc = c + jc;
190+
190191
// Temp accumulaton buffer for C allocation.
191-
else if (c_downscale < DLP_F32) {
192+
if (c_downscale < DLP_F32) {
192193
// Buffer memory is only required if output needs to be
193194
// persisted across iterations of the pc/KC loop.
194195
// It was observed that the locks used while checking out

0 commit comments

Comments
 (0)