[GPU] Fix OOB reads in convolution bfyx_os_iyx_osv16 for multi-batch #36322

Open: davidsnam-intel wants to merge 1 commit into master from david/fix-conv-oob

davidsnam-intel commented Jun 9, 2026

Contributor

Issue

  • DepthAnythingV2 inference fails with CL_OUT_OF_RESOURCES on Xe2+ when running with batch ≥ 2.
  • Pre-Xe2 platforms silently return zero for out-of-bounds reads, so the bug stayed latent there.
  • Xe2+ instead faults on the same reads and reports CL_OUT_OF_RESOURCES.

Root cause

  • With batch ≥ 2, the global work size (gws) includes dummy output tiles beyond the valid output region for the last batch.
  • These tiles compute input addresses that exceed the tensor's physical length:
    INPUT0_OFFSET_WITH_PADDING + INPUT0_BATCH_PITCH * INPUT0_BATCH_NUM
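
The overrun can be illustrated with a minimal sketch in plain C (not the OpenCL kernel itself). It assumes a simplified single-feature layout with stride 1 and a 1x1 filter; the `input_layout` struct, the `y_pitch` field and the helper names are hypothetical, and only the three macro names in the comments come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's compile-time layout constants. */
typedef struct {
    size_t offset_with_padding; /* models INPUT0_OFFSET_WITH_PADDING */
    size_t batch_pitch;         /* models INPUT0_BATCH_PITCH */
    size_t batch_num;           /* models INPUT0_BATCH_NUM */
    size_t y_pitch;             /* elements per input row (assumed) */
} input_layout;

/* Physical length of the input buffer, as given in the PR description. */
size_t input_physical_len(const input_layout *in) {
    return in->offset_with_padding + in->batch_pitch * in->batch_num;
}

/* Number of tiles needed to cover `extent`; rounding up is what can
 * create rows past the valid output region (the "dummy" part of a tile). */
size_t ceil_div(size_t extent, size_t tile) {
    assert(tile > 0);
    return (extent + tile - 1) / tile;
}

/* Flat index read for row y, column x of batch b in this simplified model
 * (stride 1, 1x1 filter, so output row y reads input row y). */
size_t input_index(const input_layout *in, size_t b, size_t y, size_t x) {
    return in->offset_with_padding + b * in->batch_pitch + y * in->y_pitch + x;
}
```

For example, with a 5x4 input per batch, two batches and a tile height of 2, `ceil_div(5, 2)` gives 3 tiles covering rows 0 to 5. Row 5 of batch 0 maps to index 20, which is still inside the buffer because it lands in batch 1. Row 5 of batch 1 maps to index 40, which equals the physical length and is therefore one element past the end.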

Solution

  • Clamp the read index: subscript the input with [min(idx, input0_physical_len - 1)].
  • This is the pattern already used by every other input-read path in the same kernel.
  • The clamped value is never used in actual output computation, since dummy tiles are discarded by the output-boundary guard.
  • This change therefore has no effect on inference accuracy or performance; it only prevents the hardware fault caused by the out-of-bounds memory access.
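
The clamp-then-discard idea can be sketched in plain C as follows. This is an illustration under assumptions, not the kernel code from this PR: `clamp_read_index`, `read_input_clamped` and `store_if_valid` are hypothetical helpers, and the output guard is modelled as a simple bounds check on the output index.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a read index to the last valid element, mirroring
 * min(idx, input0_physical_len - 1) from the description. */
size_t clamp_read_index(size_t idx, size_t physical_len) {
    assert(physical_len > 0);
    size_t last = physical_len - 1;
    return idx < last ? idx : last;
}

/* Read that never touches memory past the end of `input`. */
float read_input_clamped(const float *input, size_t physical_len, size_t idx) {
    return input[clamp_read_index(idx, physical_len)];
}

/* Hypothetical output-boundary guard: values computed for positions
 * outside the valid output are dropped. Returns 1 if stored, 0 if dropped. */
int store_if_valid(float *output, size_t out_len, size_t out_idx, float value) {
    if (out_idx >= out_len) {
        return 0;
    }
    output[out_idx] = value;
    return 1;
}
```

In this model an out-of-range read such as `read_input_clamped(in, 4, 9)` returns the last element instead of touching memory past the buffer, and a store to an out-of-range output position is dropped, so the clamped value never reaches the result.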

Tickets

@davidsnam-intel davidsnam-intel requested review from a team as code owners June 9, 2026 10:56
@github-actions github-actions Bot added the category: GPU OpenVINO GPU plugin label Jun 9, 2026