
[GPU] Fix in-place crop optimization for batch-axis split with spatial-flatten reshape (#34680)

andrew-k-park wants to merge 1 commit into openvinotoolkit:master from andrew-k-park:fix_extend_in_place_crop_opt


Conversation


@andrew-k-park (Contributor) commented Mar 13, 2026

Description of the issue

  • Symptom: RAFT Large (torchvision optical flow) produces completely wrong inference results on GPU when model.reshape() is called to set a static batch size. SSIM drops from 0.999 to 0.903, with max absolute error ~3.86 across all 12 outputs.
  • Root-cause: Commit 222c9d4 ("Extend in-place crop optimization to support batch-axis split with reshape") only checked crop_axis == 0 && batch == 1 to identify batch-squeeze reshapes. This incorrectly matched RAFT's spatial-flatten reshape pattern [1, 256, 65, 120] → [1, 256, 7800] where the batch dim is preserved but spatial dims are merged. The wrong dyn_pad_dims padding was applied to the crop and its downstream reshape, causing them to read data from incorrect memory offsets.
  • Resolution: Added a guard on the reshape output's first dimension: if it is statically 1, the batch dimension was not squeezed (spatial-flatten case) and the batch-axis optimization is skipped. The condition !(reshape_output[0].is_static() && reshape_output[0].get_length() == 1) is applied at all three code paths: is_runtime_propagatable_padding(), the build-time dynamic padding mask, and the runtime padding offset calculation.

The code and line that caused this issue (if it is not changed directly)

  • intel_gpu/src/graph/include/reshape_inst.h
  • intel_gpu/src/graph/graph_optimizer/prepare_buffer_fusing.cpp

Reproduction step and snapshot (if applicable. Do not attach for customer model)

  • $ pytest test_ovc_mo.py --modules pipelines/production/pytorch/heavy -k PyTorch_TorchvisionRaftLarge --dynamism_type=None --log-cli-level INFO

Problematic graph

  • RAFT contains the following subgraph repeated in its correlation/update blocks:
    • The optimization incorrectly treated this as the batch-squeeze pattern ([1,C,H,W] → [C,H,W]) and applied batch-axis dynamic padding, causing crop nodes to be marked optimized out with wrong dyn_pad_dims.
    • In the graph dump, exactly 2 nodes differed from the reference: VariadicSplit.out1 (crop) and Reshape_1 changed from nopad to dyn_pad_dims.
[image: graph dump of the RAFT correlation/update subgraph]

Checklist

  • Is it a proper fix? (not a workaround)
  • Did you include test case for this fix, if necessary?
  • Did you review existing test that can be extended to cover this scenario? Which test did you review?
    • Reviewed in_place_crop_dynamic_batch_axis_split_with_reshape, which covers the valid batch-squeeze pattern; a new test was added for this fix.

Tickets:

@andrew-k-park andrew-k-park requested review from a team as code owners March 13, 2026 08:41
@andrew-k-park andrew-k-park added the category: GPU OpenVINO GPU plugin label Mar 13, 2026
@e-ddykim e-ddykim enabled auto-merge March 13, 2026 10:47
@andrew-k-park andrew-k-park added this to the 2026.1 milestone Mar 13, 2026
@e-ddykim e-ddykim added this pull request to the merge queue Mar 13, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 13, 2026
@andrew-k-park andrew-k-park added this pull request to the merge queue Mar 13, 2026
