[BUG] [OpenVino EP] Only first result in session is correct. #19975
Description
Describe the issue
When an inference session uses the OpenVINO EP with ORT > 1.13.1, every result except the first is incorrect. There are no issues with ORT == 1.13.1, or with the CPU/CUDA/XNNPACK EPs on any ORT version.
The issue occurs with only one model (Attention OCR); the model structure is at the bottom. Other models work fine. It seems that some layer or function used by this model was broken after the 1.13.1 build.
Description:
Ubuntu 22.04, ONNX Runtime 1.17.1, OpenVINO 2023.3, C++
Model: an attention-decoder OCR model, converted from PyTorch to ONNX.
Issue:
I run inference on the same image repeatedly (I also tried a sequence of different images during one inference session). Only the FIRST result is correct. The second and all later results look like a partially "cropped" copy of the first result, even when the next input is new.
For example, running inference on a sequence of images containing the text "1234567890", "ABCDEFGHJK", "7777777777" returns "1234567890", "1200120012", "1200120012"...
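To make the symptom easy to check in a repro harness, each run's decoded text can be compared with the expected text, reporting the first run that diverges. This is a minimal sketch that assumes the decoded strings have already been collected from the session; `first_mismatch` is a hypothetical helper name, not part of ONNX Runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not an ONNX Runtime API): returns the zero-based index
// of the first run whose decoded text differs from the expected text, or
// std::nullopt if every run matches. Requires C++17 for std::optional.
std::optional<std::size_t> first_mismatch(const std::vector<std::string>& expected,
                                          const std::vector<std::string>& actual) {
    const std::size_t common = std::min(expected.size(), actual.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (expected[i] != actual[i]) {
            return i;
        }
    }
    // A missing or extra run also counts as a mismatch at the first absent index.
    if (expected.size() != actual.size()) {
        return common;
    }
    return std::nullopt;
}
```

With the sequence from the example above, the helper reports index 1 (the second run), which matches the "only the first result is correct" pattern.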
Downgrading to ORT 1.13.1 solves the issue, so something appears to have been broken after the 1.13.1 build.
All other EPs (CPU, CUDA, XNNPACK) work well with the same code.
I found one reference to a similar issue in the OpenVINO GitHub repository: openvinotoolkit/openvino#12966
I enabled verbose mode and found that the node placements differ between the 1.17.1 (incorrect) and 1.13.1 (correct) inference sessions. Maybe this matters, but it doesn't explain why the first result is always correct.
Node placements for the correct inference session (1.13.1):
* Node placements
*Node(s) placed on [OpenVINOExecutionProvider]. Number of nodes: 11
OpenVINO-EP-subgraph_1 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0)
OpenVINO-EP-subgraph_2 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1)
OpenVINO-EP-subgraph_3 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_3_2)
OpenVINO-EP-subgraph_4 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_4_3)
OpenVINO-EP-subgraph_5 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_4)
OpenVINO-EP-subgraph_6 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_6_5)
OpenVINO-EP-subgraph_7 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_7_6)
OpenVINO-EP-subgraph_8 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_8_7)
OpenVINO-EP-subgraph_9 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_9_8)
OpenVINO-EP-subgraph_10 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_10_9)
OpenVINO-EP-subgraph_11 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_11_10)
*Node(s) placed on [CPUExecutionProvider]. Number of nodes: 167
GRU (/decoder/rnn/GRU)
LogSoftmax (/decoder/LogSoftmax)
ArgMax (/decoder/ArgMax)
Unsqueeze (/decoder/Unsqueeze)
Transpose (/decoder/Transpose_2)
Gather (/decoder/emb_1/Gather)
Expand (/decoder/attention_1/Expand)
Transpose (/decoder/attention_1/Transpose)
Concat (/decoder/attention_1/Concat)
MatMul (/decoder/attention/attn_1/MatMul)
Add (/decoder/attention/attn_1/Add)
Tanh (/decoder/attention_1/Tanh)
Softmax (/decoder/attention_1/Softmax)
MatMul (/decoder/MatMul_1)
Transpose (/decoder/Transpose_3)
Concat (/decoder/Concat_1)
GRU (/decoder/rnn_1/GRU)
LogSoftmax (/decoder/LogSoftmax_1)
ArgMax (/decoder/ArgMax_1)
Unsqueeze (/decoder/Unsqueeze_1)
Transpose (/decoder/Transpose_4)
Gather (/decoder/emb_2/Gather)
Expand (/decoder/attention_2/Expand)
Transpose (/decoder/attention_2/Transpose)
Concat (/decoder/attention_2/Concat)
MatMul (/decoder/attention/attn_2/MatMul)
Add (/decoder/attention/attn_2/Add)
Tanh (/decoder/attention_2/Tanh)
Softmax (/decoder/attention_2/Softmax)
MatMul (/decoder/MatMul_2)
Transpose (/decoder/Transpose_5)
Concat (/decoder/Concat_2)
GRU (/decoder/rnn_2/GRU)
LogSoftmax (/decoder/LogSoftmax_2)
ArgMax (/decoder/ArgMax_2)
Unsqueeze (/decoder/Unsqueeze_2)
Transpose (/decoder/Transpose_6)
Gather (/decoder/emb_3/Gather)
Expand (/decoder/attention_3/Expand)
Transpose (/decoder/attention_3/Transpose)
Concat (/decoder/attention_3/Concat)
MatMul (/decoder/attention/attn_3/MatMul)
Add (/decoder/attention/attn_3/Add)
Tanh (/decoder/attention_3/Tanh)
Softmax (/decoder/attention_3/Softmax)
MatMul (/decoder/MatMul_3)
Transpose (/decoder/Transpose_7)
Concat (/decoder/Concat_3)
GRU (/decoder/rnn_3/GRU)
LogSoftmax (/decoder/LogSoftmax_3)
ArgMax (/decoder/ArgMax_3)
Unsqueeze (/decoder/Unsqueeze_3)
Transpose (/decoder/Transpose_8)
Gather (/decoder/emb_4/Gather)
Expand (/decoder/attention_4/Expand)
Transpose (/decoder/attention_4/Transpose)
Concat (/decoder/attention_4/Concat)
MatMul (/decoder/attention/attn_4/MatMul)
Add (/decoder/attention/attn_4/Add)
Tanh (/decoder/attention_4/Tanh)
Softmax (/decoder/attention_4/Softmax)
MatMul (/decoder/MatMul_4)
Transpose (/decoder/Transpose_9)
Concat (/decoder/Concat_4)
GRU (/decoder/rnn_4/GRU)
LogSoftmax (/decoder/LogSoftmax_4)
ArgMax (/decoder/ArgMax_4)
Unsqueeze (/decoder/Unsqueeze_4)
Transpose (/decoder/Transpose_10)
Gather (/decoder/emb_5/Gather)
Expand (/decoder/attention_5/Expand)
Transpose (/decoder/attention_5/Transpose)
Concat (/decoder/attention_5/Concat)
MatMul (/decoder/attention/attn_5/MatMul)
Add (/decoder/attention/attn_5/Add)
Tanh (/decoder/attention_5/Tanh)
Softmax (/decoder/attention_5/Softmax)
MatMul (/decoder/MatMul_5)
Transpose (/decoder/Transpose_11)
Concat (/decoder/Concat_5)
GRU (/decoder/rnn_5/GRU)
LogSoftmax (/decoder/LogSoftmax_5)
ArgMax (/decoder/ArgMax_5)
Unsqueeze (/decoder/Unsqueeze_5)
Transpose (/decoder/Transpose_12)
Gather (/decoder/emb_6/Gather)
Expand (/decoder/attention_6/Expand)
Transpose (/decoder/attention_6/Transpose)
Concat (/decoder/attention_6/Concat)
MatMul (/decoder/attention/attn_6/MatMul)
Add (/decoder/attention/attn_6/Add)
Tanh (/decoder/attention_6/Tanh)
Softmax (/decoder/attention_6/Softmax)
MatMul (/decoder/MatMul_6)
Transpose (/decoder/Transpose_13)
Concat (/decoder/Concat_6)
GRU (/decoder/rnn_6/GRU)
LogSoftmax (/decoder/LogSoftmax_6)
ArgMax (/decoder/ArgMax_6)
Unsqueeze (/decoder/Unsqueeze_6)
Transpose (/decoder/Transpose_14)
Gather (/decoder/emb_7/Gather)
Expand (/decoder/attention_7/Expand)
Transpose (/decoder/attention_7/Transpose)
Concat (/decoder/attention_7/Concat)
MatMul (/decoder/attention/attn_7/MatMul)
Add (/decoder/attention/attn_7/Add)
Tanh (/decoder/attention_7/Tanh)
Softmax (/decoder/attention_7/Softmax)
MatMul (/decoder/MatMul_7)
Transpose (/decoder/Transpose_15)
Concat (/decoder/Concat_7)
GRU (/decoder/rnn_7/GRU)
LogSoftmax (/decoder/LogSoftmax_7)
ArgMax (/decoder/ArgMax_7)
Unsqueeze (/decoder/Unsqueeze_7)
Transpose (/decoder/Transpose_16)
Gather (/decoder/emb_8/Gather)
Expand (/decoder/attention_8/Expand)
Transpose (/decoder/attention_8/Transpose)
Concat (/decoder/attention_8/Concat)
MatMul (/decoder/attention/attn_8/MatMul)
Add (/decoder/attention/attn_8/Add)
Tanh (/decoder/attention_8/Tanh)
Softmax (/decoder/attention_8/Softmax)
MatMul (/decoder/MatMul_8)
Transpose (/decoder/Transpose_17)
Concat (/decoder/Concat_8)
GRU (/decoder/rnn_8/GRU)
LogSoftmax (/decoder/LogSoftmax_8)
ArgMax (/decoder/ArgMax_8)
Unsqueeze (/decoder/Unsqueeze_8)
Transpose (/decoder/Transpose_18)
Gather (/decoder/emb_9/Gather)
Expand (/decoder/attention_9/Expand)
Transpose (/decoder/attention_9/Transpose)
Concat (/decoder/attention_9/Concat)
MatMul (/decoder/attention/attn_9/MatMul)
Add (/decoder/attention/attn_9/Add)
Tanh (/decoder/attention_9/Tanh)
Softmax (/decoder/attention_9/Softmax)
MatMul (/decoder/MatMul_9)
Transpose (/decoder/Transpose_19)
Concat (/decoder/Concat_9)
GRU (/decoder/rnn_9/GRU)
LogSoftmax (/decoder/LogSoftmax_9)
Unsqueeze (/decoder/Unsqueeze_9)
Unsqueeze (/decoder/Unsqueeze_10)
Unsqueeze (/decoder/Unsqueeze_11)
Unsqueeze (/decoder/Unsqueeze_12)
Unsqueeze (/decoder/Unsqueeze_13)
Unsqueeze (/decoder/Unsqueeze_14)
Unsqueeze (/decoder/Unsqueeze_15)
Unsqueeze (/decoder/Unsqueeze_16)
Unsqueeze (/decoder/Unsqueeze_17)
Unsqueeze (/decoder/Unsqueeze_18)
Concat (/decoder/Concat_10)
Transpose (/decoder/Transpose_20)
FusedMatMul (MatMul_With_Transpose)
FusedMatMul (MatMul_With_Transpose_token_0)
FusedMatMul (MatMul_With_Transpose_token_1)
FusedMatMul (MatMul_With_Transpose_token_2)
FusedMatMul (MatMul_With_Transpose_token_3)
FusedMatMul (MatMul_With_Transpose_token_4)
FusedMatMul (MatMul_With_Transpose_token_5)
FusedMatMul (MatMul_With_Transpose_token_6)
FusedMatMul (MatMul_With_Transpose_token_7)
Node placements for the incorrect inference session (1.17.1):
* Node placements
*Node(s) placed on [OpenVINOExecutionProvider]. Number of nodes: 11
OpenVINO-EP-subgraph_1 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0)
OpenVINO-EP-subgraph_2 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1)
OpenVINO-EP-subgraph_3 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_3_2)
OpenVINO-EP-subgraph_4 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_4_3)
OpenVINO-EP-subgraph_5 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_4)
OpenVINO-EP-subgraph_6 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_6_5)
OpenVINO-EP-subgraph_7 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_7_6)
OpenVINO-EP-subgraph_8 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_8_7)
OpenVINO-EP-subgraph_9 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_9_8)
OpenVINO-EP-subgraph_10 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_10_9)
OpenVINO-EP-subgraph_11 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_11_10)
*Node(s) placed on [CPUExecutionProvider]. Number of nodes: 167
GRU (/decoder/rnn/GRU)
LogSoftmax (/decoder/LogSoftmax)
ArgMax (/decoder/ArgMax)
Unsqueeze (/decoder/Unsqueeze)
Transpose (/decoder/Transpose_2)
Gather (/decoder/emb_1/Gather)
Expand (/decoder/attention_1/Expand)
Transpose (/decoder/attention_1/Transpose)
Concat (/decoder/attention_1/Concat)
MatMul (/decoder/attention/attn_1/MatMul)
Add (/decoder/attention/attn_1/Add)
Tanh (/decoder/attention_1/Tanh)
Softmax (/decoder/attention_1/Softmax)
MatMul (/decoder/MatMul_1)
Transpose (/decoder/Transpose_3)
Concat (/decoder/Concat_1)
GRU (/decoder/rnn_1/GRU)
LogSoftmax (/decoder/LogSoftmax_1)
ArgMax (/decoder/ArgMax_1)
Unsqueeze (/decoder/Unsqueeze_1)
Transpose (/decoder/Transpose_4)
Gather (/decoder/emb_2/Gather)
Expand (/decoder/attention_2/Expand)
Transpose (/decoder/attention_2/Transpose)
Concat (/decoder/attention_2/Concat)
MatMul (/decoder/attention/attn_2/MatMul)
Add (/decoder/attention/attn_2/Add)
Tanh (/decoder/attention_2/Tanh)
Softmax (/decoder/attention_2/Softmax)
MatMul (/decoder/MatMul_2)
Transpose (/decoder/Transpose_5)
Concat (/decoder/Concat_2)
GRU (/decoder/rnn_2/GRU)
LogSoftmax (/decoder/LogSoftmax_2)
ArgMax (/decoder/ArgMax_2)
Unsqueeze (/decoder/Unsqueeze_2)
Transpose (/decoder/Transpose_6)
Gather (/decoder/emb_3/Gather)
Expand (/decoder/attention_3/Expand)
Transpose (/decoder/attention_3/Transpose)
Concat (/decoder/attention_3/Concat)
MatMul (/decoder/attention/attn_3/MatMul)
Add (/decoder/attention/attn_3/Add)
Tanh (/decoder/attention_3/Tanh)
Softmax (/decoder/attention_3/Softmax)
MatMul (/decoder/MatMul_3)
Transpose (/decoder/Transpose_7)
Concat (/decoder/Concat_3)
GRU (/decoder/rnn_3/GRU)
LogSoftmax (/decoder/LogSoftmax_3)
ArgMax (/decoder/ArgMax_3)
Unsqueeze (/decoder/Unsqueeze_3)
Transpose (/decoder/Transpose_8)
Gather (/decoder/emb_4/Gather)
Expand (/decoder/attention_4/Expand)
Transpose (/decoder/attention_4/Transpose)
Concat (/decoder/attention_4/Concat)
MatMul (/decoder/attention/attn_4/MatMul)
Add (/decoder/attention/attn_4/Add)
Tanh (/decoder/attention_4/Tanh)
Softmax (/decoder/attention_4/Softmax)
MatMul (/decoder/MatMul_4)
Transpose (/decoder/Transpose_9)
Concat (/decoder/Concat_4)
GRU (/decoder/rnn_4/GRU)
LogSoftmax (/decoder/LogSoftmax_4)
ArgMax (/decoder/ArgMax_4)
Unsqueeze (/decoder/Unsqueeze_4)
Transpose (/decoder/Transpose_10)
Gather (/decoder/emb_5/Gather)
Expand (/decoder/attention_5/Expand)
Transpose (/decoder/attention_5/Transpose)
Concat (/decoder/attention_5/Concat)
MatMul (/decoder/attention/attn_5/MatMul)
Add (/decoder/attention/attn_5/Add)
Tanh (/decoder/attention_5/Tanh)
Softmax (/decoder/attention_5/Softmax)
MatMul (/decoder/MatMul_5)
Transpose (/decoder/Transpose_11)
Concat (/decoder/Concat_5)
GRU (/decoder/rnn_5/GRU)
LogSoftmax (/decoder/LogSoftmax_5)
ArgMax (/decoder/ArgMax_5)
Unsqueeze (/decoder/Unsqueeze_5)
Transpose (/decoder/Transpose_12)
Gather (/decoder/emb_6/Gather)
Expand (/decoder/attention_6/Expand)
Transpose (/decoder/attention_6/Transpose)
Concat (/decoder/attention_6/Concat)
MatMul (/decoder/attention/attn_6/MatMul)
Add (/decoder/attention/attn_6/Add)
Tanh (/decoder/attention_6/Tanh)
Softmax (/decoder/attention_6/Softmax)
MatMul (/decoder/MatMul_6)
Transpose (/decoder/Transpose_13)
Concat (/decoder/Concat_6)
GRU (/decoder/rnn_6/GRU)
LogSoftmax (/decoder/LogSoftmax_6)
ArgMax (/decoder/ArgMax_6)
Unsqueeze (/decoder/Unsqueeze_6)
Transpose (/decoder/Transpose_14)
Gather (/decoder/emb_7/Gather)
Expand (/decoder/attention_7/Expand)
Transpose (/decoder/attention_7/Transpose)
Concat (/decoder/attention_7/Concat)
MatMul (/decoder/attention/attn_7/MatMul)
Add (/decoder/attention/attn_7/Add)
Tanh (/decoder/attention_7/Tanh)
Softmax (/decoder/attention_7/Softmax)
MatMul (/decoder/MatMul_7)
Transpose (/decoder/Transpose_15)
Concat (/decoder/Concat_7)
GRU (/decoder/rnn_7/GRU)
LogSoftmax (/decoder/LogSoftmax_7)
ArgMax (/decoder/ArgMax_7)
Unsqueeze (/decoder/Unsqueeze_7)
Transpose (/decoder/Transpose_16)
Gather (/decoder/emb_8/Gather)
Expand (/decoder/attention_8/Expand)
Transpose (/decoder/attention_8/Transpose)
Concat (/decoder/attention_8/Concat)
MatMul (/decoder/attention/attn_8/MatMul)
Add (/decoder/attention/attn_8/Add)
Tanh (/decoder/attention_8/Tanh)
Softmax (/decoder/attention_8/Softmax)
MatMul (/decoder/MatMul_8)
Transpose (/decoder/Transpose_17)
Concat (/decoder/Concat_8)
GRU (/decoder/rnn_8/GRU)
LogSoftmax (/decoder/LogSoftmax_8)
ArgMax (/decoder/ArgMax_8)
Unsqueeze (/decoder/Unsqueeze_8)
Transpose (/decoder/Transpose_18)
Gather (/decoder/emb_9/Gather)
Expand (/decoder/attention_9/Expand)
Transpose (/decoder/attention_9/Transpose)
Concat (/decoder/attention_9/Concat)
MatMul (/decoder/attention/attn_9/MatMul)
Add (/decoder/attention/attn_9/Add)
Tanh (/decoder/attention_9/Tanh)
Softmax (/decoder/attention_9/Softmax)
MatMul (/decoder/MatMul_9)
Transpose (/decoder/Transpose_19)
Concat (/decoder/Concat_9)
GRU (/decoder/rnn_9/GRU)
LogSoftmax (/decoder/LogSoftmax_9)
Unsqueeze (/decoder/Unsqueeze_9)
Unsqueeze (/decoder/Unsqueeze_10)
Unsqueeze (/decoder/Unsqueeze_11)
Unsqueeze (/decoder/Unsqueeze_12)
Unsqueeze (/decoder/Unsqueeze_13)
Unsqueeze (/decoder/Unsqueeze_14)
Unsqueeze (/decoder/Unsqueeze_15)
Unsqueeze (/decoder/Unsqueeze_16)
Unsqueeze (/decoder/Unsqueeze_17)
Unsqueeze (/decoder/Unsqueeze_18)
Concat (/decoder/Concat_10)
Transpose (/decoder/Transpose_20)
FusedMatMul (MatMul_With_Transpose)
FusedMatMul (MatMul_With_Transpose_token_18)
FusedMatMul (MatMul_With_Transpose_token_19)
FusedMatMul (MatMul_With_Transpose_token_20)
FusedMatMul (MatMul_With_Transpose_token_21)
FusedMatMul (MatMul_With_Transpose_token_22)
FusedMatMul (MatMul_With_Transpose_token_23)
FusedMatMul (MatMul_With_Transpose_token_24)
FusedMatMul (MatMul_With_Transpose_token_25)
As you can see, the only difference is in the last 8 lines: the token IDs of the FusedMatMul nodes differ (0 to 7 in 1.13.1, 18 to 25 in 1.17.1). I hope this helps.
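The comparison of the two placement logs can be automated. The sketch below assumes each log has already been read into a vector of lines; `normalize_token_id` and `differing_lines` are hypothetical helper names. It lists the lines that differ and can optionally ignore the numeric `_token_<id>` suffix, which helps separate naming differences from structural ones. Whether the token numbering has any bearing on the incorrect results is not established here.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Replace the numeric suffix in names such as "MatMul_With_Transpose_token_18"
// with a fixed placeholder so that only structural differences remain.
std::string normalize_token_id(const std::string& line) {
    static const std::regex token_id("_token_[0-9]+");
    return std::regex_replace(line, token_id, "_token_N");
}

// Return the zero-based indices of lines that differ between two logs.
// Lines past the end of the shorter log are reported as differences.
std::vector<std::size_t> differing_lines(const std::vector<std::string>& a,
                                         const std::vector<std::string>& b,
                                         bool ignore_token_ids) {
    std::vector<std::size_t> diffs;
    const std::size_t longest = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < longest; ++i) {
        if (i >= a.size() || i >= b.size()) {
            diffs.push_back(i);
            continue;
        }
        const std::string lhs = ignore_token_ids ? normalize_token_id(a[i]) : a[i];
        const std::string rhs = ignore_token_ids ? normalize_token_id(b[i]) : b[i];
        if (lhs != rhs) {
            diffs.push_back(i);
        }
    }
    return diffs;
}
```

If the observation above holds, a raw comparison of the two logs should flag only the trailing FusedMatMul lines, and a comparison with `ignore_token_ids` set to `true` should flag none, meaning the partitioning itself is the same in both versions.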
To reproduce
See the description above.
Urgency
Urgent
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.1 release
ONNX Runtime API
C++
Architecture
X64
Execution Provider
OpenVINO
Execution Provider Library Version
2023.3