[BUG] [OpenVino EP] Only first result in session is correct. #19975

Open
@debugmenot

Description

Describe the issue

When running an inference session with ONLY the OpenVINO EP on ORT > 1.13.1, every result except the first is incorrect. There are no issues with ORT == 1.13.1, or with the CPU/CUDA/XNNPACK EPs on any ORT version.

I get this issue with only one model (Attention OCR); its node structure is listed at the bottom. Other models work fine. It seems that some layers/functions used by this model were broken after the 1.13.1 build.

Description:

Ubuntu 22.04, ONNX Runtime 1.17.1, OpenVINO 2023.3, C++
Model: a kind of Attention Decoder OCR, converted from PyTorch to ONNX.

Issue:
I run inference on the same image repeatedly (I also tried a sequence of different images during one inference session). Only the FIRST result is correct. The second and all later results look like a partially "cropped" copy of the first result, even when the next input is different.
For example, running inference on a sequence of images with the text "1234567890", "ABCDEFGHJK", "7777777777" returns "1234567890", "1200120012", "1200120012"...
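
To make the symptom easy to check on other setups, here is a minimal sketch of a harness that runs several inputs through one session in order and reports which runs disagree with the expected text. It is not taken from my application: `Mismatch`, `check_sequential_runs`, and the `infer` callable are hypothetical names, and `infer` is assumed to wrap a single `Ort::Session::Run` call plus decoding of the output into a string.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One disagreement between the decoded output of a run and the expected text.
struct Mismatch {
    std::size_t run_index;  // 0-based position in the sequence of runs
    std::string expected;
    std::string actual;
};

// Calls `infer` once per input, in order, so any state kept by the session
// between runs can show up as a mismatch on run 1 and later.
// `infer` is a hypothetical wrapper around one inference call on a shared session.
std::vector<Mismatch> check_sequential_runs(
    const std::function<std::string(const std::string&)>& infer,
    const std::vector<std::string>& inputs,
    const std::vector<std::string>& expected) {
    std::vector<Mismatch> mismatches;
    // Only compare positions that exist in both lists.
    const std::size_t n =
        inputs.size() < expected.size() ? inputs.size() : expected.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::string actual = infer(inputs[i]);
        if (actual != expected[i]) {
            mismatches.push_back({i, expected[i], actual});
        }
    }
    return mismatches;
}
```

With the behavior described above, the returned list would contain entries for run indices 1 and 2 but not for index 0. Running the same harness with a session created on the CPU EP should return an empty list.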

Downgrading to ORT 1.13.1 solves the issue, so it seems that something was broken after the 1.13.1 build.
All other EPs (CPU, CUDA, XNNPACK) work well with the same code.

I found one reference to a similar issue in the OpenVINO GitHub repository: openvinotoolkit/openvino#12966

I enabled verbose mode and found that the node placements differ between the 1.17.1 (incorrect) and 1.13.1 (correct) inference sessions. Maybe it matters, but it doesn't explain why the first result is always correct:

Node placements for the correct inference session (1.13.1):

* Node placements
*Node(s) placed on [OpenVINOExecutionProvider]. Number of nodes: 11

OpenVINO-EP-subgraph_1 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0)
OpenVINO-EP-subgraph_2 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1)
OpenVINO-EP-subgraph_3 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_3_2)
OpenVINO-EP-subgraph_4 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_4_3)
OpenVINO-EP-subgraph_5 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_4)
OpenVINO-EP-subgraph_6 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_6_5)
OpenVINO-EP-subgraph_7 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_7_6)
OpenVINO-EP-subgraph_8 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_8_7)
OpenVINO-EP-subgraph_9 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_9_8)
OpenVINO-EP-subgraph_10 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_10_9)
OpenVINO-EP-subgraph_11 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_11_10)
*Node(s) placed on [CPUExecutionProvider]. Number of nodes: 167
GRU (/decoder/rnn/GRU)
LogSoftmax (/decoder/LogSoftmax)
ArgMax (/decoder/ArgMax)
Unsqueeze (/decoder/Unsqueeze)
Transpose (/decoder/Transpose_2)
Gather (/decoder/emb_1/Gather)
Expand (/decoder/attention_1/Expand)
Transpose (/decoder/attention_1/Transpose)
Concat (/decoder/attention_1/Concat)
MatMul (/decoder/attention/attn_1/MatMul)
Add (/decoder/attention/attn_1/Add)
Tanh (/decoder/attention_1/Tanh)
Softmax (/decoder/attention_1/Softmax)
MatMul (/decoder/MatMul_1)
Transpose (/decoder/Transpose_3)
Concat (/decoder/Concat_1)
GRU (/decoder/rnn_1/GRU)
LogSoftmax (/decoder/LogSoftmax_1)
ArgMax (/decoder/ArgMax_1)
Unsqueeze (/decoder/Unsqueeze_1)
Transpose (/decoder/Transpose_4)
Gather (/decoder/emb_2/Gather)
Expand (/decoder/attention_2/Expand)
Transpose (/decoder/attention_2/Transpose)
Concat (/decoder/attention_2/Concat)
MatMul (/decoder/attention/attn_2/MatMul)
Add (/decoder/attention/attn_2/Add)
Tanh (/decoder/attention_2/Tanh)
Softmax (/decoder/attention_2/Softmax)
MatMul (/decoder/MatMul_2)
Transpose (/decoder/Transpose_5)
Concat (/decoder/Concat_2)
GRU (/decoder/rnn_2/GRU)
LogSoftmax (/decoder/LogSoftmax_2)
ArgMax (/decoder/ArgMax_2)
Unsqueeze (/decoder/Unsqueeze_2)
Transpose (/decoder/Transpose_6)
Gather (/decoder/emb_3/Gather)
Expand (/decoder/attention_3/Expand)
Transpose (/decoder/attention_3/Transpose)
Concat (/decoder/attention_3/Concat)
MatMul (/decoder/attention/attn_3/MatMul)
Add (/decoder/attention/attn_3/Add)
Tanh (/decoder/attention_3/Tanh)
Softmax (/decoder/attention_3/Softmax)
MatMul (/decoder/MatMul_3)
Transpose (/decoder/Transpose_7)
Concat (/decoder/Concat_3)
GRU (/decoder/rnn_3/GRU)
LogSoftmax (/decoder/LogSoftmax_3)
ArgMax (/decoder/ArgMax_3)
Unsqueeze (/decoder/Unsqueeze_3)
Transpose (/decoder/Transpose_8)
Gather (/decoder/emb_4/Gather)
Expand (/decoder/attention_4/Expand)
Transpose (/decoder/attention_4/Transpose)
Concat (/decoder/attention_4/Concat)
MatMul (/decoder/attention/attn_4/MatMul)
Add (/decoder/attention/attn_4/Add)
Tanh (/decoder/attention_4/Tanh)
Softmax (/decoder/attention_4/Softmax)
MatMul (/decoder/MatMul_4)
Transpose (/decoder/Transpose_9)
Concat (/decoder/Concat_4)
GRU (/decoder/rnn_4/GRU)
LogSoftmax (/decoder/LogSoftmax_4)
ArgMax (/decoder/ArgMax_4)
Unsqueeze (/decoder/Unsqueeze_4)
Transpose (/decoder/Transpose_10)
Gather (/decoder/emb_5/Gather)
Expand (/decoder/attention_5/Expand)
Transpose (/decoder/attention_5/Transpose)
Concat (/decoder/attention_5/Concat)
MatMul (/decoder/attention/attn_5/MatMul)
Add (/decoder/attention/attn_5/Add)
Tanh (/decoder/attention_5/Tanh)
Softmax (/decoder/attention_5/Softmax)
MatMul (/decoder/MatMul_5)
Transpose (/decoder/Transpose_11)
Concat (/decoder/Concat_5)
GRU (/decoder/rnn_5/GRU)
LogSoftmax (/decoder/LogSoftmax_5)
ArgMax (/decoder/ArgMax_5)
Unsqueeze (/decoder/Unsqueeze_5)
Transpose (/decoder/Transpose_12)
Gather (/decoder/emb_6/Gather)
Expand (/decoder/attention_6/Expand)
Transpose (/decoder/attention_6/Transpose)
Concat (/decoder/attention_6/Concat)
MatMul (/decoder/attention/attn_6/MatMul)
Add (/decoder/attention/attn_6/Add)
Tanh (/decoder/attention_6/Tanh)
Softmax (/decoder/attention_6/Softmax)
MatMul (/decoder/MatMul_6)
Transpose (/decoder/Transpose_13)
Concat (/decoder/Concat_6)
GRU (/decoder/rnn_6/GRU)
LogSoftmax (/decoder/LogSoftmax_6)
ArgMax (/decoder/ArgMax_6)
Unsqueeze (/decoder/Unsqueeze_6)
Transpose (/decoder/Transpose_14)
Gather (/decoder/emb_7/Gather)
Expand (/decoder/attention_7/Expand)
Transpose (/decoder/attention_7/Transpose)
Concat (/decoder/attention_7/Concat)
MatMul (/decoder/attention/attn_7/MatMul)
Add (/decoder/attention/attn_7/Add)
Tanh (/decoder/attention_7/Tanh)
Softmax (/decoder/attention_7/Softmax)
MatMul (/decoder/MatMul_7)
Transpose (/decoder/Transpose_15)
Concat (/decoder/Concat_7)
GRU (/decoder/rnn_7/GRU)
LogSoftmax (/decoder/LogSoftmax_7)
ArgMax (/decoder/ArgMax_7)
Unsqueeze (/decoder/Unsqueeze_7)
Transpose (/decoder/Transpose_16)
Gather (/decoder/emb_8/Gather)
Expand (/decoder/attention_8/Expand)
Transpose (/decoder/attention_8/Transpose)
Concat (/decoder/attention_8/Concat)
MatMul (/decoder/attention/attn_8/MatMul)
Add (/decoder/attention/attn_8/Add)
Tanh (/decoder/attention_8/Tanh)
Softmax (/decoder/attention_8/Softmax)
MatMul (/decoder/MatMul_8)
Transpose (/decoder/Transpose_17)
Concat (/decoder/Concat_8)
GRU (/decoder/rnn_8/GRU)
LogSoftmax (/decoder/LogSoftmax_8)
ArgMax (/decoder/ArgMax_8)
Unsqueeze (/decoder/Unsqueeze_8)
Transpose (/decoder/Transpose_18)
Gather (/decoder/emb_9/Gather)
Expand (/decoder/attention_9/Expand)
Transpose (/decoder/attention_9/Transpose)
Concat (/decoder/attention_9/Concat)
MatMul (/decoder/attention/attn_9/MatMul)
Add (/decoder/attention/attn_9/Add)
Tanh (/decoder/attention_9/Tanh)
Softmax (/decoder/attention_9/Softmax)
MatMul (/decoder/MatMul_9)
Transpose (/decoder/Transpose_19)
Concat (/decoder/Concat_9)
GRU (/decoder/rnn_9/GRU)
LogSoftmax (/decoder/LogSoftmax_9)
Unsqueeze (/decoder/Unsqueeze_9)
Unsqueeze (/decoder/Unsqueeze_10)
Unsqueeze (/decoder/Unsqueeze_11)
Unsqueeze (/decoder/Unsqueeze_12)
Unsqueeze (/decoder/Unsqueeze_13)
Unsqueeze (/decoder/Unsqueeze_14)
Unsqueeze (/decoder/Unsqueeze_15)
Unsqueeze (/decoder/Unsqueeze_16)
Unsqueeze (/decoder/Unsqueeze_17)
Unsqueeze (/decoder/Unsqueeze_18)
Concat (/decoder/Concat_10)
Transpose (/decoder/Transpose_20)
FusedMatMul (MatMul_With_Transpose)
FusedMatMul (MatMul_With_Transpose_token_0)
FusedMatMul (MatMul_With_Transpose_token_1)
FusedMatMul (MatMul_With_Transpose_token_2)
FusedMatMul (MatMul_With_Transpose_token_3)
FusedMatMul (MatMul_With_Transpose_token_4)
FusedMatMul (MatMul_With_Transpose_token_5)
FusedMatMul (MatMul_With_Transpose_token_6)
FusedMatMul (MatMul_With_Transpose_token_7)

Node placements for the incorrect inference session (1.17.1):

* Node placements
*Node(s) placed on [OpenVINOExecutionProvider]. Number of nodes: 11

OpenVINO-EP-subgraph_1 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0)
OpenVINO-EP-subgraph_2 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1)
OpenVINO-EP-subgraph_3 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_3_2)
OpenVINO-EP-subgraph_4 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_4_3)
OpenVINO-EP-subgraph_5 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_4)
OpenVINO-EP-subgraph_6 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_6_5)
OpenVINO-EP-subgraph_7 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_7_6)
OpenVINO-EP-subgraph_8 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_8_7)
OpenVINO-EP-subgraph_9 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_9_8)
OpenVINO-EP-subgraph_10 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_10_9)
OpenVINO-EP-subgraph_11 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_11_10)
*Node(s) placed on [CPUExecutionProvider]. Number of nodes: 167
GRU (/decoder/rnn/GRU)
LogSoftmax (/decoder/LogSoftmax)
ArgMax (/decoder/ArgMax)
Unsqueeze (/decoder/Unsqueeze)
Transpose (/decoder/Transpose_2)
Gather (/decoder/emb_1/Gather)
Expand (/decoder/attention_1/Expand)
Transpose (/decoder/attention_1/Transpose)
Concat (/decoder/attention_1/Concat)
MatMul (/decoder/attention/attn_1/MatMul)
Add (/decoder/attention/attn_1/Add)
Tanh (/decoder/attention_1/Tanh)
Softmax (/decoder/attention_1/Softmax)
MatMul (/decoder/MatMul_1)
Transpose (/decoder/Transpose_3)
Concat (/decoder/Concat_1)
GRU (/decoder/rnn_1/GRU)
LogSoftmax (/decoder/LogSoftmax_1)
ArgMax (/decoder/ArgMax_1)
Unsqueeze (/decoder/Unsqueeze_1)
Transpose (/decoder/Transpose_4)
Gather (/decoder/emb_2/Gather)
Expand (/decoder/attention_2/Expand)
Transpose (/decoder/attention_2/Transpose)
Concat (/decoder/attention_2/Concat)
MatMul (/decoder/attention/attn_2/MatMul)
Add (/decoder/attention/attn_2/Add)
Tanh (/decoder/attention_2/Tanh)
Softmax (/decoder/attention_2/Softmax)
MatMul (/decoder/MatMul_2)
Transpose (/decoder/Transpose_5)
Concat (/decoder/Concat_2)
GRU (/decoder/rnn_2/GRU)
LogSoftmax (/decoder/LogSoftmax_2)
ArgMax (/decoder/ArgMax_2)
Unsqueeze (/decoder/Unsqueeze_2)
Transpose (/decoder/Transpose_6)
Gather (/decoder/emb_3/Gather)
Expand (/decoder/attention_3/Expand)
Transpose (/decoder/attention_3/Transpose)
Concat (/decoder/attention_3/Concat)
MatMul (/decoder/attention/attn_3/MatMul)
Add (/decoder/attention/attn_3/Add)
Tanh (/decoder/attention_3/Tanh)
Softmax (/decoder/attention_3/Softmax)
MatMul (/decoder/MatMul_3)
Transpose (/decoder/Transpose_7)
Concat (/decoder/Concat_3)
GRU (/decoder/rnn_3/GRU)
LogSoftmax (/decoder/LogSoftmax_3)
ArgMax (/decoder/ArgMax_3)
Unsqueeze (/decoder/Unsqueeze_3)
Transpose (/decoder/Transpose_8)
Gather (/decoder/emb_4/Gather)
Expand (/decoder/attention_4/Expand)
Transpose (/decoder/attention_4/Transpose)
Concat (/decoder/attention_4/Concat)
MatMul (/decoder/attention/attn_4/MatMul)
Add (/decoder/attention/attn_4/Add)
Tanh (/decoder/attention_4/Tanh)
Softmax (/decoder/attention_4/Softmax)
MatMul (/decoder/MatMul_4)
Transpose (/decoder/Transpose_9)
Concat (/decoder/Concat_4)
GRU (/decoder/rnn_4/GRU)
LogSoftmax (/decoder/LogSoftmax_4)
ArgMax (/decoder/ArgMax_4)
Unsqueeze (/decoder/Unsqueeze_4)
Transpose (/decoder/Transpose_10)
Gather (/decoder/emb_5/Gather)
Expand (/decoder/attention_5/Expand)
Transpose (/decoder/attention_5/Transpose)
Concat (/decoder/attention_5/Concat)
MatMul (/decoder/attention/attn_5/MatMul)
Add (/decoder/attention/attn_5/Add)
Tanh (/decoder/attention_5/Tanh)
Softmax (/decoder/attention_5/Softmax)
MatMul (/decoder/MatMul_5)
Transpose (/decoder/Transpose_11)
Concat (/decoder/Concat_5)
GRU (/decoder/rnn_5/GRU)
LogSoftmax (/decoder/LogSoftmax_5)
ArgMax (/decoder/ArgMax_5)
Unsqueeze (/decoder/Unsqueeze_5)
Transpose (/decoder/Transpose_12)
Gather (/decoder/emb_6/Gather)
Expand (/decoder/attention_6/Expand)
Transpose (/decoder/attention_6/Transpose)
Concat (/decoder/attention_6/Concat)
MatMul (/decoder/attention/attn_6/MatMul)
Add (/decoder/attention/attn_6/Add)
Tanh (/decoder/attention_6/Tanh)
Softmax (/decoder/attention_6/Softmax)
MatMul (/decoder/MatMul_6)
Transpose (/decoder/Transpose_13)
Concat (/decoder/Concat_6)
GRU (/decoder/rnn_6/GRU)
LogSoftmax (/decoder/LogSoftmax_6)
ArgMax (/decoder/ArgMax_6)
Unsqueeze (/decoder/Unsqueeze_6)
Transpose (/decoder/Transpose_14)
Gather (/decoder/emb_7/Gather)
Expand (/decoder/attention_7/Expand)
Transpose (/decoder/attention_7/Transpose)
Concat (/decoder/attention_7/Concat)
MatMul (/decoder/attention/attn_7/MatMul)
Add (/decoder/attention/attn_7/Add)
Tanh (/decoder/attention_7/Tanh)
Softmax (/decoder/attention_7/Softmax)
MatMul (/decoder/MatMul_7)
Transpose (/decoder/Transpose_15)
Concat (/decoder/Concat_7)
GRU (/decoder/rnn_7/GRU)
LogSoftmax (/decoder/LogSoftmax_7)
ArgMax (/decoder/ArgMax_7)
Unsqueeze (/decoder/Unsqueeze_7)
Transpose (/decoder/Transpose_16)
Gather (/decoder/emb_8/Gather)
Expand (/decoder/attention_8/Expand)
Transpose (/decoder/attention_8/Transpose)
Concat (/decoder/attention_8/Concat)
MatMul (/decoder/attention/attn_8/MatMul)
Add (/decoder/attention/attn_8/Add)
Tanh (/decoder/attention_8/Tanh)
Softmax (/decoder/attention_8/Softmax)
MatMul (/decoder/MatMul_8)
Transpose (/decoder/Transpose_17)
Concat (/decoder/Concat_8)
GRU (/decoder/rnn_8/GRU)
LogSoftmax (/decoder/LogSoftmax_8)
ArgMax (/decoder/ArgMax_8)
Unsqueeze (/decoder/Unsqueeze_8)
Transpose (/decoder/Transpose_18)
Gather (/decoder/emb_9/Gather)
Expand (/decoder/attention_9/Expand)
Transpose (/decoder/attention_9/Transpose)
Concat (/decoder/attention_9/Concat)
MatMul (/decoder/attention/attn_9/MatMul)
Add (/decoder/attention/attn_9/Add)
Tanh (/decoder/attention_9/Tanh)
Softmax (/decoder/attention_9/Softmax)
MatMul (/decoder/MatMul_9)
Transpose (/decoder/Transpose_19)
Concat (/decoder/Concat_9)
GRU (/decoder/rnn_9/GRU)
LogSoftmax (/decoder/LogSoftmax_9)
Unsqueeze (/decoder/Unsqueeze_9)
Unsqueeze (/decoder/Unsqueeze_10)
Unsqueeze (/decoder/Unsqueeze_11)
Unsqueeze (/decoder/Unsqueeze_12)
Unsqueeze (/decoder/Unsqueeze_13)
Unsqueeze (/decoder/Unsqueeze_14)
Unsqueeze (/decoder/Unsqueeze_15)
Unsqueeze (/decoder/Unsqueeze_16)
Unsqueeze (/decoder/Unsqueeze_17)
Unsqueeze (/decoder/Unsqueeze_18)
Concat (/decoder/Concat_10)
Transpose (/decoder/Transpose_20)
FusedMatMul (MatMul_With_Transpose)
FusedMatMul (MatMul_With_Transpose_token_18)
FusedMatMul (MatMul_With_Transpose_token_19)
FusedMatMul (MatMul_With_Transpose_token_20)
FusedMatMul (MatMul_With_Transpose_token_21)
FusedMatMul (MatMul_With_Transpose_token_22)
FusedMatMul (MatMul_With_Transpose_token_23)
FusedMatMul (MatMul_With_Transpose_token_24)
FusedMatMul (MatMul_With_Transpose_token_25)

As you can see, the only difference is in the last 8 lines (the FusedMatMul token IDs differ: token_0 to token_7 on 1.13.1 versus token_18 to token_25 on 1.17.1). Hope it helps.
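
For anyone who wants to repeat the comparison on their own verbose logs, here is a small sketch of a positional line diff. It assumes the two placement listings have already been extracted from the logs as plain text; `LineDiff`, `split_lines`, and `diff_by_position` are hypothetical helper names, not part of ONNX Runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One line that differs between the two listings (1-based line number).
struct LineDiff {
    std::size_t line;
    std::string left;
    std::string right;
};

// Splits text on '\n' into individual lines.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Compares two listings line by line at the same position.
// A missing line on either side is treated as an empty string.
std::vector<LineDiff> diff_by_position(const std::string& left_log,
                                       const std::string& right_log) {
    const std::vector<std::string> left = split_lines(left_log);
    const std::vector<std::string> right = split_lines(right_log);
    std::vector<LineDiff> diffs;
    const std::size_t n = std::max(left.size(), right.size());
    for (std::size_t i = 0; i < n; ++i) {
        const std::string l = i < left.size() ? left[i] : std::string();
        const std::string r = i < right.size() ? right[i] : std::string();
        if (l != r) {
            diffs.push_back({i + 1, l, r});
        }
    }
    return diffs;
}
```

A positional diff is enough here because both listings have the same length and order; for logs where nodes are reordered, a proper diff tool would be a better choice.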

To reproduce

See the description above.

Urgency

Urgent

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17.1 release

ONNX Runtime API

C++

Architecture

X64

Execution Provider

OpenVINO

Execution Provider Library Version

2023.3
