
[Bug]: NPU compile_model fails with "to_shape was called on a dynamic shape" for Qwen3-0.6B INT4 (Lunar Lake) #34617

@blairducrayoppat

Description


OpenVINO Version

2026.0.0-20965-c6d6a13a886-releases/2026/0

Operating System

Windows System

Device used for inference

NPU

Framework

None

Model used

Qwen/Qwen3-0.6B

Issue description

ov.Core().compile_model(model, "NPU") fails with a RuntimeError when compiling
a Qwen3-0.6B INT4 OpenVINO IR model on NPU (Intel NPU 4000, Lunar Lake). The VPUX
compiler reports several nodes whose inputs have unbounded dynamic dimensions (upper
bounds of INT64_MAX), emits a "non broadcastable dimensions" diagnostic on the embedding
Gather node, and finally raises a "to_shape was called on a dynamic shape" exception.

The identical model IR compiles and runs correctly on both GPU and CPU.

Analysis

The model IR exported by optimum-intel uses dynamic dimensions for batch and sequence
length. The upper bounds of these dimensions are INT64_MAX (9223372036854775807), which is
the value OpenVINO uses for a dimension with no upper bound: the export left them fully
dynamic rather than bounded or static.
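To make this concrete, the following minimal sketch flags which axes in a bounds list carry the INT64_MAX sentinel. It assumes the bounds are the integer lists printed in the [ERROR] lines of the log below; `unbounded_axes` is a hypothetical triage helper, not an OpenVINO API.

```python
# Sentinel used as the upper bound of an unbounded dimension
# (the largest int64 value), as printed in the [ERROR] log lines.
INT64_MAX = 2**63 - 1


def unbounded_axes(bounds):
    """Return the indices of axes whose upper bound is the INT64_MAX sentinel.

    `bounds` is a list of per-axis upper bounds, e.g. the list printed in
    "input '0' bounds are '[...]'". Hypothetical helper for triage only.
    """
    return [axis for axis, upper in enumerate(bounds) if upper == INT64_MAX]


# Bounds copied from the Power/ReduceMean/Multiply diagnostics in the log:
# batch and sequence axes are unbounded, the hidden axis (1024) is static.
layernorm_bounds = [9223372036854775807, 9223372036854775807, 1024]
print(unbounded_axes(layernorm_bounds))  # [0, 1]
```

On the OpenVINO side, printing `inp.get_partial_shape()` for each entry of `model.inputs` should show these same dimensions as `?`.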

The VPUX compiler frontend reports these unbounded dimensions at multiple nodes:

  • Convert_1 (embedding input)
  • Gather (embedding lookup): also the source of the fatal broadcast diagnostic
  • Power, ReduceMean, Multiply (RMSNorm computations in layers.0.input_layernorm)

The Gather node triggers the critical error: the VPUX compiler attempts to broadcast
dimensions 9223372036854775807 (INT64_MAX) and -9223372036854775808 (INT64_MIN, the value
INT64_MAX wraps to on signed overflow), which fails. The compiler then calls to_shape() on
the resulting PartialShape, which is still dynamic, so the call raises the exception.
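The arithmetic behind the Gather diagnostic can be reproduced in a few lines. This is a sketch under two assumptions: that the second dimension is INT64_MAX after a wrap-around increment (the overflow hypothesis above; the exact arithmetic inside the VPUX compiler is not known), and that the compiler's broadcast check follows the usual NumPy-style rule, where two dimensions are compatible only if they are equal or one of them is 1. `wrap_int64` and `dims_broadcastable` are hypothetical helpers.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def wrap_int64(value):
    """Reduce a Python int to two's-complement int64, i.e. what a C++ int64_t
    typically holds after wrap-around (signed overflow is formally undefined
    behavior in C++, but commonly wraps like this in practice)."""
    return (value + 2**63) % 2**64 - 2**63


def dims_broadcastable(a, b):
    """NumPy-style rule: dimensions are equal, or one of them is 1."""
    return a == b or a == 1 or b == 1


overflowed = wrap_int64(INT64_MAX + 1)
print(overflowed)                                 # -9223372036854775808
print(dims_broadcastable(INT64_MAX, overflowed))  # False
print(dims_broadcastable(INT64_MAX, INT64_MAX))   # True
```

Neither value is 1 and they differ, so the pair is rejected, which matches the "non broadcastable dimensions pair" message in the log.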

Key source locations from the exception chain:

  • src\core\src\partial_shape.cpp:266: to_shape() called on a dynamic shape
  • src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:405: L0 graph creation failure (pfnCreate2 returned ZE_RESULT_ERROR_INVALID_ARGUMENT)
  • src\plugins\intel_npu\src\plugin\src\plugin.cpp:879: exception propagated through the NPU plugin

Cross-Device Comparison

| Device | Result |
|---|---|
| GPU (compile_model(model, "GPU")) | PASS: compiles and runs correctly |
| CPU (compile_model(model, "CPU")) | PASS: compiles and runs correctly |
| NPU (compile_model(model, "NPU")) | FAIL: RuntimeError: to_shape was called on a dynamic shape |

The GPU and CPU plugins handle the dynamic dimensions correctly. The NPU plugin does not.
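The comparison above can be scripted so that every device is tried even when one fails. A minimal sketch: `compare_devices` is a hypothetical helper that takes any compile callable, which keeps it testable without hardware; with real hardware the callable would be something like `lambda d: core.compile_model(model, d)` using the objects from the reproduction script.

```python
def compare_devices(compile_fn, devices=("GPU", "CPU", "NPU")):
    """Call compile_fn(device) for each device and record PASS or FAIL.

    RuntimeError (what the OpenVINO Python API raises here) is caught per
    device so one failure does not hide the results for the others.
    """
    results = {}
    for device in devices:
        try:
            compile_fn(device)
            results[device] = "PASS"
        except RuntimeError as exc:
            # OpenVINO exception text spans many lines; keep only the last one.
            lines = str(exc).strip().splitlines()
            results[device] = "FAIL: " + (lines[-1] if lines else repr(exc))
    return results


def fake_compile(device):
    """Stand-in for core.compile_model that mimics this issue's NPU failure."""
    if device == "NPU":
        raise RuntimeError(
            "Exception from partial_shape.cpp:266:\n"
            "to_shape was called on a dynamic shape."
        )


print(compare_devices(fake_compile))
```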

Environment

| Component | Version |
|---|---|
| OpenVINO | 2026.0.0-20965-c6d6a13a886-releases/2026/0 |
| OpenVINO GenAI | 2026.0.0.0-2820-dab5b993a38 |
| optimum-intel | 1.27.0 |
| nncf | 3.0.0 |
| transformers | 4.51.3 |
| OS | Windows 11 Build 26200 |
| Hardware | Core Ultra 7 258V (Lunar Lake), Arc 140V (Xe2), NPU 4000 |
| NPU driver | 32.0.100.4514 |
| GPU driver | 32.0.101.6987 |

Model Details

| Property | Value |
|---|---|
| Model | Qwen/Qwen3-0.6B |
| Format | OpenVINO IR (exported via optimum-intel 1.27.0) |
| Quantization | INT4, per-group (group_size=128), asymmetric |
| Model size on disk | 367.26 MB |
| Inputs | 4 (dynamic batch + sequence dimensions) |
| Outputs | 1 |
| SHA256 (openvino_model.bin) | 9d25652b603f65c5a507ef9c4d35c285bbf94e116b707aa20973cff57c2226fd |
| SHA256 (openvino_model.xml) | c568dab243f546ff9f259fdbd1de25091a001949fee197faa1cdd3e784ba7895 |
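Since SHA256 digests are listed for the exported files, anyone reproducing this can confirm they are testing the same IR before comparing results. A minimal sketch using only the standard library; `sha256_of` and `verify_export` are hypothetical helper names, and `<output_dir>` is the placeholder from the export step.

```python
import hashlib
from pathlib import Path

# Digests as listed in the Model Details table of this issue.
EXPECTED_SHA256 = {
    "openvino_model.bin": "9d25652b603f65c5a507ef9c4d35c285bbf94e116b707aa20973cff57c2226fd",
    "openvino_model.xml": "c568dab243f546ff9f259fdbd1de25091a001949fee197faa1cdd3e784ba7895",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the ~367 MB .bin is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_export(output_dir):
    """Return {filename: True/False}, comparing local files to EXPECTED_SHA256."""
    return {
        name: sha256_of(Path(output_dir) / name) == expected
        for name, expected in EXPECTED_SHA256.items()
    }

# Usage: print(verify_export("<output_dir>"))
```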

Related Issues

This bug was discovered while investigating #34450, a separate as_convolution LLVM
ABORT that affects the same model via the heterogeneous LLMPipeline path. The two bugs
are distinct: this one occurs earlier in the VPUX compiler pipeline and surfaces as a
catchable RuntimeError, whereas #34450 terminates the process with an uncatchable SIGABRT.
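The catchable-versus-abort distinction matters for test harnesses: a RuntimeError can be handled in-process, but an abort like #34450 kills the interpreter. One common way to handle both, sketched here under the assumption that running a child Python process is acceptable, is to run each compile attempt in a subprocess and classify its exit status. `run_isolated` and `classify` are hypothetical helpers.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a child interpreter so that a hard abort
    cannot take down the calling process."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=timeout,
    )


def classify(result):
    """Map a CompletedProcess to a coarse outcome label."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # POSIX only: a negative code means "killed by this signal number"
        # (6 is SIGABRT). On Windows an abort shows up as a positive exit code.
        return f"signal {-result.returncode}"
    if "RuntimeError" in result.stderr:
        return "runtime_error"  # catchable failure, as in this issue
    return f"exit code {result.returncode}"


# Hypothetical usage with the reproduction steps:
# npu_snippet = (
#     "import openvino as ov; core = ov.Core(); "
#     "core.compile_model(core.read_model('<output_dir>/openvino_model.xml'), 'NPU')"
# )
# print(classify(run_isolated(npu_snippet)))
print(classify(run_isolated("raise RuntimeError('to_shape was called on a dynamic shape')")))
```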

Step-by-step reproduction

1. Export the model (one-time):

```shell
optimum-cli export openvino \
  --model Qwen/Qwen3-0.6B \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  <output_dir>
```

2. Attempt NPU compilation:

```python
import openvino as ov

core = ov.Core()
model = core.read_model("<output_dir>/openvino_model.xml")
print(f"Model inputs: {len(model.inputs)}, outputs: {len(model.outputs)}")
compiled = core.compile_model(model, "NPU")  # fails here
```
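A possible follow-up experiment, not verified here, is to give every input a finite static shape before compiling for NPU; that would show whether the unbounded batch/sequence dimensions alone trigger the failure. The sketch assumes rank-1 inputs are batch-only and rank-2 inputs are [batch, sequence] (check `model.inputs` first); `static_shapes` and `try_static_npu_compile` are hypothetical helpers, and the values 1 and 128 are arbitrary. Because this IR is a stateful LLM, internal state may still carry dynamic dimensions, so a static input reshape may not be sufficient on its own.

```python
def static_shapes(inputs, batch=1, seq_len=128):
    """Map input name -> static shape from (name, rank) pairs.

    Assumption: rank 1 = [batch], rank 2 = [batch, seq_len]; any other rank
    is rejected so it can be handled by hand.
    """
    shapes = {}
    for name, rank in inputs:
        if rank == 1:
            shapes[name] = [batch]
        elif rank == 2:
            shapes[name] = [batch, seq_len]
        else:
            raise ValueError(f"no rule for input {name!r} with rank {rank}")
    return shapes


def try_static_npu_compile(xml_path, batch=1, seq_len=128):
    """Reshape the model to static input shapes, then compile it for NPU."""
    import openvino as ov  # lazy import: static_shapes() works without openvino

    core = ov.Core()
    model = core.read_model(xml_path)
    pairs = [(inp.get_any_name(), inp.get_partial_shape().rank.get_length())
             for inp in model.inputs]
    model.reshape(static_shapes(pairs, batch, seq_len))
    return core.compile_model(model, "NPU")

# Usage: compiled = try_static_npu_compile("<output_dir>/openvino_model.xml")
```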

Expected Behavior

The model should compile for NPU, consistent with GPU and CPU behavior.

Actual Behavior

The VPUX compiler frontend emits multiple [ERROR] diagnostics about unspecified upper bounds
on nodes with dynamic dimensions, plus a "non broadcastable dimensions" error on the Gather
node, and compile_model then fails with a RuntimeError.

Relevant log output

**Full stdout:**

OpenVINO: 2026.0.0-20965-c6d6a13a886-releases/2026/0
Available devices: ['CPU', 'GPU', 'NPU']
Reading model...
Model read: 4 inputs, 1 outputs
Compiling for NPU...
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork]   Upper bounds are not specified for node '__module.model.embed_tokens/ov_ext::embedding/Convert_1' (type 'Convert'): input '0' bounds are '[9223372036854775807, 9223372036854775807]'
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork]   Upper bounds are not specified for node '__module.model.embed_tokens/ov_ext::embedding/Gather' (type 'Gather'): input '1' bounds are '[9223372036854775807, 9223372036854775807]'
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork]   Upper bounds are not specified for node '__module.model.layers.0.input_layernorm/aten::pow/Power' (type 'Power'): input '0' bounds are '[9223372036854775807, 9223372036854775807, 1024]'
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork]   Upper bounds are not specified for node '__module.model.layers.0.input_layernorm/aten::mean/ReduceMean' (type 'ReduceMean'): input '0' bounds are '[9223372036854775807, 9223372036854775807, 1024]'
[ERROR] 10:33:48.885 [vpux-compiler] Got Diagnostic at loc(fused<{name = "__module.model.embed_tokens/ov_ext::embedding/Gather", type = "Gather"}>["__module.model.embed_tokens/ov_ext::embedding/Gather"]) : Got non broadcastable dimensions pair : '9223372036854775807' and -9223372036854775808'
[ERROR] 10:33:48.886 [IE::FrontEnd::importNetwork]   Upper bounds are not specified for node '__module.model.layers.0.input_layernorm/aten::mul/Multiply' (type 'Multiply'): input '0' bounds are '[9223372036854775807, 9223372036854775807, 1024]'
Python exception: Exception from src\inference\src\cpp\core.cpp:113:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:879:
Exception from src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:405:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004 - generic error code for invalid arguments . [NPU_VCL] Compiler returned msg:
Exception from src\core\src\partial_shape.cpp:266:
to_shape was called on a dynamic shape.


**Full stderr:**

loc(fused<{name = "__module.model.embed_tokens/ov_ext::embedding/Gather", type = "Gather"}>["__module.model.embed_tokens/ov_ext::embedding/Gather"]): error: Got non broadcastable dimensions pair : '9223372036854775807' and -9223372036854775808'
Traceback (most recent call last):
  File "npu_compile_attempt.py", line 21, in <module>
    compiled = core.compile_model(model, "NPU")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "openvino/_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src\inference\src\cpp\core.cpp:113:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:879:
Exception from src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:405:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004 - generic error code for invalid arguments . [NPU_VCL] Compiler returned msg:
Exception from src\core\src\partial_shape.cpp:266:
to_shape was called on a dynamic shape.
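To tally which nodes report unbounded inputs, the [ERROR] lines in the stdout log can be parsed. This sketch assumes the exact message wording shown above, which may change between compiler versions; `summarize_unbounded` is a hypothetical helper.

```python
import re

# Pattern for the "Upper bounds are not specified" lines in the stdout log.
UNBOUNDED_RE = re.compile(
    r"Upper bounds are not specified for node '(?P<node>[^']+)' "
    r"\(type '(?P<type>[^']+)'\): input '(?P<input>\d+)' "
    r"bounds are '\[(?P<bounds>[^\]]*)\]'"
)


def summarize_unbounded(log_text):
    """Return (op_type, node, input_index, bounds) for each matching line."""
    rows = []
    for m in UNBOUNDED_RE.finditer(log_text):
        bounds = [int(b) for b in m["bounds"].split(",") if b.strip()]
        rows.append((m["type"], m["node"], int(m["input"]), bounds))
    return rows


# One line copied verbatim from the stdout log above.
sample = (
    "[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork]   Upper bounds are not "
    "specified for node '__module.model.embed_tokens/ov_ext::embedding/Gather' "
    "(type 'Gather'): input '1' bounds are "
    "'[9223372036854775807, 9223372036854775807]'"
)

for op_type, node, idx, bounds in summarize_unbounded(sample):
    print(op_type, idx, bounds)
# Gather 1 [9223372036854775807, 9223372036854775807]
```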

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
