Description
OpenVINO Version
2026.0.0-20965-c6d6a13a886-releases/2026/0
Operating System
Windows System
Device used for inference
NPU
Framework
None
Model used
Qwen/Qwen3-0.6B
Issue description
`ov.Core().compile_model(model, "NPU")` fails with a `RuntimeError` when compiling
a Qwen3-0.6B INT4 OpenVINO IR model on NPU (Intel NPU 4000, Lunar Lake). The VPUX
compiler encounters a `Gather` node with dynamic input dimensions (upper bounds set to
INT64_MAX) and raises a "non broadcastable dimensions" diagnostic, followed by a
`to_shape was called on a dynamic shape` exception.
The identical model IR compiles and runs correctly on both GPU and CPU.
Analysis
The model IR exported by optimum-intel uses dynamic dimensions for batch and sequence
length. Their upper bounds are INT64_MAX (9223372036854775807), the default value used
when no upper bound is specified, so these dimensions remain fully unbounded after export.
The VPUX compiler encounters these unresolved dynamic dimensions at multiple nodes:
- `Convert_1` (embedding input)
- `Gather` (embedding lookup): first fatal diagnostic
- `Power`, `ReduceMean`, `Multiply` (layer norm computations)
The `Gather` node triggers the critical error: the VPUX compiler attempts to broadcast
dimensions 9223372036854775807 (INT64_MAX) and -9223372036854775808 (INT64_MIN, which is
what INT64_MAX + 1 wraps to under signed 64-bit overflow), and the broadcast fails. The
compiler then calls `to_shape()` on the resulting dynamic `PartialShape`, which is not
supported, and raises the exception.
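The wrap-around described above can be checked with plain integer arithmetic. This is a minimal sketch, assuming (as the analysis does) that the second operand is INT64_MAX + 1 stored in a signed 64-bit integer, and assuming a NumPy-style broadcast rule (two dimensions are compatible only if they are equal or one of them is 1). `wrap_int64` and `broadcastable` are hypothetical helpers for illustration, not VPUX compiler code.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to a signed 64-bit value (two's complement)."""
    return ((value + 2**63) % 2**64) - 2**63


def broadcastable(a: int, b: int) -> bool:
    """NumPy-style rule for two static dims: equal, or either one is 1."""
    return a == b or a == 1 or b == 1


upper = INT64_MAX                  # "no upper bound" sentinel seen in the log
overflowed = wrap_int64(upper + 1) # what INT64_MAX + 1 becomes in int64
print(upper, overflowed)           # 9223372036854775807 -9223372036854775808
print(overflowed == INT64_MIN)     # True
print(broadcastable(upper, overflowed))  # False: matches the "non broadcastable" diagnostic
```

Python integers never overflow, which is why the sketch emulates the 64-bit wrap explicitly instead of relying on native arithmetic.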
Key source locations from the exception chain:
- `src\core\src\partial_shape.cpp:266`: `to_shape()` called on a dynamic shape
- `src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:405`: L0 graph creation failure
- `src\plugins\intel_npu\src\plugin\src\plugin.cpp:879`: NPU plugin propagation
Cross-Device Comparison
| Device | Result |
|---|---|
| GPU (`compile_model(model, "GPU")`) | PASS: compiles and runs correctly |
| CPU (`compile_model(model, "CPU")`) | PASS: compiles and runs correctly |
| NPU (`compile_model(model, "NPU")`) | FAIL: `RuntimeError: to_shape was called on a dynamic shape` |
The GPU and CPU plugins handle the dynamic dimensions correctly. The NPU plugin does not.
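The cross-device comparison can be regenerated with a small driver that attempts compilation on each device and records the outcome. This is a minimal sketch: `compare_devices` and `ov_compile` are hypothetical helpers, the model path is the same placeholder used in the reproduction steps, and OpenVINO is imported lazily so the comparison logic itself does not depend on it.

```python
from typing import Callable, Dict, Iterable


def compare_devices(compile_fn: Callable[[str], object],
                    devices: Iterable[str]) -> Dict[str, str]:
    """Try compile_fn on each device; map device -> 'PASS' or 'FAIL: <message>'."""
    results: Dict[str, str] = {}
    for device in devices:
        try:
            compile_fn(device)
            results[device] = "PASS"
        except RuntimeError as exc:
            # OpenVINO surfaces plugin/compiler failures as RuntimeError in Python.
            # The message is a chain of "Exception from ..." lines; the last line
            # holds the innermost error.
            lines = str(exc).strip().splitlines()
            results[device] = "FAIL: " + (lines[-1] if lines else "")
    return results


def ov_compile(xml_path: str) -> Callable[[str], object]:
    """Read the IR once and return a function that compiles it for a given device."""
    import openvino as ov  # lazy import so compare_devices() works without OpenVINO

    core = ov.Core()
    model = core.read_model(xml_path)
    return lambda device: core.compile_model(model, device)
```

Usage, with the placeholder path from the reproduction steps: `compare_devices(ov_compile("<output_dir>/openvino_model.xml"), ["CPU", "GPU", "NPU"])`. Passing the compile step in as a callable keeps the driver testable with a stub.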
Environment
| Component | Version |
|---|---|
| OpenVINO | 2026.0.0-20965-c6d6a13a886-releases/2026/0 |
| OpenVINO GenAI | 2026.0.0.0-2820-dab5b993a38 |
| optimum-intel | 1.27.0 |
| nncf | 3.0.0 |
| transformers | 4.51.3 |
| OS | Windows 11 Build 26200 |
| Hardware | Core Ultra 7 258V (Lunar Lake), Arc 140V (Xe2), NPU 4000 |
| NPU driver | 32.0.100.4514 |
| GPU driver | 32.0.101.6987 |
Model Details
| Property | Value |
|---|---|
| Model | Qwen/Qwen3-0.6B |
| Format | OpenVINO IR (exported via optimum-intel 1.27.0) |
| Quantization | INT4, per-group (group_size=128), asymmetric |
| Model size on disk | 367.26 MB |
| Inputs | 4 (dynamic batch + sequence dimensions) |
| Outputs | 1 |
| SHA256 (openvino_model.bin) | 9d25652b603f65c5a507ef9c4d35c285bbf94e116b707aa20973cff57c2226fd |
| SHA256 (openvino_model.xml) | c568dab243f546ff9f259fdbd1de25091a001949fee197faa1cdd3e784ba7895 |
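To check whether a local export matches the files used in this report, the SHA256 digests above can be recomputed. This is a minimal sketch; `sha256_of` and `check_export` are hypothetical helper names, and a mismatch may simply reflect different export tool versions (see the Environment table), so it is a hint rather than proof of a different model.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Expected digests, copied from the Model Details table above.
EXPECTED_SHA256: Dict[str, str] = {
    "openvino_model.bin": "9d25652b603f65c5a507ef9c4d35c285bbf94e116b707aa20973cff57c2226fd",
    "openvino_model.xml": "c568dab243f546ff9f259fdbd1de25091a001949fee197faa1cdd3e784ba7895",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large .bin is never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_export(output_dir: str,
                 expected: Dict[str, str] = EXPECTED_SHA256) -> Dict[str, bool]:
    """Return {filename: True/False} telling whether each file matches its expected digest."""
    root = Path(output_dir)
    return {name: sha256_of(root / name) == want for name, want in expected.items()}
```

Usage: `check_export("<output_dir>")` after running the export command in the reproduction steps.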
Related Issues
Related issue: This was discovered during investigation of #34450 (a separate
as_convolution LLVM ABORT affecting the same model via the heterogeneous LLMPipeline
path). The two bugs are distinct — this one occurs earlier in the VPUX compiler pipeline
and surfaces as a catchable RuntimeError, whereas #34450 terminates the process with
an uncatchable SIGABRT.
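Because the two failures differ in whether the Python process survives, one way to triage a given environment is to run the compile call in a child interpreter and classify how it exits. This is a minimal sketch: `run_isolated` and `NPU_COMPILE` are hypothetical names, the model path is a placeholder, and the classification heuristic (looking for a Python traceback on stderr) is an assumption, not an OpenVINO facility.

```python
import subprocess
import sys

# Hypothetical child script: the direct compile path from this report.
NPU_COMPILE = (
    "import openvino as ov\n"
    "core = ov.Core()\n"
    "core.compile_model(core.read_model(r'<output_dir>/openvino_model.xml'), 'NPU')\n"
)


def run_isolated(script: str, timeout: float = 600.0) -> str:
    """Run `script` in a fresh interpreter and classify how it ended.

    - "ok": the child exited with status 0.
    - "python-exception": the child died from an uncaught (catchable) Python
      exception, so a traceback was printed to stderr (this issue).
    - "crashed (...)": non-zero exit with no traceback, e.g. a hard abort
      (negative returncode on POSIX when killed by a signal), as in #34450.
    """
    proc = subprocess.run([sys.executable, "-c", script],
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 0:
        return "ok"
    if "Traceback (most recent call last)" in proc.stderr:
        return "python-exception"
    return f"crashed (returncode={proc.returncode})"
```

Usage: `run_isolated(NPU_COMPILE)` after replacing the placeholder path. The parent process is never exposed to an abort in the child, so the same harness can probe both failure modes.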
- [Bug]: Compile Model: Upper bounds were not specified, got the default value - '9223372036854775807' #32466: same INT64_MAX upper-bounds error (`Upper bounds were not specified, got the default value - '9223372036854775807'`) on NPU with SenseVoice (ONNX). Still open, assigned to @dmatveev. Different model and export path, same VPUX compiler failure.
- [Build]: Dynamic Input Issue on NPU with GNN Inference #26375: same `to_shape was called on a dynamic shape` error on NPU with a GNN model (dynamic batch). Closed as stale. @YuChern-Intel confirmed "only static shapes are supported on NPU."
- Examples of notebooks/pixart cannot convert pixart model due to `to_shape was called on a dynamic shape`. #26357: same `to_shape` error on NPU with the PixArt model (dynamic input dimensions). Closed as stale. Workaround suggested: reshape to static inputs.
- [Bug]: When i use Ultra9 NPU to run models, it report: get_shape was called on a descriptor::Tensor with dynamic shape #24619: `get_shape was called on a descriptor::Tensor with dynamic shape` variant on NPU (HuggingFace classification model). Closed. @avitial confirmed "no dynamism support on NPU in the driver."
- [NPU][VPUX] LLVM ABORT in as_convolution pass - degenerate 0-channel shape for Qwen3-0.6B INT4 grouped quantization on Lunar Lake NPU 4000 #34450: separate crash (LLVM ABORT in the `as_convolution` pass) affecting the same Qwen3-0.6B model via the heterogeneous `LLMPipeline` path. Different failure mode: #34450 is an uncatchable SIGABRT deeper in the VPUX compiler; this issue is an earlier, catchable `RuntimeError` on the direct `compile_model("NPU")` path.
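The workaround suggested in #26357 (reshape to static inputs), together with the statement in #26375 that only static shapes are supported on NPU, can be sketched as pinning every dynamic input dimension before compiling. This is a minimal, untested sketch under stated assumptions: `static_shape` and `reshape_static` are hypothetical helpers; dynamic dimensions are assumed to appear in (batch, sequence) order on every input; and it is not verified that the VPUX compiler accepts the result for this stateful model. In particular, `attention_mask` in a KV-cache model spans past plus current tokens, so a fixed length may not suit autoregressive decoding.

```python
from typing import List, Sequence


def static_shape(partial: Sequence[int], fill: Sequence[int]) -> List[int]:
    """Replace each dynamic dim (-1) in order with the next value from `fill`."""
    values = iter(fill)
    shape: List[int] = []
    for dim in partial:
        if dim != -1:
            shape.append(dim)  # already static: keep as-is
            continue
        try:
            shape.append(next(values))
        except StopIteration:
            raise ValueError(f"not enough fill values for {list(partial)}") from None
    return shape


def reshape_static(model, batch: int = 1, seq_len: int = 128):
    """Pin every dynamic input dim of an ov.Model: first to `batch`, then to `seq_len`.

    ASSUMPTION: dynamic dims are ordered (batch, sequence) on every input;
    inspect the printed shapes before trusting the result.
    """
    import openvino as ov  # lazy import: static_shape() is usable without OpenVINO

    new_shapes = {}
    for port in model.inputs:
        name = port.get_any_name()
        dims = [d.get_length() if d.is_static else -1 for d in port.get_partial_shape()]
        new_shapes[name] = ov.PartialShape(static_shape(dims, [batch, seq_len]))
        print(name, dims, "->", new_shapes[name])
    model.reshape(new_shapes)
    return model
```

Usage, reusing `core` and `model` from the reproduction script: `core.compile_model(reshape_static(model, batch=1, seq_len=128), "NPU")`.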
Step-by-step reproduction
1. Export the model (one-time):

```bash
optimum-cli export openvino \
  --model Qwen/Qwen3-0.6B \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  <output_dir>
```

2. Attempt NPU compilation:

```python
import openvino as ov

core = ov.Core()
model = core.read_model("<output_dir>/openvino_model.xml")
print(f"Model inputs: {len(model.inputs)}, outputs: {len(model.outputs)}")
compiled = core.compile_model(model, "NPU")  # fails here
```

Expected Behavior
The model should compile for NPU, consistent with GPU and CPU behavior.
Actual Behavior
The VPUX compiler emits multiple [ERROR] diagnostics about unspecified upper bounds
on nodes with dynamic dimensions, then fails with a RuntimeError.
Relevant log output
**Full stdout:**

```
OpenVINO: 2026.0.0-20965-c6d6a13a886-releases/2026/0
Available devices: ['CPU', 'GPU', 'NPU']
Reading model...
Model read: 4 inputs, 1 outputs
Compiling for NPU...
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork] Upper bounds are not specified for node '__module.model.embed_tokens/ov_ext::embedding/Convert_1' (type 'Convert'): input '0' bounds are '[9223372036854775807, 9223372036854775807]'
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork] Upper bounds are not specified for node '__module.model.embed_tokens/ov_ext::embedding/Gather' (type 'Gather'): input '1' bounds are '[9223372036854775807, 9223372036854775807]'
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork] Upper bounds are not specified for node '__module.model.layers.0.input_layernorm/aten::pow/Power' (type 'Power'): input '0' bounds are '[9223372036854775807, 9223372036854775807, 1024]'
[ERROR] 10:33:48.885 [IE::FrontEnd::importNetwork] Upper bounds are not specified for node '__module.model.layers.0.input_layernorm/aten::mean/ReduceMean' (type 'ReduceMean'): input '0' bounds are '[9223372036854775807, 9223372036854775807, 1024]'
[ERROR] 10:33:48.885 [vpux-compiler] Got Diagnostic at loc(fused<{name = "__module.model.embed_tokens/ov_ext::embedding/Gather", type = "Gather"}>["__module.model.embed_tokens/ov_ext::embedding/Gather"]) : Got non broadcastable dimensions pair : '9223372036854775807' and -9223372036854775808'
[ERROR] 10:33:48.886 [IE::FrontEnd::importNetwork] Upper bounds are not specified for node '__module.model.layers.0.input_layernorm/aten::mul/Multiply' (type 'Multiply'): input '0' bounds are '[9223372036854775807, 9223372036854775807, 1024]'
Python exception: Exception from src\inference\src\cpp\core.cpp:113:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:879:
Exception from src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:405:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004 - generic error code for invalid arguments . [NPU_VCL] Compiler returned msg:
Exception from src\core\src\partial_shape.cpp:266:
to_shape was called on a dynamic shape.
```
**Full stderr:**

```
loc(fused<{name = "__module.model.embed_tokens/ov_ext::embedding/Gather", type = "Gather"}>["__module.model.embed_tokens/ov_ext::embedding/Gather"]): error: Got non broadcastable dimensions pair : '9223372036854775807' and -9223372036854775808'
Traceback (most recent call last):
  File "npu_compile_attempt.py", line 21, in <module>
    compiled = core.compile_model(model, "NPU")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "openvino/_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src\inference\src\cpp\core.cpp:113:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:879:
Exception from src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:405:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004 - generic error code for invalid arguments . [NPU_VCL] Compiler returned msg:
Exception from src\core\src\partial_shape.cpp:266:
to_shape was called on a dynamic shape.
```

Issue submission checklist
- I'm reporting an issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.