[NPU][VPUX] LLVM ABORT in as_convolution pass — degenerate 0-channel shape for Qwen3-0.6B INT4 grouped quantization on Lunar Lake NPU 4000  #34450

@blairducrayoppat

Description

OpenVINO Version

2026.0.0-20965-c6d6a13a886-releases/2026/0

Operating System

Windows System

Device used for inference

NPU

Framework

None

Model used

Qwen/Qwen3-0.6B (28 layers, INT4 grouped quantization, exported to OV format for NPU)

Issue description

When constructing an `LLMPipeline` with a Qwen3-0.6B 28-layer INT4 model as the NPU draft device in OpenVINO GenAI heterogeneous speculative decoding (GPU target + NPU draft), the VPUX compiler aborts during model compilation with

`LLVM ERROR: Failed to infer result type(s)`

The as_convolution optimization pass produces a degenerate `tensor<1x0x1x1xf16>` input shape for `self_attn.v_proj` in layer 0, which is incompatible with the filter shape `tensor<1x8x1x1xf16>`. The resulting `IE.Convolution` node then fails MLIR type inference, triggering SIGABRT. The process exits immediately; the failure cannot be caught as a Python exception.
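Because the abort kills the interpreter before any exception can propagate, the only reliable way I found to probe the failure without taking down the host process is to attempt compilation in a child process. A minimal generic sketch (the `os.abort()` call below is a stand-in simulating the SIGABRT; in practice the snippet would construct the `LLMPipeline`):

```python
import subprocess
import sys

def probe(code: str) -> int:
    """Run a Python snippet in a child process and return its exit code.

    A SIGABRT in the child (e.g. from the VPUX compiler) surfaces here as
    a nonzero return code instead of terminating the parent interpreter.
    """
    return subprocess.run([sys.executable, "-c", code]).returncode

# Simulated abort; a clean snippet exits 0.
assert probe("import os; os.abort()") != 0
assert probe("print('ok')") == 0
```

Note the exact nonzero value differs by platform (3 on Windows, a negative signal number on POSIX), so only the zero/nonzero distinction is portable.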

Issue #28171 is the closest existing open issue (same error class, Windows Lunar Lake NPU, still unresolved).

Tested precision: INT4 per-group (group_size=128). Per-group quantization appears to be the distinguishing factor; channel-wise INT4 is listed as supported.

Suggestion: add an explicit validation guard in the as_convolution pass for 0-channel input tensors produced by fc_decomposed canonicalization of per-group INT4 weights, so compilation fails with a diagnostic instead of aborting the process.
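For illustration, the rule the compiler currently trips over is the NCHW channel-count constraint on convolution. A hypothetical sketch in plain Python (shape tuples only, not VPUX/MLIR code) of the kind of guard suggested above:

```python
def check_conv_channels(input_shape, filter_shape):
    """Validate the NCHW channel-count rule for a convolution.

    input_shape:  (N, C_in, H, W)
    filter_shape: (C_out, C_in, kH, kW)
    Raises ValueError instead of letting a degenerate shape reach
    type inference (where the compiler currently aborts).
    """
    c_in, f_cin = input_shape[1], filter_shape[1]
    if c_in == 0:
        raise ValueError(f"degenerate 0-channel input shape {input_shape}")
    if c_in != f_cin:
        raise ValueError(
            f"channel mismatch: input has {c_in}, filter expects {f_cin}"
        )

# The failing shapes from the log:
try:
    check_conv_channels((1, 0, 1, 1), (1, 8, 1, 1))
except ValueError as e:
    print(e)  # degenerate 0-channel input shape (1, 0, 1, 1)
```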

Environment
| Field | Value |
| --- | --- |
| Hardware | Intel Core Ultra 7 258V (Lunar Lake, 8P cores, 8 logical) |
| NPU | Intel AI Boost (integrated, Lunar Lake) |
| GPU | Intel Arc 140V (Xe2, 16 GB shared LPDDR5X) |
| Memory | 32 GB LPDDR5X-8533 unified |
| OS | Windows 11 Pro, Build 26200 |
| NPU Driver | 32.0.100.4514 (dated 2025-12-17) |
| GPU Driver | 32.0.101.6987 |
| OpenVINO | 2026.0.0-20965-c6d6a13a886-releases/2026/0 |
| OpenVINO GenAI | 2026.0.0.0-2820-dab5b993a38 |
| Python | 3.x, Windows venv |
| Model | Qwen/Qwen3-0.6B (28 layers, INT4 grouped quantization, exported to OV format for NPU) |
| Context | Heterogeneous speculative decoding: GPU target (Qwen3-14B INT4) + NPU draft |

Environment.md

Step-by-step reproduction

  1. Export Qwen3-0.6B to NPU format using optimum-intel or `ov.save_model`:

The model used was exported to `openvino-int4-npu/` format (INT4 grouped quantization, Qwen3-0.6B 28L). The export itself completes without error.

  2. Construct a heterogeneous speculative decoding pipeline:
import openvino_genai as ov_genai
from openvino_genai import LLMPipeline, SchedulerConfig

TARGET_PATH = "models/qwen3-14b/openvino-int4-gpu/"
DRAFT_NPU_PATH = "models/qwen3-0.6b/openvino-int4-npu/"

scheduler = SchedulerConfig()
scheduler.cache_size = 3  # GB

pipeline = LLMPipeline(
    TARGET_PATH,
    "GPU",
    draft_model=ov_genai.draft_model(DRAFT_NPU_PATH, "NPU"),
    scheduler_config=scheduler,
)
  3. Observed result:

Process aborts with the following output before Python returns from the `LLMPipeline` constructor. No Python exception is raised. Exit code: 1.

Relevant log output

[ERROR] 00:12:40.147 [vpux-compiler] Got Diagnostic at loc(fused<{name =
"__module.model.layers.0.self_attn.v_proj/ov_ext::linear/MatMul",
type = "MatMul"}>["__module.model.layers.0.self_attn.v_proj/ov_ext::linear/MatMul",
"fc_decomposed", "matmul_0", "as_convolution"]) :
Channels count of input tensor shape and filter shape must be the same: 0 != 8

loc(fused<{name = "__module.model.layers.0.self_attn.v_proj/ov_ext::linear/MatMul",
type = "MatMul"}>["__module.model.layers.0.self_attn.v_proj/ov_ext::linear/
MatMul", "fc_decomposed", "matmul_0", "as_convolution"]):
error: Channels count of input tensor shape and filter shape must be the same: 0 != 8

LLVM ERROR: Failed to infer result type(s):
"IE.Convolution"(...) {} : (tensor<1x0x1x1xf16>, tensor<1x8x1x1xf16>) -> ( ??? )
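For context on why the pass rewrites the MatMul this way: a fully-connected layer over C input features is equivalent to a 1x1 convolution over a `(1, C, 1, 1)` NCHW tensor, which is presumably what as_convolution exploits. A NumPy sketch of the equivalence (my illustration, not VPUX internals):

```python
import numpy as np

C_in, C_out = 8, 4
x = np.random.rand(1, C_in).astype(np.float32)      # FC input
w = np.random.rand(C_out, C_in).astype(np.float32)  # FC weight

fc = x @ w.T                                        # (1, C_out)

# The same computation as a 1x1 convolution in NCHW layout:
x4 = x.reshape(1, C_in, 1, 1)                       # tensor<1x8x1x1>
w4 = w.reshape(C_out, C_in, 1, 1)                   # 1x1 filter
conv = np.einsum("nchw,ochw->no", x4, w4)           # contract over c, h, w

assert np.allclose(fc, conv)
# With a 0-channel input (tensor<1x0x1x1>), the contraction over c is
# empty and cannot match an 8-channel filter -- the mismatch in the log.
```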

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
