Description
OpenVINO Version
2026.0.0-20965-c6d6a13a886-releases/2026/0
Operating System
Windows 11
Device used for inference
NPU
Framework
None
Model used
Qwen/Qwen3-0.6B (28 layers, INT4 grouped quantization, exported to OV format for NPU)
Issue description
When constructing an `LLMPipeline` with a Qwen3-0.6B 28-layer INT4 model as the NPU draft device in OpenVINO GenAI heterogeneous speculative decoding (GPU target + NPU draft), the VPUX compiler aborts during model compilation with
LLVM ERROR: Failed to infer result type(s)
The `as_convolution` optimization pass produces a degenerate `tensor<1x0x1x1xf16>` input shape for `self_attn.v_proj` in layer 0, which is irreconcilable with the filter shape `tensor<1x8x1x1xf16>`. The resulting `IE.Convolution` node fails MLIR type inference, triggering SIGABRT. The process exits immediately; the failure cannot be caught via Python exception handling.
- Issue #28171 is the closest existing open issue (same error class, Windows LNL NPU, still unresolved).
- Tested precision: INT4 per-group (group_size=128). This is the distinguishing factor vs. channel-wise INT4, which is listed as supported.
- Suggestion: add an explicit validation guard in the `as_convolution` pass for 0-channel input tensors produced by `fc_decomposed` canonicalization of per-group INT4 weights.
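To make the suggested guard concrete, here is a minimal Python sketch of the shape check (this is illustrative only; the real fix would live in the VPUX compiler's C++ `as_convolution` pass, and the function name is hypothetical):

```python
# Sketch of the proposed validation guard (NOT actual VPUX compiler code).
# Before emitting an IE.Convolution, reject degenerate 0-channel inputs
# instead of letting MLIR type inference abort. Shapes are NCHW /
# [C_out, C_in, kH, kW].

def validate_as_convolution(input_shape, filter_shape):
    """Return an error string if the shapes cannot form a valid
    convolution, else None. Mirrors the diagnostic 'Channels count of
    input tensor shape and filter shape must be the same', plus an
    explicit guard for 0-channel tensors."""
    _, c_in, _, _ = input_shape
    _, c_filt, _, _ = filter_shape
    if c_in == 0 or c_filt == 0:
        return f"degenerate 0-channel tensor: input C={c_in}, filter C={c_filt}"
    if c_in != c_filt:
        return ("Channels count of input tensor shape and filter shape "
                f"must be the same: {c_in} != {c_filt}")
    return None

# The failing case from the log: tensor<1x0x1x1xf16> vs tensor<1x8x1x1xf16>
print(validate_as_convolution((1, 0, 1, 1), (1, 8, 1, 1)))
```

With such a guard, the pass could emit a recoverable compilation error (surfaced as a Python exception) instead of aborting the whole process.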
Environment

| Field | Value |
| --- | --- |
| Hardware | Intel Core Ultra 7 258V (Lunar Lake, 8 P-cores, 8 logical) |
| NPU | Intel AI Boost (integrated, Lunar Lake) |
| GPU | Intel Arc 140V (Xe2, 16 GB shared LPDDR5X) |
| Memory | 32 GB LPDDR5X-8533 unified |
| OS | Windows 11 Pro, Build 26200 |
| NPU Driver | 32.0.100.4514 (dated 2025-12-17) |
| GPU Driver | 32.0.101.6987 |
| OpenVINO | 2026.0.0-20965-c6d6a13a886-releases/2026/0 |
| OpenVINO GenAI | 2026.0.0.0-2820-dab5b993a38 |
| Python | 3.x, Windows venv |
| Model | Qwen/Qwen3-0.6B (28 layers, INT4 grouped quantization, exported to OV format for NPU) |
| Context | Heterogeneous speculative decoding: GPU target (Qwen3-14B INT4) + NPU draft |
Step-by-step reproduction
- Export Qwen3-0.6B to NPU format using optimum-intel or `ov.save_model`:
The model used was exported to `openvino-int4-npu/` format (INT4 grouped quantization, Qwen3-0.6B, 28 layers). The export itself completes without error.
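The exact export command was not captured; under the stated assumptions (optimum-intel CLI, INT4, group_size=128), it would resemble the following sketch (paths match the reproducer below; flags assume a recent optimum-intel):

```shell
# Hypothetical export command matching the reported configuration
# (INT4, per-group quantization with group_size=128):
optimum-cli export openvino \
  --model Qwen/Qwen3-0.6B \
  --weight-format int4 \
  --group-size 128 \
  models/qwen3-0.6b/openvino-int4-npu/
```

Since channel-wise INT4 is listed as supported for NPU, re-exporting with `--group-size -1` (channel-wise) may serve as a workaround until the compiler bug is fixed, though this has not been verified here.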
- Construct a heterogeneous speculative decoding pipeline:
```python
import openvino_genai as ov_genai
from openvino_genai import LLMPipeline, SchedulerConfig

TARGET_PATH = "models/qwen3-14b/openvino-int4-gpu/"
DRAFT_NPU_PATH = "models/qwen3-0.6b/openvino-int4-npu/"

scheduler = SchedulerConfig()
scheduler.cache_size = 3  # GB

# Aborts here: the VPUX compiler crashes while compiling the NPU draft model.
pipeline = LLMPipeline(
    TARGET_PATH,
    "GPU",
    draft_model=ov_genai.draft_model(DRAFT_NPU_PATH, "NPU"),
    scheduler_config=scheduler,
)
```
- Observed result:
The process aborts with the output below before the `LLMPipeline` constructor returns to Python. No Python exception is raised. Exit code: 1.
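Because the abort happens in native code, a caller that needs to survive it can only isolate the constructor in a child process. A minimal sketch of that pattern (the helper name is hypothetical; the snippet passed in would be the pipeline-construction code above):

```python
import subprocess
import sys

def run_isolated(code: str) -> tuple[bool, int]:
    """Execute a Python snippet in a child interpreter so that a native
    SIGABRT (e.g. from the VPUX compiler) cannot kill the caller.
    Returns (succeeded, returncode)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.returncode

# os.abort() stands in for the crashing LLMPipeline constructor:
ok, rc = run_isolated("import os; os.abort()")
print(ok, rc)  # the parent survives; ok is False, rc is nonzero
```

This is a caller-side mitigation only; it does not make the model compile, it just keeps the host process alive to report the failure.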
Relevant log output

```
[ERROR] 00:12:40.147 [vpux-compiler] Got Diagnostic at loc(fused<{name =
"__module.model.layers.0.self_attn.v_proj/ov_ext::linear/MatMul",
type = "MatMul"}>["__module.model.layers.0.self_attn.v_proj/ov_ext::linear/MatMul",
"fc_decomposed", "matmul_0", "as_convolution"]) :
Channels count of input tensor shape and filter shape must be the same: 0 != 8
loc(fused<{name = "__module.model.layers.0.self_attn.v_proj/ov_ext::linear/MatMul",
type = "MatMul"}>["__module.model.layers.0.self_attn.v_proj/ov_ext::linear/
MatMul", "fc_decomposed", "matmul_0", "as_convolution"]):
error: Channels count of input tensor shape and filter shape must be the same: 0 != 8
LLVM ERROR: Failed to infer result type(s):
"IE.Convolution"(...) {} : (tensor<1x0x1x1xf16>, tensor<1x8x1x1xf16>) -> ( ??? )
```

Issue submission checklist
- I'm reporting an issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.