
Misc. bug: Unable to use the OpenVINO NPU backend #23984

@fxzxmicah

Description

Name and Version

version: 9444 (b9444)
built with GNU 16.1.1 for Linux x86_64

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

GGML_OPENVINO_DEVICE=NPU llama-cli --predict 256 --ctx-size 4096 --device OPENVINO0 --model Qwen2.5-3B-Instruct-Q4_0.gguf

Problem description & steps to reproduce

Run the command above. Decoding fails on the OpenVINO NPU device with the errors shown in the log output below.

First Bad Commit

No response

Relevant log output

0.39.958.062 E GGML OpenVINO backend ov::Exception: Exception from src/inference/src/cpp/infer_request.cpp:224:
Check 'dst->get_element_type() == get_element_type()' failed at src/core/src/runtime/itensor.cpp:75:
Tensor element types are not equal. (src: f32 != dst: f16)


0.39.958.070 E graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
0.39.958.071 E process_ubatch: failed to compute graph, compute status: -1
0.39.958.083 E llama_decode: failed to decode, ret = -3
0.39.965.952 E GGML OpenVINO backend ov::Exception: Exception from src/inference/src/cpp/infer_request.cpp:224:
Check 'dst->get_element_type() == get_element_type()' failed at src/core/src/runtime/itensor.cpp:75:
Tensor element types are not equal. (src: f32 != dst: f16)


0.39.965.954 E graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
0.39.965.954 E process_ubatch: failed to compute graph, compute status: -1
0.39.965.959 E llama_decode: failed to decode, ret = -3
0.39.965.959 E common_context_can_seq_rm: llama_decode() failed: -3