feat: add llama.cpp OpenVINO backend for Linux #2085
Adds support for the upstream ggml-org/llama.cpp OpenVINO backend on Linux.
OpenVINO enables inference on Intel CPUs, iGPUs, dGPUs, and NPUs via Intel's
optimization runtime.
- backend_versions.json: pin openvino to b9253 (same build as cpu/vulkan/metal)
- llamacpp_server.cpp: add is_llamacpp_openvino_backend() helper; handle
openvino in get_install_params() (Linux-only, ggml-org/llama.cpp release
asset llama-{ver}-bin-ubuntu-openvino-x64.tar.gz); enable context-shift
and LD_LIBRARY_PATH setup for OpenVINO like CUDA/Vulkan
- system_info.cpp: register llamacpp/openvino in RECIPE_DEFS for Linux x86_64
(between Vulkan and ROCm in preference order)
- defaults.json: add openvino_args and openvino_bin defaults to llamacpp section
- config_file.cpp: add LEMONADE_LLAMACPP_OPENVINO_ARGS and
LEMONADE_LLAMACPP_OPENVINO_BIN env variable mappings
https://claude.ai/code/session_01A3rK6yxjK9h7pe4ikrAvqj
Pull request overview
This pull request adds a new llama.cpp OpenVINO backend option for Linux to the Lemonade server, including version pinning, backend selection support, and configuration defaults/env-var wiring so users can install and run the upstream OpenVINO release assets.
Changes:
- Registers `llamacpp/openvino` as a supported backend on Linux x86_64 and adds it to the backend preference ordering.
- Extends the llama.cpp backend installer/runtime wiring to fetch the OpenVINO Linux asset and set `LD_LIBRARY_PATH` appropriately.
- Adds config defaults and environment-variable mappings for `openvino_args` and `openvino_bin`, plus a backend version pin.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/cpp/server/system_info.cpp` | Adds `llamacpp/openvino` to the supported backend matrix for Linux x86_64 and places it in the preference order. |
| `src/cpp/server/config_file.cpp` | Adds env-var mappings for `LEMONADE_LLAMACPP_OPENVINO_ARGS` and `LEMONADE_LLAMACPP_OPENVINO_BIN`. |
| `src/cpp/server/backends/llamacpp_server.cpp` | Adds OpenVINO install asset selection, context-shift enabling, and `LD_LIBRARY_PATH` setup for the OpenVINO tarball layout. |
| `src/cpp/resources/defaults.json` | Introduces default `llamacpp.openvino_args` and `llamacpp.openvino_bin` values. |
| `src/cpp/resources/backend_versions.json` | Pins `llamacpp.openvino` to the same build tag as other upstream llama.cpp backends. |
    {"LEMONADE_LLAMACPP_ROCM_ARGS", "llamacpp", "rocm_args"},
    {"LEMONADE_LLAMACPP_VULKAN_ARGS", "llamacpp", "vulkan_args"},
    {"LEMONADE_LLAMACPP_CPU_ARGS", "llamacpp", "cpu_args"},
    {"LEMONADE_LLAMACPP_OPENVINO_ARGS", "llamacpp", "openvino_args"},
Isn't this migration code? We didn't have OpenVINO support before, so how can you migrate?
These aren't migration-only mappings — migrate_from_env() is called on every fresh install to bootstrap config.json from env vars. All backends use the same mechanism (see LEMONADE_LLAMACPP_VULKAN_ARGS, LEMONADE_LLAMACPP_CUDA_BIN, etc.). Adding OpenVINO entries here means users who configure via env vars get them picked up on first run, consistent with the existing pattern.
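The bootstrap behaviour described in this reply can be sketched roughly as follows. This is a minimal illustration, not the actual `migrate_from_env()` implementation: `EnvMapping`, `Config`, `EnvLookup`, and `bootstrap_from_env` are hypothetical names, and skipping unset or empty variables is an assumption about how such a mapping table would be consumed.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical shape of one table row: {env var, config section, config key}.
struct EnvMapping {
    const char* env_name;
    const char* section;
    const char* key;
};

// Hypothetical in-memory stand-in for config.json: section -> key -> value.
using Config = std::map<std::string, std::map<std::string, std::string>>;
using EnvLookup = std::function<const char*(const char*)>;

// Copy every mapped environment variable that is set and non-empty into the
// config. The lookup is injectable so the logic can be exercised without
// touching the real process environment; it defaults to std::getenv.
Config bootstrap_from_env(
    const std::vector<EnvMapping>& mappings,
    const EnvLookup& lookup = [](const char* name) -> const char* {
        return std::getenv(name);
    }) {
    Config config;
    for (const auto& m : mappings) {
        assert(m.env_name && m.section && m.key);
        const char* value = lookup(m.env_name);
        if (value != nullptr && *value != '\0') {
            config[m.section][m.key] = value;
        }
    }
    return config;
}
```

Under these assumptions, a user who exports `LEMONADE_LLAMACPP_OPENVINO_ARGS` before the first run would end up with `llamacpp.openvino_args` populated, while unset variables leave the corresponding keys absent.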
I could have sworn there was a discussion somewhere about axing them.
Please see #2106. I'm getting rid of cruft, don't add more.
- backend_versions.json: update openvino build to b9488 (matches first
upstream release with OpenVINO asset), add openvino.runtime_version=2026.0
to encode the OpenVINO runtime version embedded in the asset filename
- llamacpp_server.cpp: add get_openvino_runtime_version() helper (mirrors
get_therock_version()); use it to construct the correct asset filename
llama-{build}-bin-ubuntu-openvino-{runtime}-x64.tar.gz
- llamacpp_server.cpp: consolidate duplicate CUDA + OpenVINO LD_LIBRARY_PATH
blocks into a single combined branch
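
As a rough illustration of the asset filename pattern named in this commit, the string could be assembled as below. `openvino_asset_name` is a hypothetical helper written for this sketch; the PR's own logic lives in `get_install_params()` and `get_openvino_runtime_version()`, whose exact signatures are not shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the release asset name from the pinned llama.cpp
// build tag and the OpenVINO runtime version, following the pattern
// llama-{build}-bin-ubuntu-openvino-{runtime}-x64.tar.gz.
std::string openvino_asset_name(const std::string& build,
                                const std::string& runtime_version) {
    assert(!build.empty() && !runtime_version.empty());
    return "llama-" + build + "-bin-ubuntu-openvino-" + runtime_version +
           "-x64.tar.gz";
}
```

With the values pinned in this commit, `openvino_asset_name("b9488", "2026.0")` yields `llama-b9488-bin-ubuntu-openvino-2026.0-x64.tar.gz`.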
Switch the OpenVINO backend download source from ggml-org/llama.cpp to lemonade-sdk/llama.cpp, which bundles the OpenVINO runtime libs in the tarball so no system-wide OpenVINO install is required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
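
For bundled runtime libraries to be found without a system-wide install, the extracted library directory has to be on the dynamic loader's search path, which is what the `LD_LIBRARY_PATH` setup mentioned earlier in this PR is for. A minimal sketch of the path handling, assuming a hypothetical helper `prepend_library_path` and a colon-separated search path (the actual directory layout of the tarball is not shown in this conversation):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: put the backend's bundled library directory first on a
// colon-separated search path, preserving any entries that were already set.
std::string prepend_library_path(const std::string& lib_dir,
                                 const std::string& existing) {
    assert(!lib_dir.empty());
    if (existing.empty()) {
        return lib_dir;  // avoid a trailing ':' when nothing was set before
    }
    return lib_dir + ":" + existing;
}
```

The result would then be exported as `LD_LIBRARY_PATH` in the environment of the launched server process. Prepending rather than appending means the bundled libraries take precedence over any system copies, which is a design choice assumed here rather than confirmed by the PR.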
    "openvino": {
        "runtime_version": "2026.0"
Does the runtime need to get installed somehow?