feat: add llama.cpp OpenVINO backend for Linux #2085

Draft
kenvandine wants to merge 7 commits into lemonade-sdk:main from kenvandine:claude/llama-cpp-openvino-linux-0n3HI

@kenvandine
Member

Adds support for the upstream ggml-org/llama.cpp OpenVINO backend on Linux. OpenVINO enables inference on Intel CPUs, iGPUs, dGPUs, and NPUs via Intel's optimization runtime.

  • backend_versions.json: pin openvino to b9253 (same build as cpu/vulkan/metal)
  • llamacpp_server.cpp: add is_llamacpp_openvino_backend() helper; handle openvino in get_install_params() (Linux-only, ggml-org/llama.cpp release asset llama-{ver}-bin-ubuntu-openvino-x64.tar.gz); enable context-shift and LD_LIBRARY_PATH setup for OpenVINO like CUDA/Vulkan
  • system_info.cpp: register llamacpp/openvino in RECIPE_DEFS for Linux x86_64 (between Vulkan and ROCm in preference order)
  • defaults.json: add openvino_args and openvino_bin defaults to llamacpp section
  • config_file.cpp: add LEMONADE_LLAMACPP_OPENVINO_ARGS and LEMONADE_LLAMACPP_OPENVINO_BIN env variable mappings

https://claude.ai/code/session_01A3rK6yxjK9h7pe4ikrAvqj

claude and others added 2 commits June 2, 2026 23:23
Adds support for the upstream ggml-org/llama.cpp OpenVINO backend on Linux.
OpenVINO enables inference on Intel CPUs, iGPUs, dGPUs, and NPUs via Intel's
optimization runtime.

- backend_versions.json: pin openvino to b9253 (same build as cpu/vulkan/metal)
- llamacpp_server.cpp: add is_llamacpp_openvino_backend() helper; handle
  openvino in get_install_params() (Linux-only, ggml-org/llama.cpp release
  asset llama-{ver}-bin-ubuntu-openvino-x64.tar.gz); enable context-shift
  and LD_LIBRARY_PATH setup for OpenVINO like CUDA/Vulkan
- system_info.cpp: register llamacpp/openvino in RECIPE_DEFS for Linux x86_64
  (between Vulkan and ROCm in preference order)
- defaults.json: add openvino_args and openvino_bin defaults to llamacpp section
- config_file.cpp: add LEMONADE_LLAMACPP_OPENVINO_ARGS and
  LEMONADE_LLAMACPP_OPENVINO_BIN env variable mappings

https://claude.ai/code/session_01A3rK6yxjK9h7pe4ikrAvqj
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds a new llama.cpp OpenVINO backend option for Linux to the Lemonade server, including version pinning, backend selection support, and configuration defaults/env-var wiring so users can install and run the upstream OpenVINO release assets.

Changes:

  • Registers llamacpp/openvino as a supported backend on Linux x86_64 and adds it into the backend preference ordering.
  • Extends the llama.cpp backend installer/runtime wiring to fetch the OpenVINO Linux asset and set LD_LIBRARY_PATH appropriately.
  • Adds config defaults and environment-variable mappings for openvino_args and openvino_bin, plus a backend version pin.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/cpp/server/system_info.cpp | Adds llamacpp/openvino to the supported backend matrix for Linux x86_64 and places it in the preference order. |
| src/cpp/server/config_file.cpp | Adds env-var mappings for LEMONADE_LLAMACPP_OPENVINO_ARGS and LEMONADE_LLAMACPP_OPENVINO_BIN. |
| src/cpp/server/backends/llamacpp_server.cpp | Adds OpenVINO install asset selection, context-shift enabling, and LD_LIBRARY_PATH setup for the OpenVINO tarball layout. |
| src/cpp/resources/defaults.json | Introduces default llamacpp.openvino_args and llamacpp.openvino_bin values. |
| src/cpp/resources/backend_versions.json | Pins llamacpp.openvino to the same build tag as other upstream llama.cpp backends. |

Comment thread src/cpp/server/backends/llamacpp_server.cpp Outdated
Comment thread src/cpp/server/backends/llamacpp_server.cpp
{"LEMONADE_LLAMACPP_ROCM_ARGS", "llamacpp", "rocm_args"},
{"LEMONADE_LLAMACPP_VULKAN_ARGS", "llamacpp", "vulkan_args"},
{"LEMONADE_LLAMACPP_CPU_ARGS", "llamacpp", "cpu_args"},
{"LEMONADE_LLAMACPP_OPENVINO_ARGS", "llamacpp", "openvino_args"},
Member

Isn't this migration code? We didn't have OpenVINO support before, so how can you migrate?

Member Author

These aren't migration-only mappings — migrate_from_env() is called on every fresh install to bootstrap config.json from env vars. All backends use the same mechanism (see LEMONADE_LLAMACPP_VULKAN_ARGS, LEMONADE_LLAMACPP_CUDA_BIN, etc.). Adding OpenVINO entries here means users who configure via env vars get them picked up on first run, consistent with the existing pattern.
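
The bootstrap behaviour described in this reply can be sketched roughly as follows. This is an illustration only, not the code in config_file.cpp: the `EnvMapping` struct, the `Config` alias, and `bootstrap_config_from_env()` are hypothetical names, and the real `migrate_from_env()` may differ in signature and behaviour. Only the four table rows are taken from the diff context quoted in this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// One row of the mapping table: environment variable -> config section/key.
// (Hypothetical struct; the real table's element type is not shown in the PR.)
struct EnvMapping {
    const char* env_var;
    const char* section;
    const char* key;
};

// Rows copied from the diff context quoted in the review thread.
static const std::vector<EnvMapping> kEnvMappings = {
    {"LEMONADE_LLAMACPP_ROCM_ARGS", "llamacpp", "rocm_args"},
    {"LEMONADE_LLAMACPP_VULKAN_ARGS", "llamacpp", "vulkan_args"},
    {"LEMONADE_LLAMACPP_CPU_ARGS", "llamacpp", "cpu_args"},
    {"LEMONADE_LLAMACPP_OPENVINO_ARGS", "llamacpp", "openvino_args"},
};

// Assumed in-memory shape of the config: section -> key -> value.
using Config = std::map<std::string, std::map<std::string, std::string>>;

// Hypothetical stand-in for migrate_from_env(): copies every mapped
// environment variable that is currently set into the config, and leaves
// unset variables out so that defaults can apply later.
Config bootstrap_config_from_env() {
    Config config;
    for (const EnvMapping& m : kEnvMappings) {
        assert(m.env_var && m.section && m.key);  // table rows must be complete
        if (const char* value = std::getenv(m.env_var)) {
            config[m.section][m.key] = value;
        }
    }
    return config;
}
```

Under this sketch, supporting a new backend through environment variables is a one-row change to the table, which is the consistency argument made above.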

Member

I could have sworn there was a discussion somewhere about axing them.

Member

@jfowers comments please

Member

Please see #2106. I'm getting rid of cruft, don't add more.

kenvandine and others added 3 commits June 3, 2026 17:19
- backend_versions.json: update openvino build to b9488 (matches first
  upstream release with OpenVINO asset), add openvino.runtime_version=2026.0
  to encode the OpenVINO runtime version embedded in the asset filename
- llamacpp_server.cpp: add get_openvino_runtime_version() helper (mirrors
  get_therock_version()); use it to construct the correct asset filename
  llama-{build}-bin-ubuntu-openvino-{runtime}-x64.tar.gz
- llamacpp_server.cpp: consolidate duplicate CUDA + OpenVINO LD_LIBRARY_PATH
  blocks into a single combined branch
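
As a rough illustration of the filename pattern named in this commit message, the asset name could be assembled as below. `openvino_asset_name()` is a hypothetical helper written for this sketch; it is not the PR's `get_openvino_runtime_version()` or `get_install_params()`, whose signatures are not shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the release asset name following the pattern
// quoted in the commit message:
//   llama-{build}-bin-ubuntu-openvino-{runtime}-x64.tar.gz
std::string openvino_asset_name(const std::string& build,
                                const std::string& runtime_version) {
    // Both parts are required; an empty one would yield a malformed name.
    assert(!build.empty() && !runtime_version.empty());
    return "llama-" + build + "-bin-ubuntu-openvino-" + runtime_version +
           "-x64.tar.gz";
}
```

With the values pinned in this commit (build b9488, runtime_version 2026.0), the sketch produces `llama-b9488-bin-ubuntu-openvino-2026.0-x64.tar.gz`.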

https://claude.ai/code/session_01A3rK6yxjK9h7pe4ikrAvqj
Switch the OpenVINO backend download source from ggml-org/llama.cpp to
lemonade-sdk/llama.cpp, which bundles the OpenVINO runtime libs in the
tarball so no system-wide OpenVINO install is required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
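
Because the tarball bundles the OpenVINO runtime libraries, the server process has to find them at launch, which is what the LD_LIBRARY_PATH setup mentioned earlier in the PR is for. A minimal sketch of that step, assuming the bundled libraries sit in a single directory; `prepend_library_path()` is a hypothetical helper, not a function from llamacpp_server.cpp:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the LD_LIBRARY_PATH value to pass to the
// child process, with lib_dir searched first and existing entries preserved.
std::string prepend_library_path(const std::string& lib_dir,
                                 const std::string& current) {
    assert(!lib_dir.empty());
    // Avoid a trailing ':' when nothing was set before: the dynamic linker
    // treats an empty entry as the current working directory.
    if (current.empty()) {
        return lib_dir;
    }
    return lib_dir + ":" + current;
}
```

The caller would read the current value with `std::getenv("LD_LIBRARY_PATH")` and pass the result into the environment of the spawned llama.cpp server.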
Comment thread src/cpp/server/system_info.cpp
Comment on lines +12 to +13
"openvino": {
"runtime_version": "2026.0"
Member

Does the runtime need to get installed somehow?

@github-actions (bot) added the labels engine::llamacpp, runtime::cpu, and enhancement on Jun 6, 2026

4 participants