feat: add Linux ARM64 CPU and Vulkan llamacpp backend support #2081

Open
kenvandine wants to merge 18 commits into main from kenvandine/arm64

@kenvandine
Member

@kenvandine kenvandine commented Jun 2, 2026

Summary

  • Extend RECIPE_DEFS in system_info.cpp to allow arm64 CPU family for llamacpp cpu, vulkan, and system backends, so they appear as supported/installable on ARM64 Linux
  • Download ARM64-specific binaries from upstream llama.cpp releases when compiled for aarch64:
    • cpu: llama-{version}-bin-ubuntu-arm64.tar.gz
    • vulkan: llama-{version}-bin-ubuntu-vulkan-arm64.tar.gz
  • Fix get_device_dict() catch block to always set the CPU family field via compile-time macros — without this, an exception in get_cpu_device() left family missing from the JSON, causing backend matching to fail even after the RECIPE_DEFS change (manifested as "Requires ARM64 processors CPU" despite being on an ARM64 system)
  • Update docs/guide/configuration/llamacpp.md and README.md to document ARM64 Linux support for cpu and vulkan backends
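
The catch-block fix in the third bullet can be pictured roughly as follows. This is a minimal sketch, not the code in system_info.cpp: `compile_time_cpu_family`, `detect_cpu_or_throw`, and `cpu_device_dict` are hypothetical names, and the device dictionary is simplified to a string map. Only the predefined compiler macros are real.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: derive the CPU family from predefined compiler macros
// (__aarch64__/_M_ARM64 and __x86_64__/_M_X64 are set by GCC, Clang, and MSVC).
inline std::string compile_time_cpu_family() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#else
    return "unknown";
#endif
}

// Hypothetical stand-in for runtime CPU detection; throws to simulate a failure.
inline std::map<std::string, std::string> detect_cpu_or_throw(bool fail) {
    if (fail) {
        throw std::runtime_error("cpu detection failed");
    }
    return {{"name", "example-cpu"}, {"family", compile_time_cpu_family()}};
}

// Build the CPU entry; the family is set even when detection throws.
inline std::map<std::string, std::string> cpu_device_dict(bool simulate_failure) {
    std::map<std::string, std::string> cpu;
    try {
        cpu = detect_cpu_or_throw(simulate_failure);
    } catch (const std::exception& e) {
        cpu["error"] = e.what();
        // Fallback: backend matching still needs a family value.
        cpu["family"] = compile_time_cpu_family();
    }
    // Postcondition: the family field is present on both paths.
    assert(cpu.count("family") == 1);
    return cpu;
}
```

With the fallback in place, a failed detection still yields an entry whose family matches the architecture the binary was compiled for, which is what the backend matching step reads.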

Validated against llama.cpp release assets at b9253 and b9482 — both ship bin-ubuntu-arm64.tar.gz and bin-ubuntu-vulkan-arm64.tar.gz. No version bump to backend_versions.json needed.
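
To illustrate the asset naming validated above, the archive name can be assembled from the release tag, the backend, and an architecture suffix. This is a sketch under assumptions rather than the PR's actual code: `Backend`, `linux_arch_suffix`, and `linux_asset_name` are hypothetical names, and the x64 Vulkan name is assumed to follow the same pattern as the ARM64 one.

```cpp
#include <cassert>
#include <string>

enum class Backend { Cpu, Vulkan };

// Hypothetical helper: architecture suffix fixed at compile time.
inline std::string linux_arch_suffix() {
#if defined(__aarch64__)
    return "arm64";
#else
    return "x64";
#endif
}

// Build "llama-{version}-bin-ubuntu[-vulkan]-{arch}.tar.gz".
inline std::string linux_asset_name(const std::string& version,
                                    Backend backend,
                                    const std::string& arch) {
    assert(!version.empty());  // a release tag such as "b9482" is required
    std::string name = "llama-" + version + "-bin-ubuntu-";
    if (backend == Backend::Vulkan) {
        name += "vulkan-";
    }
    return name + arch + ".tar.gz";
}
```

Because only the suffix differs, the same release tag serves both architectures; a caller would pass `linux_arch_suffix()` as the last argument.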

On ARM64 Linux (e.g., Qualcomm X Elite), vulkan is preferred over cpu by the existing RECIPE_DEFS preference order.
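
The preference behaviour can be sketched as a first-match scan over an ordered list. The real RECIPE_DEFS layout is not shown in this PR, so `BackendDef`, its fields, and `pick_backend` are hypothetical and only model the idea that list order decides the winner.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one backend definition.
struct BackendDef {
    std::string name;                       // e.g. "vulkan", "cpu"
    std::vector<std::string> cpu_families;  // CPU families the backend allows
    bool needs_gpu;                         // whether a usable GPU is required
};

// Return the first backend, in list order, that the host satisfies;
// an empty string means nothing matched.
inline std::string pick_backend(const std::vector<BackendDef>& defs,
                                const std::string& cpu_family,
                                bool has_gpu) {
    assert(!cpu_family.empty());  // the family must have been detected or defaulted
    for (const auto& def : defs) {
        const bool family_ok =
            std::find(def.cpu_families.begin(), def.cpu_families.end(),
                      cpu_family) != def.cpu_families.end();
        if (family_ok && (!def.needs_gpu || has_gpu)) {
            return def.name;
        }
    }
    return std::string();
}
```

Listing vulkan before cpu is what makes an ARM64 machine with a usable GPU resolve to vulkan, while a GPU-less ARM64 host falls through to cpu.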

Test plan

  • Restart lemond on an ARM64 Linux system and run lemonade recipes; llamacpp:cpu and llamacpp:vulkan should show as installable
  • lemonade backends install llamacpp:vulkan downloads bin-ubuntu-vulkan-arm64.tar.gz and runs inference
  • lemonade backends install llamacpp:cpu downloads bin-ubuntu-arm64.tar.gz and runs inference
  • Existing x86_64 Linux behavior unchanged (still downloads x64 variants)
  • macOS and Windows builds unaffected

🤖 Generated with Claude Code

kenvandine and others added 3 commits June 2, 2026 15:23
- Download arm64 binaries (cpu and vulkan) from ggml-org/llama.cpp
  releases when compiled for aarch64 Linux
- Extend RECIPE_DEFS to allow arm64 CPU family for llamacpp cpu,
  vulkan, and system backends so they appear as installable on ARM64
- Fix get_device_dict() catch block to always set the cpu family via
  compile-time macros; without this, an exception in get_cpu_device()
  left the family field missing, causing backend matching to fail even
  after the RECIPE_DEFS change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…port

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two new jobs to cpp_server_build_test_release.yml:

- build-lemonade-linux-arm64: compiles lemond and lemonade on the
  GitHub-provided ubuntu-24.04-arm runner, confirming the ARM64 code
  path builds cleanly on every PR.

- test-cli-endpoints-linux-arm64: runs the cli, endpoints, ollama, and
  streaming-errors test suites against the built ARM64 binary. Omits
  llamacpp-system (no system llama-server), env-vars (requires .deb
  path), and Vulkan inference (no GPU on GitHub-hosted ARM64 runners).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kenvandine kenvandine requested a review from jeremyfowers June 2, 2026 20:37
Comment thread .github/workflows/cpp_server_build_test_release.yml
Comment thread src/cpp/server/system_info.cpp Outdated
Collaborator

@fl0rianr fl0rianr left a comment

Thanks for bringing this in @kenvandine! I figured it might be welcome if I took a look here as well.

Non-blocking: after the ARM64 server startup issue is fixed, it might be useful to add one small ARM64-specific smoke test for the core change in this PR.

The PR changes the llama.cpp asset names to bin-ubuntu-arm64.tar.gz / bin-ubuntu-vulkan-arm64.tar.gz, but this matrix currently only runs the generic CLI/endpoint/Ollama tests. At least lemonade backends install llamacpp:cpu should be testable on the ARM64 runner and would cover the new CPU asset path directly. Vulkan may be harder without GPU access.

Comment thread .github/workflows/cpp_server_build_test_release.yml
kenvandine and others added 3 commits June 3, 2026 13:56
Replace the generic "family" JSON key in device dictionaries with
specific names that communicate what the field represents:
  - CPU devices: "cpu_isa" (e.g. "x86_64", "arm64")
  - GPU devices (AMD, NVIDIA, Metal): "gpu_isa" (e.g. "gfx1151", "sm_89", "metal")
  - NPU devices: "npu_isa" (e.g. "XDNA2")

Addresses PR feedback from r3349563880.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
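
The key renaming described in the commit above amounts to choosing the JSON key from the device kind. A minimal sketch, assuming a simplified string-map device entry: `DeviceKind`, `isa_key`, and `make_device_entry` are hypothetical names, while the key strings and example values come from the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class DeviceKind { Cpu, Gpu, Npu };

// Map a device kind to the specific key that replaces the generic "family".
inline std::string isa_key(DeviceKind kind) {
    switch (kind) {
        case DeviceKind::Cpu: return "cpu_isa";
        case DeviceKind::Gpu: return "gpu_isa";
        case DeviceKind::Npu: return "npu_isa";
    }
    return "isa";  // not reached for valid enumerators; silences compiler warnings
}

// Build a one-field device entry such as {"cpu_isa": "arm64"}.
inline std::map<std::string, std::string> make_device_entry(DeviceKind kind,
                                                            const std::string& isa) {
    assert(!isa.empty());  // an entry without an ISA value cannot be matched
    return {{isa_key(kind), isa}};
}
```

Any consumer that previously read "family" would need to read the kind-specific key instead.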
The ARM64 test job was missing the server startup step, causing all
tests to fail with "Server is not running on port 13305". Add a
"Start lemond server" step that sets XDG_RUNTIME_DIR, launches
./build/lemond in the background, and polls /live for up to 60
seconds before timing out with a log dump.

Addresses PR feedback from r3349545074.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
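
The readiness poll described in the commit above lives in a workflow step, which is not shown here. The C++ below only illustrates the poll-with-deadline logic; `wait_until_ready` is a hypothetical helper, and the probe callback stands in for an HTTP request to /live.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Call probe() repeatedly until it succeeds or the timeout elapses.
// Returns true on success, false on timeout; probe() runs at least once.
inline bool wait_until_ready(const std::function<bool()>& probe,
                             std::chrono::milliseconds timeout,
                             std::chrono::milliseconds interval) {
    assert(interval.count() > 0);  // a zero interval would busy-spin
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (probe()) {
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // caller can dump the server log at this point
        }
        std::this_thread::sleep_for(interval);
    }
}
```

For the 60-second window mentioned in the commit, a caller would pass `std::chrono::seconds(60)` as the timeout and dump the log when the function returns false.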
Member

@superm1 superm1 left a comment

Doesn't the llama.cpp uprev job need to be changed too?

@kenvandine
Member Author

No change needed to the uprev job. The ARM64 and x86 Linux builds come from the same upstream llama.cpp release tag — llamacpp_server.cpp selects the right archive name at compile time (bin-ubuntu-arm64.tar.gz vs bin-ubuntu-x64.tar.gz), but both pull from the same release. So when the uprev job bumps llamacpp.cpu and llamacpp.vulkan in backend_versions.json, the updated version applies to both architectures automatically.

The one gap is that the validate job only tests on Windows self-hosted runners and won't exercise the Linux ARM64 download paths, but that requires an ARM64 self-hosted runner and is out of scope here.

Comment thread .github/workflows/cpp_server_build_test_release.yml Outdated
Ensures the server process inherits HF_HOME so model/cache paths
resolve correctly during tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@fl0rianr
Collaborator

fl0rianr commented Jun 3, 2026

From my side this is ready to merge, provided the super fast tests pass.

@kenvandine kenvandine requested a review from jeremyfowers June 4, 2026 00:53
kenvandine and others added 3 commits June 4, 2026 15:26
sd-cpp has no ARM64 Linux binary (cpu backend is x86_64 only), so
image generation tests fail with 500 on the ARM64 CI runner.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sd-cpp has no ARM64 Linux binary, so fall through to llamacpp the same
way the test already does on macOS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jeremyfowers jeremyfowers added this to the Lemonade v10.7 milestone Jun 5, 2026
@github-actions github-actions Bot added the engine::llamacpp, runtime::vulkan, enhancement, and documentation labels Jun 6, 2026

Labels

  • documentation - Improvements or additions to documentation
  • engine::llamacpp - llama.cpp backend (LlamaCppServer); GPU/CPU LLM inference (Vulkan, ROCm, Metal)
  • enhancement - New feature or request
  • runtime::vulkan - Vulkan runtime / GPU backend

4 participants