feat: add Linux ARM64 CPU and Vulkan llamacpp backend support #2081
kenvandine wants to merge 18 commits into
- Download arm64 binaries (cpu and vulkan) from ggml-org/llama.cpp releases when compiled for aarch64 Linux
- Extend RECIPE_DEFS to allow arm64 CPU family for llamacpp cpu, vulkan, and system backends so they appear as installable on ARM64
- Fix get_device_dict() catch block to always set the cpu family via compile-time macros; without this, an exception in get_cpu_device() left the family field missing, causing backend matching to fail even after the RECIPE_DEFS change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
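The catch-block fix described above can be illustrated with a minimal sketch. It is not the code from this PR: `compile_time_cpu_family`, `build_cpu_entry`, and the `DeviceDict` alias are hypothetical stand-ins, and only the idea (fall back to predefined compiler macros when the runtime probe throws) comes from the commit message.

```cpp
#include <exception>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the JSON device dictionary.
using DeviceDict = std::map<std::string, std::string>;

// Resolve the CPU family from predefined compiler macros, so a value
// exists even when runtime detection is unavailable.
inline std::string compile_time_cpu_family() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#else
    return "unknown";
#endif
}

// `probe` plays the role of get_cpu_device(); it may throw.
template <typename Probe>
DeviceDict build_cpu_entry(Probe probe) {
    DeviceDict cpu;
    try {
        cpu = probe();
    } catch (const std::exception& e) {
        cpu["error"] = e.what();
        // The point of the fix: the family field is set on the failure
        // path too, so backend matching still has something to compare.
        cpu["family"] = compile_time_cpu_family();
    }
    return cpu;
}
```

With this shape, a throwing probe still yields a dictionary containing a `family` entry, which is what the backend matching step needs.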
…port Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two new jobs to cpp_server_build_test_release.yml:
- build-lemonade-linux-arm64: compiles lemond and lemonade on the GitHub-provided ubuntu-24.04-arm runner, confirming the ARM64 code path builds cleanly on every PR.
- test-cli-endpoints-linux-arm64: runs the cli, endpoints, ollama, and streaming-errors test suites against the built ARM64 binary. Omits llamacpp-system (no system llama-server), env-vars (requires .deb path), and Vulkan inference (no GPU on GitHub-hosted ARM64 runners).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fl0rianr left a comment
Thanks for bringing this in @kenvandine! I figured it might be welcome if I took a look here as well.
Non-blocking: after the ARM64 server startup issue is fixed, it might be useful to add one small ARM64-specific smoke test for the core change in this PR.
The PR changes the llama.cpp asset names to `bin-ubuntu-arm64.tar.gz` / `bin-ubuntu-vulkan-arm64.tar.gz`, but this matrix currently only runs the generic CLI/endpoint/Ollama tests. At least `lemonade backends install llamacpp:cpu` should be testable on the ARM64 runner and would cover the new CPU asset path directly. Vulkan may be harder without GPU access.
Replace the generic "family" JSON key in device dictionaries with specific names that communicate what the field represents:
- CPU devices: "cpu_isa" (e.g. "x86_64", "arm64")
- GPU devices (AMD, NVIDIA, Metal): "gpu_isa" (e.g. "gfx1151", "sm_89", "metal")
- NPU devices: "npu_isa" (e.g. "XDNA2")

Addresses PR feedback from r3349563880.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
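As a rough illustration of the rename, the sketch below maps a device kind to its key. The key strings are the ones listed in the commit message; the `DeviceKind` enum and the `isa_key` and `make_device_entry` helpers are hypothetical and not taken from the codebase.

```cpp
#include <map>
#include <string>

// Hypothetical classification of devices; the real code may differ.
enum class DeviceKind { Cpu, Gpu, Npu };

// Return the JSON key that replaces the generic "family" key.
inline std::string isa_key(DeviceKind kind) {
    switch (kind) {
        case DeviceKind::Cpu: return "cpu_isa";
        case DeviceKind::Gpu: return "gpu_isa";
        case DeviceKind::Npu: return "npu_isa";
    }
    return "";  // not reached for valid enum values
}

// Build a one-field device entry, e.g. {"cpu_isa": "arm64"}.
inline std::map<std::string, std::string> make_device_entry(DeviceKind kind,
                                                            const std::string& isa) {
    return {{isa_key(kind), isa}};
}
```

A consumer reading these dictionaries would then look up `cpu_isa` for a CPU entry rather than a shared `family` key.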
The ARM64 test job was missing the server startup step, causing all tests to fail with "Server is not running on port 13305". Add a "Start lemond server" step that sets XDG_RUNTIME_DIR, launches ./build/lemond in the background, and polls /live for up to 60 seconds before timing out with a log dump. Addresses PR feedback from r3349545074. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
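The readiness wait described above is a poll-with-deadline loop. The workflow step itself is a CI script, so the C++ below is only an illustrative equivalent: `wait_until_live` is a hypothetical name, and the `probe` callable stands in for an HTTP request to /live.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Call `probe` repeatedly until it succeeds or `timeout` elapses.
// Returns true on success, false when another sleep would pass the deadline.
inline bool wait_until_live(const std::function<bool()>& probe,
                            std::chrono::milliseconds timeout,
                            std::chrono::milliseconds interval) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (probe()) {
            return true;  // server answered
        }
        if (std::chrono::steady_clock::now() + interval > deadline) {
            return false;  // caller should dump logs and fail the job
        }
        std::this_thread::sleep_for(interval);
    }
}
```

A caller mirroring the CI step would pass a 60-second timeout and, on a `false` result, print the server log before exiting with an error.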
superm1 left a comment
Doesn't the llama.cpp uprev job need to be changed too?
No change needed to the uprev job. The ARM64 and x86 Linux builds come from the same upstream llama.cpp release tag. The one gap is that the validate job only tests on Windows self-hosted runners and won't exercise the Linux ARM64 download paths, but that requires an ARM64 self-hosted runner and is out of scope here.
Ensures the server process inherits HF_HOME so model/cache paths resolve correctly during tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
From my side this is ready to merge, provided the super fast tests run successfully.
sd-cpp has no ARM64 Linux binary (cpu backend is x86_64 only), so image generation tests fail with 500 on the ARM64 CI runner. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sd-cpp has no ARM64 Linux binary, so fall through to llamacpp the same way the test already does on macOS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Extend `RECIPE_DEFS` in `system_info.cpp` to allow `arm64` CPU family for `llamacpp` `cpu`, `vulkan`, and `system` backends, so they appear as supported/installable on ARM64 Linux
- Download arm64 binaries from ggml-org/llama.cpp releases when compiled for Linux `aarch64`:
  - `cpu`: `llama-{version}-bin-ubuntu-arm64.tar.gz`
  - `vulkan`: `llama-{version}-bin-ubuntu-vulkan-arm64.tar.gz`
- Fix `get_device_dict()` catch block to always set the CPU `family` field via compile-time macros. Without this, an exception in `get_cpu_device()` left `family` missing from the JSON, causing backend matching to fail even after the `RECIPE_DEFS` change (manifested as "Requires ARM64 processors CPU" despite being on an ARM64 system)
- Update `docs/guide/configuration/llamacpp.md` and `README.md` to document ARM64 Linux support for `cpu` and `vulkan` backends

Validated against llama.cpp release assets at `b9253` and `b9482`; both ship `bin-ubuntu-arm64.tar.gz` and `bin-ubuntu-vulkan-arm64.tar.gz`. No version bump to `backend_versions.json` needed.

On ARM64 Linux (e.g., Qualcomm X Elite), `vulkan` is preferred over `cpu` by the existing `RECIPE_DEFS` preference order.

Test plan
- … `lemond` on an ARM64 Linux system and run `lemonade recipes`: `llamacpp:cpu` and `llamacpp:vulkan` should show as `installable`
- `lemonade backends install llamacpp:vulkan` downloads `bin-ubuntu-vulkan-arm64.tar.gz` and runs inference
- `lemonade backends install llamacpp:cpu` downloads `bin-ubuntu-arm64.tar.gz` and runs inference
- … `x64` variants)

🤖 Generated with Claude Code
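The asset naming and preference order in the summary can be sketched as follows. The two file-name patterns are quoted from the summary; everything else is an assumption for illustration: `linux_arm64_asset`, `Recipe`, and `preferred_backend` are hypothetical names, the real `RECIPE_DEFS` layout is not shown in this PR text, and the sketch ignores GPU detection entirely.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Map a llamacpp backend to its Linux ARM64 release asset name.
// Returns an empty string for backends with no ARM64 download in this sketch.
inline std::string linux_arm64_asset(const std::string& backend,
                                     const std::string& version) {
    if (backend == "cpu") {
        return "llama-" + version + "-bin-ubuntu-arm64.tar.gz";
    }
    if (backend == "vulkan") {
        return "llama-" + version + "-bin-ubuntu-vulkan-arm64.tar.gz";
    }
    return "";
}

// Hypothetical, simplified recipe entry: a backend name plus the CPU
// families it accepts.
struct Recipe {
    std::string backend;
    std::vector<std::string> cpu_families;
};

// Walk the recipes in preference order and return the first backend whose
// allowed CPU families include `family`; empty string if none match.
inline std::string preferred_backend(const std::vector<Recipe>& ordered,
                                     const std::string& family) {
    for (const auto& recipe : ordered) {
        const auto& fams = recipe.cpu_families;
        if (std::find(fams.begin(), fams.end(), family) != fams.end()) {
            return recipe.backend;
        }
    }
    return "";
}
```

For example, with `vulkan` listed before `cpu` and both accepting `arm64`, `preferred_backend` picks `vulkan`, and `linux_arm64_asset("vulkan", "b9482")` gives the Vulkan tarball name for that release tag.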