Description
llama.cpp compile + Vulkan
- cmake -B build -DGGML_VULKAN=ON
- cmake -B build -DGGML_VULKAN=ON -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=native
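(For context: GGML_VULKAN=ON enables the Vulkan backend and GGML_NATIVE=OFF disables native CPU tuning; GGML_CPU_ARM_ARCH should only matter on ARM builds, and the log below reports "x86 detected", so I assume it is a no-op here.)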
- When configuring, I see the following:
(llama-env) PS C:\llama-dev\llama.cpp> cmake -B build -DGGML_VULKAN=ON
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.26200.
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu:
-- Vulkan found
-- GL_KHR_cooperative_matrix supported by glslc
-- GL_NV_cooperative_matrix2 supported by glslc
-- GL_EXT_integer_dot_product supported by glslc
-- Including Vulkan backend
- cmake --build . --config Release (run from inside the build directory)
- I then see ggml-vulkan.dll under build\bin\Release\
- (llama-env) PS C:\llama-dev\llama.cpp\build\bin\Release> .\llama-server.exe -m "C:\AI LLMS\gemma-3-12b-it-Q4_K_M.gguf"
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64 | shared memory: 32768 | int dot: 0 | matrix cores: none
error: invalid argument: LLMS\gemma-3-12b-it-Q4_K_M.gguf
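It looks like the model path was split at the space in "AI LLMS", so only LLMS\gemma-3-12b-it-Q4_K_M.gguf reached the argument parser, even though I quoted the path. I assume a path without spaces (e.g. C:\AI-LLMS\gemma-3-12b-it-Q4_K_M.gguf) or PowerShell single quotes (-m 'C:\AI LLMS\gemma-3-12b-it-Q4_K_M.gguf') would avoid the split, but I have not confirmed that.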
Separately, is it only possible to use this Vulkan build through llama-server.exe, and not from Python as usual, like:
import llama_cpp
from llama_cpp import Llama
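
For completeness, here is a minimal sketch of what I would like to run instead of the server. It assumes llama-cpp-python is installed with Vulkan enabled, e.g. by setting CMAKE_ARGS="-DGGML_VULKAN=on" before pip install llama-cpp-python; note that this builds its own copy of llama.cpp rather than picking up the ggml-vulkan.dll from the build above:

from llama_cpp import Llama

# Assumes llama-cpp-python itself was built with Vulkan, e.g.:
#   $env:CMAKE_ARGS = "-DGGML_VULKAN=on"
#   pip install llama-cpp-python --no-cache-dir
llm = Llama(
    model_path=r"C:\AI LLMS\gemma-3-12b-it-Q4_K_M.gguf",  # raw string keeps backslashes literal
    n_gpu_layers=-1,  # offload all layers to the Vulkan device
)
out = llm("Hello, how are you?", max_tokens=32)
print(out["choices"][0]["text"])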