Performance of llama.cpp on Intel GPU with SYCL backend #23313
Replies: 10 comments 21 replies
Compiled with:
cmake -B build-sycl -DGGML_SYCL=ON -DGGML_SYCL_F16=ON -DGGML_SYCL_TARGET=INTEL -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_FLAGS="-march=znver4" -DCMAKE_CXX_FLAGS="-march=znver4" -DCMAKE_BUILD_TYPE=Release && cmake --build build-sycl --config Release -j 16
Single B70:
Dual B70:
If instead compiled and run with F16 off:
cmake -B build-sycl -DGGML_SYCL=ON -DGGML_SYCL_F16=OFF -DGGML_SYCL_TARGET=INTEL -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_FLAGS="-march=znver4" -DCMAKE_CXX_FLAGS="-march=znver4" -DCMAKE_BUILD_TYPE=Release && cmake --build build-sycl --config Release -j 16
Single B70:
Dual B70:
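For anyone scripting both variants, here is a minimal Python sketch that only assembles the two command lines quoted in this thread (F16 on and off). The function names, the optional `march` parameter, and the default build directory are illustrative assumptions, not part of llama.cpp; the script prints the commands and does not run them.

```python
import shlex
from typing import List, Optional


def sycl_configure_args(f16: bool,
                        build_dir: str = "build-sycl",
                        march: Optional[str] = None) -> List[str]:
    """Return the cmake configure command as an argument list.

    The flags are the ones quoted in this thread; only GGML_SYCL_F16 changes.
    """
    args = [
        "cmake", "-B", build_dir,
        "-DGGML_SYCL=ON",
        "-DGGML_SYCL_F16=" + ("ON" if f16 else "OFF"),
        "-DGGML_SYCL_TARGET=INTEL",
        "-DCMAKE_C_COMPILER=icx",
        "-DCMAKE_CXX_COMPILER=icpx",
        "-DCMAKE_BUILD_TYPE=Release",
    ]
    if march:
        # Host-specific tuning (the poster used znver4); optional here.
        args += ["-DCMAKE_C_FLAGS=-march=" + march,
                 "-DCMAKE_CXX_FLAGS=-march=" + march]
    return args


def sycl_build_args(build_dir: str = "build-sycl", jobs: int = 16) -> List[str]:
    """Return the cmake build command as an argument list."""
    return ["cmake", "--build", build_dir, "--config", "Release", "-j", str(jobs)]


if __name__ == "__main__":
    for f16 in (True, False):
        # Print shell-quoted commands for copy and paste.
        print(shlex.join(sycl_configure_args(f16, march="znver4")))
        print(shlex.join(sycl_build_args()))
```

Keeping everything except `GGML_SYCL_F16` identical makes it easier to attribute a difference between the FP16 and FP32 numbers to that one option.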
And with a much more interesting model, namely Qwen 3.6 27B, in q4 and q8:
Ooft. A770 16GB, i5 14600k, current CachyOS.
fp16:
fp32:
Very weird: compared with the one in the table, prefill is much better but decode performance is half.
B580, AMD Ryzen 7 5700X3D, Ubuntu 25.10, built with fp16 in
Intel Arc Pro B50, Intel i7-8700, 32GB RAM
build: c0c7e14 (9298)
build: 2f6c815 (9397)
Command line arguments:
Build options: cmake .. -B build -DGGML_VULKAN=1 -DGGML_RPC=ON
Nothing else changed between these runs: I tested my old version, ran "git pull", rebuilt, and retested.
I hope it helps. 255H, ARC 140T, 32GB RAM
build: d4c8e2c (9442)
~/llama.cpp$ cmake -B build/ReleaseOV -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_OPENVINO=ON
~/llama.cpp$ GGML_OPENVINO_STATEFUL_EXECUTION=1
Specs: https://www.asrockind.com/en-gb/NUC%20BOX-358H
📊 Intel Panther Lake Xe3 iGPU (12 EU) Benchmark Matrix: OpenVINO vs. Vulkan vs. SYCL
I have completed a thorough benchmarking sweep across all three major acceleration backends available in
Environment
Models Tested
📈 Performance Summary Matrix
🛠️ Deep-Dive Analysis
1. The OpenVINO Qwen3 MoE Crash
OpenVINO panics and drops a core dump immediately when attempting to initialize the new
2. SYCL vs. Vulkan on the 80B MoE Architecture
Both SYCL and Vulkan avoid the memory allocation crash cleanly, showing that they handle large UMA allocations robustly:
📋 Raw Build & Execution Logs
1. Qwen3 80B MoE: OpenVINO Crash Log
HW: Ryzen 5 5600X, DDR4-3600 128GB, Arc B570
FP32:
FP16:
Purpose
This discussion is used to share performance data for llama.cpp on Intel GPUs with the SYCL backend.
The performance data is for reference only, since the data is not double-checked.
It must not be used for any commercial purpose.
Rule
Please test with the default settings (environment variables).
If you want to add data obtained with special build or run settings, please create a new table.
Create or update the tables directly, following the existing format.
Insert a new record instead of updating an existing one with the same keys; sort the records by col1, col2, col3.
Add your comments at the end for further discussion.
Don't add tables that compare with other hardware, frameworks, or backends.
Please run the test 1+ times and report the stable data.
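The record-keeping rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows and column values are hypothetical placeholders, and the median is used as one possible reading of "stable data"; the rules themselves do not prescribe a method.

```python
from statistics import median
from typing import List, Sequence, Tuple

Row = Tuple  # (col1, col2, col3, value, ...); the column meanings are hypothetical


def stable_value(runs: Sequence[float]) -> float:
    """Pick one representative t/s value from repeated runs (assumption: median)."""
    return median(runs)


def insert_record(table: List[Row], record: Row) -> List[Row]:
    """Add a record without replacing rows that share the same keys.

    The result is sorted by the first three columns. Python's sort is stable,
    so a new row with an existing key lands after the older rows for that key.
    """
    rows = list(table) + [record]
    rows.sort(key=lambda r: r[:3])
    return rows


# Hypothetical example rows: (col1, col2, col3, t/s)
table = [
    ("gpu-a", "model-x", "fp16", 100.0),
    ("gpu-b", "model-x", "fp16", 80.0),
]
new_row = ("gpu-a", "model-x", "fp16", stable_value([101.0, 99.0, 100.5]))
table = insert_record(table, new_row)
```

After this, the table has three rows: both `gpu-a` entries are kept, the older one first, followed by `gpu-b`.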
Performance data on Intel GPU
Default setting
Build:
Run:
Data:
[Results table not recoverable. Surviving header fragments: FP16, t/s, t/s. Surviving cell fragments: "- Medium", 32GB, 64GB, 24.04.4, 5700X3D, 25.10.]