Super slow response on 2x Sparkle Arc Pro B70 Blower - 32GB #22707

martinfrandsen · 2026-05-05T10:12:20Z

Hi there!
I'm running Ubuntu Server 24.04 with 2x Sparkle Arc Pro B70 Blower - 32GB, an Intel Core Ultra 7 270K CPU, and 64GB RAM.
I have followed the guide (https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#linux) and I'm able to see the GPUs:

./build/bin/llama-ls-sycl-device
Found 2 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Graphics [0xe223]|   20.2|    256|    1024|   32| 34242M|        1.14.37435+12|
| 1| [level_zero:gpu:1]|                Intel Graphics [0xe223]|   20.2|    256|    1024|   32| 34242M|        1.14.37435+12|
SYCL Optimization Feature:
|ID|        Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]|      Y|
| 1| [level_zero:gpu:1]|      Y|

But when running llama.cpp with the "gemma-4-26B-A4B-it-UD-IQ2_XXS.gguf" model, I'm only getting 5 tokens per second!

I'm guessing that's not correct, so I'm hoping someone here can help me understand what's wrong with my setup.

Is my setup not supported?
Should I update to Ubuntu Server 26.04?
Is it better to run it in Docker than directly on Ubuntu Server?
Should I use another OS?
????

sniperwhg · 2026-05-13T04:46:13Z

Do you have all the GPU drivers installed on your system?

https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html

https://dgpu-docs.intel.com/driver/client/overview.html

Make sure ReBAR (Resizable BAR) and Above 4G Decoding are enabled in your BIOS.

It would also help with debugging if you could check with nvtop or another monitoring tool whether the GPUs are actually being utilized.
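
To narrow down whether the multi-GPU split itself is the slow part, one option is to compare a single-GPU run against a two-GPU run while watching the monitor. The following is only a rough sketch that prints the two commands to try. It assumes `llama-bench` was built into `./build/bin` and that the model sits under `models/` (both paths are guesses, override them with `BENCH` and `MODEL`). `ONEAPI_DEVICE_SELECTOR` and the `-ngl` / `-sm` options are the ones described in the SYCL guide and the llama-bench help.

```shell
#!/usr/bin/env bash
# Sketch: print llama-bench commands that compare one GPU against both.
# MODEL and BENCH are assumed paths; adjust them to your own checkout.
MODEL="${MODEL:-models/gemma-4-26B-A4B-it-UD-IQ2_XXS.gguf}"
BENCH="${BENCH:-./build/bin/llama-bench}"

bench_cmd() {
  # $1 = split mode (none or layer); -ngl 99 offloads all layers to the GPU
  echo "$BENCH -m $MODEL -ngl 99 -sm $1"
}

# Single GPU: expose only the first Level Zero device to SYCL.
echo "ONEAPI_DEVICE_SELECTOR=level_zero:0 $(bench_cmd none)"

# Both GPUs: split the layers across the two devices.
echo "$(bench_cmd layer)"
```

Run the two printed commands one at a time and post both results. If the single-GPU number is much higher than the dual-GPU one, the split across cards is the likely bottleneck. If both are slow, drivers or BIOS settings are more likely.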

If you try running it on the official Docker image after trying the above, what speed do you see?
