Super slow response on 2x Sparkle Arc Pro B70 Blower - 32GB #22707
martinfrandsen started this conversation in General
Replies: 1 comment
Do you have all the GPU drivers installed on your system?
https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html
https://dgpu-docs.intel.com/driver/client/overview.html
Make sure Resizable BAR (ReBAR) and Above 4G Decoding are enabled in your BIOS. It would also help with debugging if you could check with nvtop or another monitoring tool whether the GPUs are actually being utilized. If you run it in the official Docker image after trying the above, what speed do you see?
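One way to check the ReBAR setting from inside Linux, without rebooting into the BIOS, is to look at the PCI BAR sizes the kernel exposes in sysfs. The sketch below is only an illustration: the function names `bar_sizes` and `largest_intel_gpu_bars` are made up for this example, and it assumes the standard `/sys/bus/pci/devices/<address>/resource` layout (one `start end flags` line per region, in hex). With ReBAR off, the largest BAR on a discrete GPU is typically a small window (often 256 MiB); with it on, it is typically close to the VRAM size. An integrated GPU will show up in the list as well.

```python
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03 = display controller


def bar_sizes(resource_text: str) -> list[int]:
    """Return the size in bytes of each populated region in a sysfs `resource` file."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if end > start:  # unused regions are reported as all zeros
            sizes.append(end - start + 1)
    return sizes


def largest_intel_gpu_bars(sysfs_root: str = "/sys/bus/pci/devices") -> dict[str, int]:
    """Map PCI address -> largest BAR size for every Intel display device found."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
            sizes = bar_sizes((dev / "resource").read_text())
        except OSError:
            continue  # skip entries that cannot be read
        if vendor == INTEL_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX) and sizes:
            result[dev.name] = max(sizes)
    return result


if __name__ == "__main__":
    for address, size in largest_intel_gpu_bars().items():
        print(f"{address}: largest BAR = {size / 2**30:.2f} GiB")
```

If both B70 cards report a largest BAR far below 32 GiB, that would point at the BIOS settings mentioned above rather than at llama.cpp itself.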
Hi there!
I'm running Ubuntu Server 24.04 with 2x Sparkle Arc Pro B70 Blower - 32GB, an Intel Core Ultra 7 270K CPU, and 64GB of RAM.
I have followed the guide (https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#linux) and I'm able to see the GPUs:
But when running llama.cpp with the "gemma-4-26B-A4B-it-UD-IQ2_XXS.gguf" model, I'm only getting 5 tokens per second!
I'm guessing that's not correct, so I'm hoping someone here can help me understand what's wrong with my setup.
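For a rough sense of why 5 tokens per second looks low, here is a back-of-the-envelope ceiling. It assumes token generation is limited purely by memory bandwidth and that every active weight is read once per token, which is a common simplification and not a measurement. The function name is made up for this sketch, and all three inputs are placeholders: the bandwidth is hypothetical (not the card's published spec), the 4B active parameters are read off the "A4B" in the file name, and the bits per weight is a guess for a roughly 2-bit quant.

```python
def decode_tokens_per_second_bound(bandwidth_gb_s: float,
                                   active_params_billion: float,
                                   bits_per_weight: float) -> float:
    """Upper bound on decode speed if each active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # All inputs below are illustrative assumptions, not measured or published figures.
    bound = decode_tokens_per_second_bound(
        bandwidth_gb_s=500.0,       # hypothetical GPU memory bandwidth
        active_params_billion=4.0,  # assumed from "A4B" in the model file name
        bits_per_weight=2.5,        # rough guess for a ~2-bit quantization
    )
    print(f"bandwidth-bound ceiling: ~{bound:.0f} tokens/s")
```

Real throughput will land well below such a ceiling, but a gap of this size suggests the model may not be running on the GPUs as intended.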