Perhaps unrelated, but I tested 35B on a W7800 48GB and on an RTX PRO 4000 (both with --cpu-moe): tg is about 1.5x faster on the RTX, while prompt processing is 5-6x faster. Even with the model fully loaded into the W7800's VRAM, I get ~300 t/s pp and ~40 t/s tg. That means pp is still 4x slower than on the RTX PRO 4000 WITH offloading... I am using the official Docker Compose file.
FYI, if you have ANY offloading, none of your numbers are representative. Let us know what the numbers are with everything inside VRAM.

@fighter3005 That's an interesting bit of information. Can you elaborate on how you set those up? I'm now deciding between a 7900 XTX and used 3090s (a few of them, to build a local AI rig; I don't want to waste more $ on cloud AI).
I am getting 230 t/s pp and 32 t/s decode with llama.cpp b89+ for Qwen3.5 122B A10B UD-Q6_K_XL. Settings: flash attention on, ctk bf16, ctv bf16, ctx 96k, batch 256, ubatch 64, parallel 1. Software stack: 7.2.0, Ubuntu 24.04, HWE kernel 6.17; I patched the AMD DKMS driver with the one from Linux 7.0 to fix the card staying pinned at 90 W whenever mmproj is loaded, even at idle. Hardware stack: 4x 9700, Gigabyte MC62-G40, 5955WX, 128 GB DDR4-2133.
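For anyone trying to reproduce these settings, here is a rough sketch that assembles a `llama-server` command line from them. It assumes a recent llama.cpp build where `-fa` takes `on|off|auto` (older builds use a bare `-fa`), that "96k" means 96 × 1024 tokens, and that the model filename is hypothetical; GPU-layer and multi-GPU split options were not mentioned in the post, so they are left at their defaults.

```python
import shlex

# Hypothetical model filename; substitute the path to your own GGUF file.
MODEL = "Qwen3.5-122B-A10B-UD-Q6_K_XL.gguf"


def build_llama_server_cmd(model: str) -> list[str]:
    """Return an argv list mirroring the settings described in the post (assumed mapping)."""
    return [
        "llama-server",
        "-m", model,
        "-fa", "on",           # flash attention (recent builds; older ones take a bare -fa)
        "-ctk", "bf16",        # K-cache type
        "-ctv", "bf16",        # V-cache type
        "-c", str(96 * 1024),  # context size, assuming "96k" means 96 * 1024 tokens
        "-b", "256",           # logical batch size
        "-ub", "64",           # physical micro-batch size
        "-np", "1",            # number of parallel slots
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(build_llama_server_cmd(MODEL)))
```

Keeping the flags in one list makes it easy to vary a single setting (for example `-ub`) between benchmark runs while holding everything else fixed.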
Hi,
I'm experiencing performance issues with the new Qwen3.5-122B on my 7900 XTX. I've tested ROCm and Vulkan with different quants, and the performance is just bad: I'm getting only 13 t/s, while the GPU shows only 30-40% utilization and several CPU cores sit unused. I'm not sure what is happening. I see other people reporting 20+ t/s on a 4070 Super, and my 7900 XTX needs far less offloading thanks to its larger VRAM. The system usually doesn't perform this badly; with Qwen3-Coder-Next I get 30 t/s.
It would be nice to get some other numbers to compare against; maybe there are other people here with a 7900 XTX.