Why is GPU + CPU slower than CPU only? #1186
Answered by BetaDoggo
franklin050187 asked this question in Q&A
I ran the Vicuna 13B model on GPU + CPU and got 0.5 tokens/s, while CPU only gets me 4 tokens/s. GPU: 3060 Ti. Also, the answers from GPU + CPU were way more accurate and coherent than those from CPU only. Having tried the 13B model, I don't see myself going back to 7B models... I guess I will wait or buy a 3060 (12 GB).
Answered by BetaDoggo, Apr 14, 2023
Replies: 1 comment
The "auto-devices" option is a bit misleading. It only offloads the model to system memory; the GPU is still the only device doing the actual compute in this mode.
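A rough back-of-envelope sketch of why that can end up slower than CPU only. It assumes that every weight parked in system memory has to cross the PCIe bus once per generated token, which is a simplification and not a description of the actual loader. The numbers are placeholders too: fp16 weights (2 bytes per parameter), 8 GB of VRAM fully available for weights, and about 16 GB/s of usable PCIe bandwidth. The function name is made up for this example.

```python
def offloaded_tokens_per_second(model_gb, vram_gb, pcie_gb_per_s):
    """Rough upper bound on tokens/s when offloaded weights cross PCIe once per token.

    Hypothetical helper for illustration only; it ignores compute time,
    activations, the KV cache, and any caching of transferred weights.
    """
    offloaded_gb = max(model_gb - vram_gb, 0.0)
    if offloaded_gb == 0.0:
        # Nothing is offloaded, so the bus is not the bottleneck in this model.
        return float("inf")
    return pcie_gb_per_s / offloaded_gb


# Assumed numbers, not measurements:
model_gb = 13e9 * 2 / 1e9  # 13B parameters at 2 bytes each (fp16) = 26 GB
rate = offloaded_tokens_per_second(model_gb, vram_gb=8.0, pcie_gb_per_s=16.0)
print(f"{rate:.2f} tokens/s")
```

Under these assumptions the bus alone caps generation at under 1 token/s, which is in the same ballpark as the 0.5 tokens/s reported in the question. A model that fits entirely in VRAM never hits this limit.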
Answer selected by franklin050187