Note that the throughput per individual request is 6-7 tokens/s.
-
I deployed the bf16 DeepSeek-R1 model using 32 A800 GPUs with the following command:

However, the token usage stays consistently low (around 0.02) and the throughput is only about 18 tokens/s, while GPU utilization remains at 100%, which suggests more is going on than just partial parameter activation per token.
Could you advise whether there are specific hyperparameters I should configure to address this issue?
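One way to separate the aggregate server-log numbers from what a single request actually sees is to time a streaming completion at the client. Below is a minimal sketch that assumes the server exposes an OpenAI-compatible endpoint; the base URL, the served model name, and the "one streamed chunk ≈ one token" approximation are assumptions for illustration, not details taken from this thread.

```python
# Minimal sketch: measure per-request decode throughput against an
# assumed OpenAI-compatible endpoint. base_url, model name, and the
# chunk~token approximation are assumptions.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

start = time.perf_counter()
first_token_time = None
n_chunks = 0

stream = client.chat.completions.create(
    model="deepseek-r1",  # hypothetical served model name
    messages=[{"role": "user", "content": "Explain the KV cache in one paragraph."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:  # some servers send a final chunk with no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_time is None:
            first_token_time = time.perf_counter()  # time to first token
        n_chunks += 1
end = time.perf_counter()

decode_time = end - (first_token_time or start)
print(f"time to first token: {(first_token_time or end) - start:.2f} s")
print(f"~{n_chunks} chunks in {decode_time:.2f} s "
      f"-> ~{n_chunks / max(decode_time, 1e-9):.1f} tokens/s for this request")
```

If the per-request number here sits near the 6-7 tokens/s mentioned above while the aggregate stays around 18 tokens/s, the limiting factor is more likely per-token latency than batching, though that reading depends on the actual launch configuration, which is not shown in this thread.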