Inconsistent Arguments Between llama-bench and llama-cli / llama-server #23979
Perhaps I'm missing something, but is there a reason I can't run llama-bench with the same arguments I use with the CLI and server? I had to strip out everything after -ngl, and then it failed to load because it wasn't using both GPUs. Everything works fine with the CLI and server! -ngl is clearly listed as supported. It's been a while (6+ months) since I've used the benchmark, but I don't recall having to customize the arguments. Thanks.

OS: Ubuntu 24.04

error: invalid parameter for argument: -ngl
Replies: 2 comments 1 reply
There also seems to be a bug where llama-bench doesn't work with q8 for the KV cache. This prevents benchmarking with more realistic context sizes, i.e. greater than 16K. At 32K and above it spills over into system RAM and starts churning the CPUs, despite there being plenty of free VRAM even with an FP16 KV cache. Nor does it seem to work with speculative decoding. Only trivial cases like this seem to work.
Hi @d-shehu ,
llama-bench doesn't share the argument parser used by llama-cli and llama-server. It's a standalone benchmarking tool with its own (intentionally smaller) option set, defined separately in tools/llama-bench/llama-bench.cpp rather than through the shared common args. So nothing is broken on your end: several of your flags either don't exist in bench or use different conventions, and the command just needs to be translated.

There are three separate issues in your command.
1. -ngl all must be a number

The flag is supported, but on b9222 bench's -ngl / --n-gpu-layers takes an integer <n> (default 99), not the auto/all keyword that cli/server accept. The error invalid paramete…
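
For the -ngl case specifically, the translation can be sketched in a few lines of Python. This is only an illustration: translate_ngl is a hypothetical helper, not part of llama.cpp. It assumes the keyword values are exactly auto and all, and that 99 (the bench default mentioned above) is an acceptable replacement. It leaves every other flag untouched.

```python
# Hypothetical helper for illustration only; not part of llama.cpp.
# It rewrites just the value that follows -ngl / --n-gpu-layers.
NGL_FLAGS = {"-ngl", "--n-gpu-layers"}
KEYWORDS = {"auto", "all"}  # assumed keyword values accepted by cli/server


def translate_ngl(args, layers=99):
    """Return a copy of args with a keyword -ngl value replaced by an integer."""
    out = []
    i = 0
    while i < len(args):
        out.append(args[i])
        if args[i] in NGL_FLAGS and i + 1 < len(args):
            value = args[i + 1]
            # Swap the keyword for a number; keep numeric values as they are.
            out.append(str(layers) if value in KEYWORDS else value)
            i += 1  # the value has been consumed
        i += 1
    return out


if __name__ == "__main__":
    # model.gguf is a placeholder path.
    print(translate_ngl(["-m", "model.gguf", "-ngl", "all"]))
```

The returned list can then be appended to a llama-bench invocation. Flags that bench does not have at all would still need to be removed by hand.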