Bug: In ik_llama.cpp, an unexpected dot (".") prefix is often added to K2 Thinking tool call parameters, while in mainline llama.cpp it works fine #1078

@Lissanro

What happened?

This is an example of what happens in ik_llama.cpp: K2 Thinking tries to run commands with a "." prefix:

.git checkout -f

Even if I ask the model to correct itself, it can't:

token	  9413: 'Let'
token	  1019: ' me'
token	  2284: ' run'
token	   276: ' the'
token	  6644: ' correct'
token	  5850: ' command'
token	  2932: ' without'
token	   276: ' the'
token	  8134: ' leading'
token	 21089: ' dot'
token	    25: ':'
token	163595: '<|tool_calls_section_begin|>'
token	163597: '<|tool_call_begin|>'
token	 41937: 'functions'
token	 20994: '.execute'
token	 20975: '_command'
token	    25: ':'
token	   920: '19'
token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	  9106: 'command'
token	  1289: '":'
token	  6082: '".'
token	 14284: 'git'
token	 33490: ' checkout'
token	   635: ' -'
token	    69: 'f'
token	  3923: '","'
token	 80816: 'cwd'
token	  1289: '":'
token	  2796: '""'
token	    92: '}'
token	163599: '<|tool_call_end|>'
token	163596: '<|tool_calls_section_end|>'
token	163586: '<|im_end|>'

In some cases, I saw K2 Thinking in ik_llama.cpp type correct commands; for example, here it successfully ran rm -rf llmcache_v2:

token	163595: '<|tool_calls_section_begin|>'
token	163597: '<|tool_call_begin|>'
token	 41937: 'functions'
token	 20994: '.execute'
token	 20975: '_command'
token	    25: ':'
token	    23: '8'
token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	  9106: 'command'
token	  1289: '":'
token	     1: '"'
token	 13119: 'rm'
token	   635: ' -'
token	 14373: 'rf'
token	 15503: ' ll'
token	 13347: 'mc'
token	  1960: 'ache'
token	  4231: '_v'
token	    17: '2'
token	  3923: '","'
token	 80816: 'cwd'
token	  1289: '":'
token	  2796: '""'
token	    92: '}'
token	163599: '<|tool_call_end|>'
token	163596: '<|tool_calls_section_end|>'
token	163586: '<|im_end|>'

But afterwards, it typically goes back to adding the dot prefix to all commands. For a while I thought this might be a model issue, but recently I tested llama.cpp, and to my surprise the issue does not happen there: K2 Thinking seems to make correct tool calls reliably.

It is worth mentioning that not just execute_command tool calls are affected, but others too. However, as I mentioned, sometimes the model still manages to type the tool call correctly.

This is the llama.cpp command I tested with:

numactl --cpunodebind=0 --interleave=all /home/lissanro/pkgs/llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/Kimi-K2-Thinking-Q8_0-Q4_0.gguf \
--ctx-size 163840 --n-gpu-layers 62 --tensor-split 15,27,30,28 -ctk q8_0 -ctv q8_0 -b 4096 -ub 4096 \
-ot "blk\.(3)\.ffn_.*=CUDA0" \
-ot "blk\.(4)\.ffn_.*=CUDA1" \
-ot "blk\.(5)\.ffn_.*=CUDA2" \
-ot "blk\.(6)\.ffn_.*=CUDA3" \
-ot exps=CPU \
--threads 64 --host 0.0.0.0 --port 5000 \
--jinja --chat-template-file /home/lissanro/pkgs/llama.cpp/models/templates/Kimi-K2-Thinking.jinja --special

And this is my ik_llama.cpp command:

numactl --cpunodebind=0 --interleave=all /home/lissanro/pkgs/ik_llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/Kimi-K2-Thinking-Q8_0-Q4_0.gguf \
--ctx-size 163840 --n-gpu-layers 62 --tensor-split 25,22,28,25 -mla 3 -ctk q8_0 -amb 512 -b 4096 -ub 4096 \
-ot "blk\.(3)\.ffn_.*=CUDA0" \
-ot "blk\.(4)\.ffn_.*=CUDA1" \
-ot "blk\.(5)\.ffn_.*=CUDA2" \
-ot "blk\.(6)\.ffn_.*=CUDA3" \
-ot exps=CPU \
--threads 64 --host 0.0.0.0 --port 5000 \
--jinja --chat-template-file /home/lissanro/pkgs/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja --special

I compared the ik_llama.cpp and llama.cpp chat templates for K2 Thinking, and they are identical:

diff -u \
/home/lissanro/pkgs/llama.cpp/models/templates/Kimi-K2-Thinking.jinja \
/home/lissanro/pkgs/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja
(empty output)
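
Since the template files match on disk, another thing worth checking is whether both servers actually load and apply the same template at runtime. Here is a sketch of how to verify that, assuming ik_llama.cpp exposes the same /props endpoint as mainline llama-server (I have not verified this for the fork):

# Dump the chat template each server actually loaded, then diff the two.
# Run the first line against ik_llama.cpp, restart with mainline llama.cpp
# on the same port, then run the second line.
curl -s http://localhost:5000/props | jq -r '.chat_template' > /tmp/template_ik.jinja
curl -s http://localhost:5000/props | jq -r '.chat_template' > /tmp/template_mainline.jinja
diff -u /tmp/template_mainline.jinja /tmp/template_ik.jinja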

The main issue is here. Wrong tool call:

token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	  9106: 'command'
token	  1289: '":'
token	  6082: '".'

Correct tool call:

token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	  9106: 'command'
token	  1289: '":'
token	     1: '"'

My understanding is that in a tool call, token 1289 ('":') should always be followed by a token that opens the string value with a plain '"' (token 1 here), never by token 6082 ('".'). I remember that one of the bug reports here mentioned that ik_llama.cpp does not correctly force grammar, so maybe that is still the case?
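
If grammar enforcement is the suspect, the issue should reproduce without Roo Code in the middle. Here is a minimal sketch of how to isolate it, assuming ik_llama.cpp serves the same OpenAI-compatible /v1/chat/completions endpoint as mainline llama-server; the tool definition is a simplified stand-in for the real Roo Code one:

# Send a bare tool-call request and inspect the raw arguments for the
# leading dot. Repeat against both servers on the same port.
curl -s http://localhost:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "Run git checkout -f in the repository root."}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "execute_command",
        "description": "Run a shell command",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {"type": "string"},
            "cwd": {"type": "string"}
          },
          "required": ["command"]
        }
      }
    }]
  }' | jq '.choices[0].message.tool_calls'

If the leading dot shows up here with ik_llama.cpp but not with mainline on the same model file and template, that would rule out Roo Code and the template file itself.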

An example of a different incorrect tool call by ik_llama.cpp (the model wanted to check the llmcache_v2 directory, but instead mistyped .cache_v2):

token	  9413: 'Let'
token	  1019: ' me'
token	  2598: ' check'
token	   276: ' the'
token	  7942: ' existing'
token	  1268: ' `'
token	   930: 'll'
token	 13347: 'mc'
token	  1960: 'ache'
token	  4231: '_v'
token	    17: '2'
token	    63: '`'
token	  9003: ' directory'
token	  7828: ' structure'
token	    25: ':'
token	163595: '<|tool_calls_section_begin|>'
token	163597: '<|tool_call_begin|>'
token	 41937: 'functions'
token	 14026: '.list'
token	 20350: '_files'
token	    25: ':'
token	  2466: '23'
token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	  4953: 'path'
token	  1289: '":'
token	  6082: '".'
token	 14466: 'cache'
token	  4231: '_v'
token	    17: '2'
token	  3923: '","'
token	 88997: 'recursive'
token	  1289: '":'
token	  4130: 'true'
token	    92: '}'
token	163599: '<|tool_call_end|>'
token	163596: '<|tool_calls_section_end|>'
token	163586: '<|im_end|>'

Notice how the issue occurs in this part:

token	  8264: '{"'
token	  4953: 'path'
token	  1289: '":'
token	  6082: '".'
token	 14466: 'cache'

It generated {"path":". instead of {"path":" - basically, every time the '":' token is followed by '".' instead of '"' in a tool call, the model starts to misbehave.
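
To measure how often this happens over a longer session, saved dumps can be scanned for the bad token pair directly. A rough sketch, where tokens.log is a hypothetical file holding dumps in the format shown above:

# Count how often token 1289 ('":') is immediately followed by the bad
# token 6082 ('".') versus the good token 1 ('"'):
grep -A1 " 1289: " tokens.log | grep -c " 6082: "
grep -A1 " 1289: " tokens.log | grep -c " 1: "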

But many tool calls can still succeed, and some types of tool calls are less likely to trigger the issue. Here is an example of a correct tool call ik_llama.cpp managed to make:

token	163595: '<|tool_calls_section_begin|>'
token	163597: '<|tool_call_begin|>'
token	 41937: 'functions'
token	  9189: '.write'
token	  4585: '_to'
token	  6101: '_file'
token	    25: ':'
token	    18: '3'
token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	  4953: 'path'
token	  1289: '":'
token	     1: '"'
token	   930: 'll'
token	 13347: 'mc'
token	  1960: 'ache'
token	  4231: '_v'
token	    17: '2'
token	 94246: '/__'
token	  9885: 'main'
token	 32394: '__.'
token	  8374: 'py'
token	   665: '",'
token	     1: '"'
token	  4204: 'content'
token	  7471: '":"'
...

One more correct tool call by ik_llama.cpp, this time with a file array:

token	163595: '<|tool_calls_section_begin|>'
token	163597: '<|tool_call_begin|>'
token	 41937: 'functions'
token	  8827: '.read'
token	  6101: '_file'
token	    25: ':'
token	    15: '0'
token	163598: '<|tool_call_argument_begin|>'
token	  8264: '{"'
token	 12481: 'files'
token	  1289: '":'
token	 81103: '[{"'
token	  4953: 'path'
token	  7471: '":"'
token	  2113: 'ref'
token	   692: 'act'
token	  4715: 'ored'
token	 30247: '_project'
token	 71887: '_structure'
token	  6847: '.md'
token	 69622: '"},{"'
token	  4953: 'path'
token	  7471: '":"'
token	  3879: 'function'
token	  2700: '_h'
token	 24822: 'ierarchy'
token	  6847: '.md'
token	 69622: '"},{"'
...

I would appreciate any ideas on how to make tool calls more reliable in ik_llama.cpp, or at least where to look to debug this further.
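
One blunt experiment that might narrow it down (a diagnostic, not a fix): ban the problematic token for a single request via logit_bias and see whether tool calls come out clean. I use llama.cpp's native [[token_id, bias]] array form here; I have not verified that the fork honors it on this endpoint:

# Ban token 6082 ('".') outright for this request. If tool calls become
# reliable, the bug is likely in how that token is (not) being constrained
# rather than in the model or the template.
curl -s http://localhost:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "List the files under llmcache_v2."}],
    "logit_bias": [[6082, -100]],
    "tools": [{
      "type": "function",
      "function": {
        "name": "list_files",
        "description": "List files in a directory",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string"},
            "recursive": {"type": "boolean"}
          },
          "required": ["path"]
        }
      }
    }]
  }' | jq '.choices[0].message.tool_calls'

The caveat is that the model can still produce a leading dot by emitting '"' and then '.' as separate tokens, so a clean run here only points at token-level handling; it does not prove it.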

I am testing in Roo Code with the latest PR RooCodeInc/Roo-Code#10236, which enabled support for native K2 Thinking tool calls. The reason I think the issue is with ik_llama.cpp is that it does not seem to happen in the mainline llama.cpp, as far as I can tell.

Name and Version

The latest git

What operating system are you seeing the problem on?

Linux
