
strange text duplication from llama-server to llama-cpp-agent #86

Open
@rpdrewes

Description

I am getting occasional duplicated text in responses when using llama-cpp-agent to talk to a llama-server on a remote host. This does not seem to be a token-repetition issue that repeat-penalty might solve; rather, it looks like a disagreement between client and server about when a data "chunk" is complete. For example (Q: is the query sent to the server, A: is the answer):

Q:What is the tallest mountain in Europe? Be brief.
A:Mount Elbrus, located in the Caucas Caucasus range, Russia Russia, is the tallest mountain in Europe, with a a height of 5,642 meters (18,510 feet).

Note the duplication of "Caucas Caucasus" and "Russia Russia," in the response!

Looking at the verbose output on the server side (llama-server -v), you can see that the server really is sending " Caucas" in one message followed by " Caucasus", and later " Russia" immediately followed by " Russia," (with the comma attached):

data stream, to_send: data: {"index":0,"content":"\n\n","tokens":[271],"stop":false,"id_slot":-1,"tokens_predicted":1,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":"Mount","tokens":[16683],"stop":false,"id_slot":-1,"tokens_predicted":2,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" El","tokens":[4072],"stop":false,"id_slot":-1,"tokens_predicted":3,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":"br","tokens":[1347],"stop":false,"id_slot":-1,"tokens_predicted":4,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":"us","tokens":[355],"stop":false,"id_slot":-1,"tokens_predicted":5,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":",","tokens":[11],"stop":false,"id_slot":-1,"tokens_predicted":6,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" located","tokens":[7559],"stop":false,"id_slot":-1,"tokens_predicted":7,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" in","tokens":[304],"stop":false,"id_slot":-1,"tokens_predicted":8,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" the","tokens":[279],"stop":false,"id_slot":-1,"tokens_predicted":9,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" Caucas","tokens":[60532],"stop":false,"id_slot":-1,"tokens_predicted":10,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" Caucasus","tokens":[355],"stop":false,"id_slot":-1,"tokens_predicted":11,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" range","tokens":[2134],"stop":false,"id_slot":-1,"tokens_predicted":12,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":",","tokens":[11],"stop":false,"id_slot":-1,"tokens_predicted":13,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" Russia","tokens":[8524],"stop":false,"id_slot":-1,"tokens_predicted":14,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" Russia,","tokens":[11],"stop":false,"id_slot":-1,"tokens_predicted":15,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" is","tokens":[374],"stop":false,"id_slot":-1,"tokens_predicted":16,"tokens_evaluated":31}
data stream, to_send: data: {"index":0,"content":" the","tokens":[279],"stop":false,"id_slot":-1,"tokens_predicted":17,"tokens_evaluated":31}
...

It is as if the server expects the client to know it should not emit the first " Russia" because it is superseded by the more complete next transmission " Russia,".
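To make the failure mode concrete, here is a minimal sketch assuming the client simply appends each chunk's "content" field as it arrives (which is what the visible output suggests llama-cpp-agent does). The chunk payloads are copied from the server log above, abbreviated to the relevant fields:

```python
import json

# SSE data payloads as logged by llama-server (non-essential fields omitted)
chunks = [
    '{"index":0,"content":" in","stop":false}',
    '{"index":0,"content":" the","stop":false}',
    '{"index":0,"content":" Caucas","stop":false}',
    '{"index":0,"content":" Caucasus","stop":false}',  # overlaps the previous chunk
    '{"index":0,"content":" range","stop":false}',
]

# Naive concatenation reproduces the duplication seen in the answer text.
text = "".join(json.loads(c)["content"] for c in chunks)
print(text)  # " in the Caucas Caucasus range"
```

So a client that blindly concatenates content fields cannot avoid the duplicate; either the server should not re-send the overlapping text, or the client needs to know to replace the earlier partial chunk.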

The above test is with the agent established using:

agent = LlamaCppAgent(provider, predefined_messages_formatter_type=MessagesFormatterType.LLAMA_3)

The llama-server is indeed using a Llama 3.2 model, so I think LLAMA_3 is the correct MessagesFormatterType.

However, if I instead set up the agent without specifying any MessagesFormatterType, I do not see the duplications in the text coming from the server! (There are other problems then, as you might expect, such as <|im_end|> appearing in the response text, presumably because client and server no longer agree on the end-of-message indication.) Surprisingly, with the (incorrect) default formatter type the server does not send " Caucas" followed by " Caucasus"; it sends " Caucas" and then "us". So it is not that the client treats the response differently: the server simply does not send duplicate data with the default formatter. There must be something different in the setup of the two chats that prevents the server from sending these duplications in the second case. I have looked a bit at the chat setup in the server logs and have some ideas, but if anyone knows what is going on here or how to fix it, please save me some time!
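In both duplicated spots in the log, the superseding chunk's content begins with the previous chunk's content (" Caucasus" starts with " Caucas", " Russia," starts with " Russia"). Until the root cause is found, one hypothetical client-side workaround is to treat such a chunk as a corrected re-send and replace the earlier partial chunk. This merge_chunks helper is my own sketch, not part of llama-cpp-agent, and the heuristic could in principle collapse a legitimate repeat (e.g. "a" followed by "ab"), so it is a stopgap only:

```python
def merge_chunks(pieces):
    """Merge streamed content chunks, assuming a chunk whose text starts
    with the previous chunk's text is a superseding re-send (hypothetical
    workaround; not a llama-cpp-agent API)."""
    out = []
    for piece in pieces:
        if out and piece.startswith(out[-1]):
            out[-1] = piece  # superseding chunk replaces the partial one
        else:
            out.append(piece)
    return "".join(out)

print(merge_chunks([" the", " Caucas", " Caucasus", " range"]))
# " the Caucasus range"
```

Applied to the logged stream this would also collapse " Russia" / " Russia," into " Russia,", but a real fix presumably belongs in whatever makes the server re-send overlapping text only under the LLAMA_3 formatter.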
