In a multi-turn conversation, the combination of llama-cpp-python and llama-cpp-agent is much slower on the second prompt than the Python bindings of gpt4all; see the timings below (taken from two screenshots). Evaluation of the first prompt is faster with llama-cpp-python, probably due to the recent speed improvements for prompt processing that have not yet been adopted in gpt4all. However, when I reply to the AI's first answer, gpt4all's second reply comes much faster than its first, whereas llama-cpp-python/llama-cpp-agent is even slower than on the first prompt. My setup is CPU-only.

Do you have any idea why this is the case? Does gpt4all handle memory in a more efficient way?
Model: Llama-3-8b-instruct Q8
Prompt processing times:

| Round | gpt4all | llama-cpp-python/agent |
|------:|--------:|-----------------------:|
| 1     | 12.03 s | 7.17 s                 |
| 2     | 3.73 s  | 8.46 s                 |
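If the difference is that gpt4all reuses the evaluated KV state of the conversation prefix while the llama-cpp-python side re-processes the whole prompt each turn, attaching a cache to the `Llama` object might help. A minimal sketch using llama-cpp-python directly (the model path, context size, and prompts are placeholders, and this assumes the slowdown really comes from re-evaluating the shared prefix):

```python
from llama_cpp import Llama, LlamaRAMCache

# Placeholder path; point this at your local GGUF file.
llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q8_0.gguf",
    n_ctx=8192,
    verbose=False,
)

# Keep evaluated KV states in RAM so a follow-up prompt that shares
# a prefix with an earlier one can skip re-evaluating that prefix.
llm.set_cache(LlamaRAMCache(capacity_bytes=2 << 30))  # ~2 GiB

messages = [{"role": "user", "content": "Explain KV caching briefly."}]
first = llm.create_chat_completion(messages=messages)
messages.append(first["choices"][0]["message"])
messages.append({"role": "user", "content": "Why does it speed up the second turn?"})

# Second turn: with the cache attached, only the newly appended tokens
# should need prompt processing, not the whole conversation so far.
second = llm.create_chat_completion(messages=messages)
print(second["choices"][0]["message"]["content"])
```

Whether llama-cpp-agent passes the conversation through in a way that lets this cache hit (i.e. the second prompt is an exact byte-level extension of the first) is something I have not verified.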