
Slow processing of follow-up prompt #54

Open

Description

@woheller69

In a multi-turn conversation, the combination of llama-cpp-python and llama-cpp-agent is much slower on the second prompt than the Python bindings of gpt4all; see the timings and the two screenshots below. Evaluation of the first prompt is faster, probably due to the recent prompt-processing speed improvements that gpt4all has not yet adopted. But when I reply to the AI's first answer, gpt4all produces its second reply much faster than its first, whereas llama-cpp-python/llama-cpp-agent is even slower than on the first prompt. My setup is CPU only.
Do you have an idea why this is the case? Do they handle memory in a more efficient way?

Llama-3-8b-instruct Q8, prompt processing time:

| Round | gpt4all | llama-cpp-python/agent |
|-------|---------|------------------------|
| 1     | 12.03 s | 7.17 s                 |
| 2     | 3.73 s  | 8.46 s                 |

[Screenshot: gpt4all timings]
[Screenshot: llama-cpp-agent timings]
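
One plausible factor (not confirmed in this thread) is prompt caching: gpt4all's chat session presumably keeps the model context and its KV cache alive between turns, so round 2 only evaluates the new turn, while llama-cpp-agent resubmits the full conversation and, without a cache, llama-cpp-python re-evaluates it from scratch. Below is a minimal sketch of enabling llama-cpp-python's built-in RAM prompt cache when driving the model directly; the model path, context size, and cache budget are placeholder assumptions, and whether llama-cpp-agent passes requests through in a way that hits this cache is untested.

```python
from llama_cpp import Llama, LlamaRAMCache  # LlamaCache in older releases

# Placeholder model path and context size for illustration.
llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q8_0.gguf", n_ctx=8192)

# Cache evaluated KV state in RAM, keyed by token prefixes. A follow-up
# prompt that shares the conversation prefix can restore the cached state
# instead of re-evaluating every token from scratch.
llm.set_cache(LlamaRAMCache(capacity_bytes=2 << 30))  # ~2 GiB budget

messages = [{"role": "user", "content": "First question"}]
reply = llm.create_chat_completion(messages=messages)

# Append the assistant's answer and a follow-up; the serialized prompt now
# starts with the same token prefix, so round 2 may hit the cache.
messages.append(reply["choices"][0]["message"])
messages.append({"role": "user", "content": "Follow-up question"})
reply = llm.create_chat_completion(messages=messages)
```

If the cache is working, round-2 prompt processing should drop sharply, similar to the gpt4all numbers above, since only the newly appended tokens need evaluation.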
