
Why do ingest and then chat use only 6 cores? #610

Answered by johnbrisbin
pikor69 asked this question in Q&A

@sime2408,

For privateGPT.py you can set the thread count as high as you like by passing a parameter to LlamaCpp: add n_threads=psutil.cpu_count(logical=False). The False value gets you the number of physical cores; True gets the number of logical (hyper-threaded) cores. This will use all the threads and push CPU usage to 100% (on Windows). For all the redlining of the CPU, I am not sure it is really much faster: I compared the llama.cpp timing printout after the query and saw little difference. Maybe the gain doesn't show up there.
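
A minimal sketch of that change, assuming the LangChain LlamaCpp wrapper that privateGPT.py imports (the model path and n_ctx value below are placeholders; use whatever your copy of the script already sets):

```python
import psutil
from langchain.llms import LlamaCpp

# Count physical cores only; logical=True would include hyper-threads as well.
n_threads = psutil.cpu_count(logical=False)

llm = LlamaCpp(
    model_path="models/your-llama-model.bin",  # placeholder; keep your existing MODEL_PATH
    n_ctx=1000,                                 # placeholder; keep your existing context size
    n_threads=n_threads,                        # the added parameter
    verbose=False,
)
```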

On ingestion, things can get a lot better. Just enabling the GPU for the embedding model makes it about 7 times faster. Overlapping and threading the various actions can bring that num…
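
One way to enable the GPU for the embeddings, assuming ingest.py uses LangChain's HuggingFaceEmbeddings as in the stock script (the model_kwargs device option is passed through to sentence-transformers; "cuda" assumes an NVIDIA card with a CUDA-enabled PyTorch install):

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Default is CPU, which is what makes stock ingestion slow;
# "cuda" moves the embedding model onto the GPU.
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",        # privateGPT's default embeddings model
    model_kwargs={"device": "cuda"},
)
```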
