Node working well, just a comment re llama-cpp-python using CPU instead of GPU, and a Q on expected run duration. #16

@808charlie

Description

In part, I'm offering an observation in case it helps others. I'd also be grateful if you could comment on the expected run duration.

In short, I was facing issues with llama-cpp-python using the CPU instead of the GPU in another LLM node, and the script you provide with MiniCPM was perfect for fixing the problem! I'm not sure, however, whether your MiniCPM node itself is running on the GPU properly.

For info, the issue I had with the other node is here: SeargeDP/ComfyUI_Searge_LLM#54

As I had your MiniCPM node installed, I found your script for installing llama-cpp-python, which is superb, as it helped me get llama-cpp-python to use CUDA:
https://github.com/1038lab/ComfyUI-MiniCPM/blob/main/llama_cpp_install/llama_cpp_install.py

The problem I found is that once llama-cpp-python is installed in an environment (perhaps by another node, without CUDA flagged), any later attempt to install llama-cpp-python, whether directly or via requirements.txt, is treated as already satisfied. So although your script sees the GPU and sets CMAKE_ARGS="-DGGML_CUDA=on", pip never rebuilds llama-cpp-python because it considers the requirement already met.
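In case it helps anyone hitting the same wall: a force-reinstall should also get past this without uninstalling first. This is a hedged sketch using standard pip flags (CMAKE_ARGS is the environment variable the llama-cpp-python build reads); the CUDA toolkit still needs to be available for the build to succeed.

```shell
# Rebuild llama-cpp-python with CUDA even though a CPU build is already installed.
# --force-reinstall makes pip reinstall despite the package being present;
# --no-cache-dir stops pip reusing a previously built CPU wheel.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```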

BUT..

uninstalling llama-cpp-python first results in your script calling pip and building a wheel, after which llama-cpp-python is installed and correctly using CUDA! Thank you.

In my case (Ubuntu, venv, cu128, Python 3.12, 9th-gen i7, 5090), I did the following:

cd ComfyUI
source ./venv/bin/activate
cd custom_nodes/ComfyUI-MiniCPM/llama_cpp_install/
pip uninstall llama-cpp-python
python3 llama_cpp_install.py 
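One way to sanity-check the resulting build (hedged: llama_supports_gpu_offload is a symbol recent llama-cpp-python versions expose from the low-level bindings, so older versions may lack it):

```shell
# Ask the installed llama-cpp-python build whether it can offload to the GPU.
python3 -c "
try:
    from llama_cpp import llama_supports_gpu_offload
    print('GPU offload supported:', llama_supports_gpu_offload())
except ImportError as e:
    print('Could not check:', e)
"
```

After a successful CUDA build this should report True.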

For the Searge node I reference above, the llama-cpp-python reinstall cut run duration from 50s (I see all CPU cores being used) to 1.5s (GPU at 100%).

Sadly, this hasn't sped up runs with MiniCPM, which remain around 28 seconds: I see only one CPU core used and a few brief flashes of 20% on the GPU. I'm not sure I can tell what is being used, as nothing is heavily taxed but it still takes its time.
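One possibly relevant detail: llama-cpp-python only offloads work to the GPU when the model is constructed with n_gpu_layers set (e.g. n_gpu_layers=-1 for all layers); with the default of 0, inference stays on the CPU even when the wheel was built with CUDA. I don't know how the MiniCPM node constructs its model, so this is just a hedged way to check (path assumed to be the node's install location under custom_nodes):

```shell
# Look for how (or whether) the node sets n_gpu_layers when loading the model.
grep -rn "n_gpu_layers" custom_nodes/ComfyUI-MiniCPM/
```

If nothing turns up, or it is hard-coded to 0, that could explain the single busy CPU core.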

I'd be keen to know whether ~30 seconds is an expected runtime with the specs I mention, and any pointers towards a fix if not. Other than that, feel free to close this as resolved. Thanks.
