Node working well, just a comment re llama-cpp-python using CPU instead of GPU, and a Q on expected run duration. #16

@808charlie

Description

In part, I'm offering an observation in case it helps others. I'd also be grateful if you could comment on the expected run duration.

In short, I was facing issues with llama-cpp-python using the CPU instead of the GPU in another LLM node, and the script you provide with MiniCPM was perfect for fixing the problem! I'm not sure, however, whether your MiniCPM node itself is running on the GPU properly.

For info, the issue I had with the other node is here: SeargeDP/ComfyUI_Searge_LLM#54

As I had your MiniCPM node installed, I found your script for installing llama-cpp-python, which is superb, as it helped me get llama-cpp-python to use CUDA:
https://github.com/1038lab/ComfyUI-MiniCPM/blob/main/llama_cpp_install/llama_cpp_install.py

The problem I found is that once llama-cpp-python is installed in an environment (perhaps by another node, without CUDA flagged), any later attempt to install llama-cpp-python, whether directly or via requirements.txt, is treated as already satisfied. So although your script sees the GPU and sets CMAKE_ARGS="-DGGML_CUDA=on", pip never rebuilds llama-cpp-python because it considers the requirement already met.
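In case it helps anyone hitting the same wall: a force-reinstall should also get past this without uninstalling first. This is a hedged sketch using standard pip flags (CMAKE_ARGS is the environment variable the llama-cpp-python build reads); the CUDA toolkit still needs to be available for the build to succeed.

```shell
# Rebuild llama-cpp-python with CUDA even though a CPU build is already installed.
# --force-reinstall makes pip reinstall despite the package being present;
# --no-cache-dir stops pip reusing a previously built CPU wheel.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```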

BUT..

uninstalling llama-cpp-python first results in your script calling pip and building a wheel, after which llama-cpp-python is installed and correctly using CUDA! Thank you.

In my case (Ubuntu, venv, cu128, Python 3.12, 9th-gen i7, 5090), I did the following:

cd ComfyUI
source ./venv/bin/activate
cd custom_nodes/ComfyUI-MiniCPM/llama_cpp_install/
pip uninstall llama-cpp-python
python3 llama_cpp_install.py 
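One way to sanity-check the resulting build (hedged: llama_supports_gpu_offload is a symbol recent llama-cpp-python versions expose from the low-level bindings, so older versions may lack it):

```shell
# Ask the installed llama-cpp-python build whether it can offload to the GPU.
python3 -c "
try:
    from llama_cpp import llama_supports_gpu_offload
    print('GPU offload supported:', llama_supports_gpu_offload())
except ImportError as e:
    print('Could not check:', e)
"
```

After a successful CUDA build this should report True.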

For the Searge node I reference above, the llama-cpp-python reinstall cut run duration from 50s (I see all CPU cores being used) to 1.5s (GPU at 100%).

Sadly, this hasn't sped up runs with MiniCPM, which remain around 28 seconds: I see only one CPU core used and a few brief flashes of 20% on the GPU. I'm not sure I can tell what is being used, as nothing is heavily taxed but it still takes its time.
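One possibly relevant detail: llama-cpp-python only offloads work to the GPU when the model is constructed with n_gpu_layers set (e.g. n_gpu_layers=-1 for all layers); with the default of 0, inference stays on the CPU even when the wheel was built with CUDA. I don't know how the MiniCPM node constructs its model, so this is just a hedged way to check (path assumed to be the node's install location under custom_nodes):

```shell
# Look for how (or whether) the node sets n_gpu_layers when loading the model.
grep -rn "n_gpu_layers" custom_nodes/ComfyUI-MiniCPM/
```

If nothing turns up, or it is hard-coded to 0, that could explain the single busy CPU core.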

I'd be keen to know whether ~30 seconds is an expected runtime with the specs I mention, and any pointers towards a fix if not. Other than that, feel free to close this as resolved. Thanks.
