Currently, we don't compile llama.cpp with CUDA acceleration. If we want to support the GPU offload feature, we need to compile llama.cpp with the GPU build flag enabled.
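As a sketch of what that build could look like (the flag name varies across llama.cpp versions: older trees use `LLAMA_CUBLAS=1`, newer ones `GGML_CUDA=1`):

```sh
# Makefile build with CUDA offload support enabled
make clean
make GGML_CUDA=1    # on older llama.cpp trees: make LLAMA_CUBLAS=1

# CMake equivalent
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

Without one of these flags, `-ngl` has no effect and all layers run on the CPU.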
Labels: cuda, gpu
https://github.com/SkywardAI/llama.cpp/blob/a59f8fdc85e1119d470d8766e29617962549d993/examples/main/README.md?plain=1#L72
How many layers do you want to run on the GPU? (That is, what value should be passed to `-ngl` / `--n-gpu-layers`?)
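For example, a minimal sketch of running with offload after a CUDA-enabled build (the model path is illustrative; in newer llama.cpp trees the binary is `llama-cli` rather than `main`):

```sh
# Offload 33 layers to the GPU via -ngl / --n-gpu-layers
./main -m ./models/model.Q4_K_M.gguf -ngl 33 -p "Hello"
```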