Description
Hey, I've been struggling for a month to install the latest version with CUDA. It was a nightmare.
So here is a guide on how to do it.
TL;DR Dockerfile syntax:
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y build-essential \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors \
&& echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
&& apt-get clean
RUN pip install uv
RUN uv init .
ENV CC=/usr/bin/gcc CXX=/usr/bin/g++
RUN export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
&& CMAKE_ARGS="-DGGML_CUDA=on \
-DCMAKE_CUDA_ARCHITECTURES=75 \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match
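If you want the image build to fail fast when the wheel ends up without GPU support, you can append a sanity-check step. This assumes the low-level `llama_supports_gpu_offload` binding is exposed by your llama-cpp-python version (it is in recent 0.3.x releases, but check yours):

```dockerfile
# Optional sanity check (assumption: llama_supports_gpu_offload is exposed
# in your llama-cpp-python version): fail the build if the wheel was
# compiled without GPU offload support
RUN python -c "import llama_cpp, sys; sys.exit(0 if llama_cpp.llama_supports_gpu_offload() else 1)"
```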
Explanation:
Installation for CPU is a piece of cake.
Installation from source with GPU support is slow and labour-intensive, so the best way is to install from the provided wheels.
GitHub doesn't serve a wheel for the latest release: at the time of writing the latest version was 0.3.8, while the newest wheel was 0.3.4, which is relatively recent but doesn't support Gemma 3.
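A quick way to see which version you actually ended up with (a sketch using plain `pip show`; prints the version line, or "not installed" as a fallback):

```shell
# Print the installed llama-cpp-python version, or "not installed" if absent
pip show llama-cpp-python 2>/dev/null | grep ^Version || echo "not installed"
```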
First we need to provide paths to the gcc and g++ compilers. Somehow this is a dealbreaker.
One caveat: a plain `RUN export ...` does not carry over to later RUN instructions, because every RUN starts a fresh shell. So set the static compiler paths with ENV, and keep the LD_LIBRARY_PATH export in the same RUN as the install command itself:
ENV CC=/usr/bin/gcc CXX=/usr/bin/g++
RUN export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
&& ... (the uv pip install command goes here)
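Why the exports are fiddly: each Dockerfile RUN instruction runs in its own shell, so an `export` in one RUN never reaches the next. Two separate `sh -c` invocations reproduce the same behavior (the variable name is illustrative):

```shell
# Each RUN in a Dockerfile is a fresh shell, like these two commands:
sh -c 'export DEMO_CC=/usr/bin/gcc'            # analogous to `RUN export CC=...`
sh -c 'echo "DEMO_CC is: ${DEMO_CC:-unset}"'   # a later RUN: prints "DEMO_CC is: unset"
```

This is why either ENV (for static values) or a single combined RUN (for computed values) is needed.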
Linux dependencies:
apt-get install -y build-essential \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors \
&& echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
These flags shorten the traceback in case anything fails. And it likely will:
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
LLAMA_CUBLAS is obsolete, so we need to replace it with:
-DGGML_CUDA=on
uv somehow manages to install this while pip can't.
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match
Here we need to provide both --index-url https://pypi.org/simple
and --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122,
because otherwise either 0.3.8 or CUDA support wouldn't be available to the resolver. Replace cu122
with your CUDA version. --index-strategy unsafe-best-match
is also required; without it the build failed for me.
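The index tag is just `cu` followed by the CUDA major and minor digits. A tiny shell helper sketches the mapping (the function name is mine, not part of any tool):

```shell
# Map a CUDA version like "12.2" to the wheel-index tag "cu122"
# (hypothetical helper for illustration; the index URLs follow .../whl/cu<major><minor>)
cuda_wheel_tag() {
  ver="$1"
  major="${ver%%.*}"        # part before the first dot
  rest="${ver#*.}"          # part after the first dot
  minor="${rest%%.*}"       # second component only, in case of "12.4.1"
  echo "cu${major}${minor}"
}

cuda_wheel_tag "12.2"   # prints cu122
```

Check your CUDA version with `nvcc --version` or `nvidia-smi` before picking the index.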