
How to install the latest version with GPU support #2012

Open

@shigabeev

Description

Hey, I've been struggling for a month to install the latest version with CUDA support. It was a nightmare.

So here is a guide on how to do it.

TL;DR Dockerfile:

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y build-essential \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors \
    && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
    && apt-get clean

RUN pip install uv
RUN uv init .
# export does not persist across RUN instructions, so the compiler paths
# and the build itself must share a single RUN:
RUN export CC=/usr/bin/gcc CXX=/usr/bin/g++ \
    && export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
    && CMAKE_ARGS="-DGGML_CUDA=on \
            -DCMAKE_CUDA_ARCHITECTURES=75 \
            -DLLAMA_BUILD_EXAMPLES=OFF \
            -DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
    uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
    --index-url https://pypi.org/simple \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
    --index-strategy unsafe-best-match

Explanation:
Installation for CPU is a piece of cake.
Installation from source with GPU support is slow and labour-intensive, so the best way is to install from the provided wheels.
However, the wheel index on GitHub doesn't serve the latest release: at the time of writing the latest version was 0.3.8, while the index only had 0.3.4, which is relatively recent but doesn't support gemma3.

First we need to point the build at the gcc and g++ compilers and their runtime library directory; without this, the build breaks. Note that export only lasts for the duration of its own RUN instruction, so these lines must share a RUN with the install step itself:

export CC=/usr/bin/gcc CXX=/usr/bin/g++
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
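To confirm the toolchain is actually visible inside the image, a throwaway verification layer can help (a sketch; drop it once the build works):

```shell
# Optional verification layer: fails the image build early if the compilers
# or the gcc library directory (added to LD_LIBRARY_PATH above) are missing.
RUN which gcc g++ \
    && ls /usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion)
```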

Linux dependencies:

    apt-get install -y build-essential \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors \
    && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
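The echo line registers the NVIDIA OpenCL driver as an ICD. Whether the driver is actually picked up can only be checked at container runtime, since the GPU isn't visible during the image build. This sketch assumes the NVIDIA Container Toolkit is installed on the host; my-image is a placeholder tag:

```shell
# Run on the host after building; --gpus all exposes the NVIDIA driver
# inside the container so clinfo can enumerate the OpenCL platform.
docker run --rm --gpus all my-image clinfo
```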

These flags skip building the examples and tests, which speeds up the build and shortens the traceback in case anything fails (and it likely will):

-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \

The LLAMA_CUBLAS flag is obsolete, so we need to replace it with:

-DGGML_CUDA=on
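Note that -DCMAKE_CUDA_ARCHITECTURES=75 in the TL;DR targets Turing GPUs (e.g. T4, RTX 20xx). To find the right value for a different card, the compute capability can be queried from the driver (assuming a reasonably recent nvidia-smi):

```shell
# Prints the GPU's compute capability, e.g. 8.6;
# drop the dot to get the CMake value: -DCMAKE_CUDA_ARCHITECTURES=86
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```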

uv somehow manages to install this while pip can't:

uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match

Here we need to provide both --index-url https://pypi.org/simple and --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122, because otherwise either version 0.3.8 or the CUDA wheels wouldn't be visible to the resolver. Replace cu122 with your CUDA version. --index-strategy unsafe-best-match is also required; without it the install failed.
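Once the container is running, a quick way to confirm the wheel was really built with CUDA (rather than silently falling back to a CPU build) is the GPU-offload helper from llama-cpp-python's low-level API (available in recent versions; this is a sketch to run inside the container):

```shell
# Inside the container: prints True only if the underlying llama.cpp
# library was compiled with GPU offload support (GGML_CUDA took effect).
python -c "from llama_cpp import llama_supports_gpu_offload as g; print(g())"
```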
