Description
Hey, I've been struggling for a month to install the latest version with CUDA. It was a nightmare.
So here is a guide on how to do it.
TL;DR Dockerfile syntax:
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y build-essential \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors \
&& echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
&& apt-get clean
RUN pip install uv
RUN uv init .
ENV CC=/usr/bin/gcc CXX=/usr/bin/g++
RUN export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
&& CMAKE_ARGS="-DGGML_CUDA=on \
-DCMAKE_CUDA_ARCHITECTURES=75 \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match
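If you want the image build to fail fast when the wheel ends up without GPU support, you can append a sanity-check step. This assumes the low-level `llama_supports_gpu_offload` binding is exposed by your llama-cpp-python version (it is in recent 0.3.x releases, but check yours):

```dockerfile
# Optional sanity check (assumption: llama_supports_gpu_offload is exposed
# in your llama-cpp-python version): fail the build if the wheel was
# compiled without GPU offload support
RUN python -c "import llama_cpp, sys; sys.exit(0 if llama_cpp.llama_supports_gpu_offload() else 1)"
```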
Explanation:
Installation for CPU is a piece of cake.
Installation from source with GPU support is slow and labour-intensive, so the best way is to install from the provided wheels.
GitHub doesn't serve a wheel for the latest release: at the time of writing the latest version was 0.3.8, while the newest wheel was 0.3.4, which is relatively recent but doesn't support Gemma 3.
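A quick way to see which version you actually ended up with (a sketch using plain `pip show`; prints the version line, or "not installed" as a fallback):

```shell
# Print the installed llama-cpp-python version, or "not installed" if absent
pip show llama-cpp-python 2>/dev/null | grep ^Version || echo "not installed"
```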
First we need to provide paths to the gcc and g++ compilers. Somehow this is a dealbreaker.
One caveat: a plain `RUN export ...` does not carry over to later RUN instructions, because every RUN starts a fresh shell. So set the static compiler paths with ENV, and keep the LD_LIBRARY_PATH export in the same RUN as the install command itself:
ENV CC=/usr/bin/gcc CXX=/usr/bin/g++
RUN export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
&& ... (the uv pip install command goes here)
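Why the exports are fiddly: each Dockerfile RUN instruction runs in its own shell, so an `export` in one RUN never reaches the next. Two separate `sh -c` invocations reproduce the same behavior (the variable name is illustrative):

```shell
# Each RUN in a Dockerfile is a fresh shell, like these two commands:
sh -c 'export DEMO_CC=/usr/bin/gcc'            # analogous to `RUN export CC=...`
sh -c 'echo "DEMO_CC is: ${DEMO_CC:-unset}"'   # a later RUN: prints "DEMO_CC is: unset"
```

This is why either ENV (for static values) or a single combined RUN (for computed values) is needed.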
Linux dependencies:
apt-get install -y build-essential \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors \
&& echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
These flags shorten the traceback in case anything fails. And it likely will:
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
LLAMA_CUBLAS is obsolete, so we need to replace it with:
-DGGML_CUDA=on
uv somehow manages to install this while pip can't.
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match
Here we need to provide both --index-url https://pypi.org/simple
and --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122,
because otherwise either 0.3.8 or CUDA support wouldn't be available to the resolver. Replace cu122
with your CUDA version. --index-strategy unsafe-best-match
is also required; without it the build failed for me.
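The index tag is just `cu` followed by the CUDA major and minor digits. A tiny shell helper sketches the mapping (the function name is mine, not part of any tool):

```shell
# Map a CUDA version like "12.2" to the wheel-index tag "cu122"
# (hypothetical helper for illustration; the index URLs follow .../whl/cu<major><minor>)
cuda_wheel_tag() {
  ver="$1"
  major="${ver%%.*}"        # part before the first dot
  rest="${ver#*.}"          # part after the first dot
  minor="${rest%%.*}"       # second component only, in case of "12.4.1"
  echo "cu${major}${minor}"
}

cuda_wheel_tag "12.2"   # prints cu122
```

Check your CUDA version with `nvcc --version` or `nvidia-smi` before picking the index.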