# llama-cpp-python with Intel GPU (SYCL) Docker

This repository provides a Dockerfile to build a containerized environment for running [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) with Intel GPU acceleration using SYCL (oneAPI). The image is based on Ubuntu 25.10 and includes all necessary Intel GPU drivers, the oneAPI Base Toolkit, and Python dependencies for efficient LLM inference.

## Table of Contents

- [Features](#features)
- [Build the Docker Image](#build-the-docker-image)
- [Run the Container](#run-the-container)
  - [Default container](#default-container)
  - [Custom model or arguments](#custom-model-or-arguments)
  - [Mount a local directory and run a model from it](#mount-a-local-directory-and-run-a-model-from-it)
  - [Run bash shell in the container](#run-bash-shell-in-the-container)
- [Notes](#notes)
- [License](#license)

---
## Features

- **Intel GPU support**: Installs [Intel GPU drivers](https://dgpu-docs.intel.com/driver/client/overview.html) and [oneAPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html) for SYCL acceleration.
- **llama-cpp-python**: Builds and installs with Intel SYCL support for fast inference.
- **Python environment**: Uses [uv](https://github.com/astral-sh/uv) for fast Python dependency management.
- **Ready-to-use server**: Runs the llama-cpp-python server by default.

---

## Build the Docker Image

Clone this repository and build the Docker image:

```sh
docker build -t llamacpp-intel-sycl:latest .
```

---
## Run the Container

### Default container

To run the server with a default HuggingFace model (Qwen/Qwen2-0.5B-Instruct-GGUF):

```sh
docker run -it --rm \
  --device /dev/dri \
  -p 8000:8000 \
  llamacpp-intel-sycl:latest
```

- `--device /dev/dri` exposes the Intel GPU to the container.
- `-p 8000:8000` maps the server port.
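
Once the container is up, a quick way to confirm the server is responding is to query its OpenAI-compatible model listing endpoint. This is a sketch assuming the default port mapping above:

```sh
# list the models the running server exposes via its OpenAI-compatible API;
# prints a short notice instead of failing when the server is not up yet
curl -sf http://localhost:8000/v1/models || echo "server not reachable yet"
```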

### Custom model or arguments

The default model and arguments can be overridden as needed:

```sh
docker run -it --rm \
  --device /dev/dri \
  -p 8000:8000 \
  llamacpp-intel-sycl:latest \
  --n_gpu_layers -1 \
  --hf_model_repo_id <hf-repo> --model <model-file>
```

- `--n_gpu_layers -1` offloads all layers of the model to the GPU.
- Replace `--hf_model_repo_id <hf-repo>` with the HuggingFace model repo id to use.
- Replace `--model <model-file>` with the HuggingFace model file to use.

### Mount a local directory and run a model from it

To use a model file stored on the host machine, mount the directory containing the model into the container and specify the path to the model file. For example, if the model is located in `/path/to/models` on the host:

```sh
docker run -it --rm \
  --device /dev/dri \
  -p 8000:8000 \
  -v /path/to/models:/models \
  llamacpp-intel-sycl:latest \
  --n_gpu_layers -1 \
  --model /models/<model-file>
```

- `-v /path/to/models:/models` mounts the local directory into the container at `/models`.
- Replace `<model-file>` with the actual filename of the model inside `/path/to/models`.

This approach can be combined with other arguments as needed.

### Run bash shell in the container

To override the entry point and start the container with a Bash shell, run the following command:

```sh
docker run -it --rm \
  --device=/dev/dri \
  --net host \
  --entrypoint /bin/bash \
  llamacpp-intel-sycl:latest
```

- `--net host` uses the host's network stack instead of creating a separate container network.
- `--entrypoint /bin/bash` overrides the default server entry point so the container starts with a Bash shell instead of the default command.
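
From that shell, the oneAPI `sycl-ls` tool can confirm the Intel GPU is visible to SYCL. This is a sketch; `/opt/intel/oneapi/setvars.sh` is the standard oneAPI install location, and the image may already source it for you:

```sh
# inside the container shell: load the oneAPI environment if it is not loaded yet,
# then list SYCL devices; the Intel GPU should appear as a level_zero or opencl device
if [ -f /opt/intel/oneapi/setvars.sh ]; then
  . /opt/intel/oneapi/setvars.sh
fi
sycl-ls || echo "sycl-ls not available (run this inside the container)"
```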

---

## Notes

- Make sure your host system has an Intel GPU and the necessary drivers installed.
- For more information about supported models, server options, and how to call inference endpoints, see the [llama-cpp-python OpenAI Server documentation](https://llama-cpp-python.readthedocs.io/en/latest/server/).
- If the host is behind a proxy, update `~/.docker/config.json` with the correct proxy settings before running the container:

  ```json
  {
    "proxies": {
      "default": {
        "httpProxy": "<your-proxy-details>",
        "httpsProxy": "<your-proxy-details>",
        "noProxy": "<your-proxy-details>"
      }
    }
  }
  ```
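
As a minimal sketch of calling an inference endpoint (assuming the default port mapping and the server's OpenAI-compatible chat API):

```sh
# send one chat message to the OpenAI-compatible chat completions endpoint;
# prints a short notice instead of failing when the server is not up yet
curl -sf http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}' \
  || echo "server not reachable yet"
```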

---

## License

View the [LICENSE](./LICENSE) file for the repository.
