After one of Intel's many changes to their software stack, this approach became obsolete. Head over to their pages to see what they've abandoned this week.
Docker image providing fastchat (webui and api) for Intel Arc GPUs with IPEX-LLM(https://github.com/intel-analytics/ipex-llm).
- To start, you absolutely must install the latest drivers for your GPU, even if you think you've already got them in your kernel:
sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
sudo apt-get update
sudo apt-get -y install \
gawk \
dkms \
linux-headers-$(uname -r) \
libc6-dev
sudo apt install intel-i915-dkms intel-fw-gpu
sudo apt-get install -y gawk libc6-dev udev\
intel-opencl-icd intel-level-zero-gpu level-zero \
intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo
sudo reboot
- Set up permissions:
sudo gpasswd -a ${USER} render
newgrp render
# Verify the device is working with i915 driver
sudo apt-get install -y hwinfo
hwinfo --display
- Install Docker (if you don't already have it)
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
- Clone this repo.
git clone https://github.com/digitalscream/ipex-llm-fastchat-docker.git
cd ipex-llm-fastchat-docker
- Make some directories for storing the models, and for the logs (make sure you do this on a disk with plenty of space...you're going to be downloading a lot of models when you start playing with it...):
mkdir ~/fastchat
mkdir ~/fastchat/logs
- Build (don't forget the period on the end):
docker build --tag 'ipex-llm-fastchat-docker' .
- Run!
docker run --device /dev/dri -v ~/fastchat:/root/.cache/huggingface -v ~/fastchat/logs:/logs \
-p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
--model-path mistralai/Mistral-7B-Instruct-v0.2
NOTE: if you want to use an AWQ-quantised model, you'll need --load-in-low-bit asym_int4 on the end:
docker run --device /dev/dri -v ~/fastchat:/root/.cache/huggingface -v ~/fastchat/logs:/logs \
-p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
--model-path TheBloke/laser-dolphin-mixtral-2x7b-dpo-AWQ --load-in-low-bit asym_int4
- Play with it. Visit
http://localhost:7860in your browser and off you go. For extra points, if you're a VS Code user, you can install the Continue extension. The base URL ishttp://localhost:8000/v1, and just set the API key toEMPTY.
My system is a Ryzen 3600 with 96GB RAM and an Arc A770 16GB. With Mistral 7b, I see around 60 tokens/s. With Mixtral 2x7b AWQ, that drops to 30-40 tokens/s (understandably). If you're seeing ~8-10 tokens/s, then I can almost guarantee that you haven't installed the latest GPU drivers.
This is pretty much temporary - there's currently a bug in the IPEX vLLM worker, which should be fixed when they release 2.5.0. They put a hack in place to get around the fact that Mistral models weren't properly supported by transformers at the time, but the hack wasn't completely removed. This version of config.py fixes that.
At some point, I'll get around to putting together a decent docker-compose.yml to package the whole lot together.
Thanks to @itlackey for the startup.sh script, and the Intel devs for the example Dockerfiles needed to get this all up and running.