Skip to content

digitalscream/ipex-llm-fastchat-docker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

THIS REPO IS DEPRECATED

After one of Intel's many changes to their software stack, this approach became obsolete. Head over to their pages to see what they've abandoned this week.

ipex-llm-fastchat-docker

Docker image providing fastchat (webui and api) for Intel Arc GPUs with IPEX-LLM(https://github.com/intel-analytics/ipex-llm).

Installation

  1. To start, you absolutely must install the latest drivers for your GPU, even if you think you've already got them in your kernel:
sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
  sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg

echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
  sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list

sudo apt-get update
sudo apt-get -y install \
    gawk \
    dkms \
    linux-headers-$(uname -r) \
    libc6-dev
sudo apt install intel-i915-dkms intel-fw-gpu
sudo apt-get install -y gawk libc6-dev udev\
    intel-opencl-icd intel-level-zero-gpu level-zero \
    intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
    libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
    libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
    mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo

sudo reboot
  1. Set up permissions:
sudo gpasswd -a ${USER} render
newgrp render

# Verify the device is working with i915 driver
sudo apt-get install -y hwinfo
hwinfo --display
  1. Install Docker (if you don't already have it)
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  1. Clone this repo.
git clone https://github.com/digitalscream/ipex-llm-fastchat-docker.git
cd ipex-llm-fastchat-docker
  1. Make some directories for storing the models, and for the logs (make sure you do this on a disk with plenty of space...you're going to be downloading a lot of models when you start playing with it...):
mkdir ~/fastchat
mkdir ~/fastchat/logs

Usage

  1. Build (don't forget the period on the end):
docker build --tag 'ipex-llm-fastchat-docker' .
  1. Run!
docker run --device /dev/dri -v ~/fastchat:/root/.cache/huggingface -v ~/fastchat/logs:/logs \
  -p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
  --model-path mistralai/Mistral-7B-Instruct-v0.2

NOTE: if you want to use an AWQ-quantised model, you'll need --load-in-low-bit asym_int4 on the end:

docker run --device /dev/dri -v ~/fastchat:/root/.cache/huggingface -v ~/fastchat/logs:/logs \
  -p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
  --model-path TheBloke/laser-dolphin-mixtral-2x7b-dpo-AWQ --load-in-low-bit asym_int4
  1. Play with it. Visit http://localhost:7860 in your browser and off you go. For extra points, if you're a VS Code user, you can install the Continue extension. The base URL is http://localhost:8000/v1, and just set the API key to EMPTY.

What's performance like?

My system is a Ryzen 3600 with 96GB RAM and an Arc A770 16GB. With Mistral 7b, I see around 60 tokens/s. With Mixtral 2x7b AWQ, that drops to 30-40 tokens/s (understandably). If you're seeing ~8-10 tokens/s, then I can almost guarantee that you haven't installed the latest GPU drivers.

What's config.py for?

This is pretty much temporary - there's currently a bug in the IPEX vLLM worker, which should be fixed when they release 2.5.0. They put a hack in place to get around the fact that Mistral models weren't properly supported by transformers at the time, but the hack wasn't completely removed. This version of config.py fixes that.

TODO

At some point, I'll get around to putting together a decent docker-compose.yml to package the whole lot together.

Thanks

Thanks to @itlackey for the startup.sh script, and the Intel devs for the example Dockerfiles needed to get this all up and running.

About

Docker image providing fastchat (webui and api) for Intel Arc GPUs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors