The main repository for building Pascal-compatible versions of ML applications and libraries.
- vLLM: 0.5.5, 0.6.0, 0.6.1, 0.6.1.post1, 0.6.1.post, 0.6.2, 0.6.3, 0.6.3.post1, 0.6.4, 0.6.4.post1, 0.6.5, 0.6.6, 0.6.6.post1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.9.0, 0.9.1, 0.9.2, 0.10.0 and main (nightly, updated daily) are available in this repository.
- Triton: 2.2.0, 2.3.0, 2.3.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.3.1, 3.4.0 are available in this repository.
Important
WARNING: Support for new GPUs has been disabled (v0.7.0+/main)
Due to the growth in vLLM's codebase, binary size, and build time, it is no longer practical to build vLLM for all GPU architectures.
To use vLLM on a heterogeneous machine or cluster, use the official vLLM build for non-Pascal GPUs and this build for Pascal GPUs, and connect the instances using tensor or pipeline parallelism.
Note that this change only affects v0.7.0 and later (including main).
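On a mixed machine, that split comes down to each GPU's compute capability: Pascal GPUs are 6.x, while the official vLLM builds target compute capability 7.0 and newer. Below is a minimal bash sketch, assuming a driver whose nvidia-smi supports the compute_cap query field; classify_gpus is a hypothetical helper, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "index, compute_cap" lines, e.g. the output of
#   nvidia-smi --query-gpu=index,compute_cap --format=csv,noheader
# and prints which vLLM build each GPU would need.
classify_gpus() {
  local index cap major
  # "|| [[ -n ... ]]" also handles a final line without a trailing newline
  while IFS=', ' read -r index cap || [[ -n "$index" ]]; do
    if [[ -z "$index" ]]; then
      continue                     # skip blank lines
    fi
    major="${cap%%.*}"             # "6.1" -> "6"
    if [[ ! "$major" =~ ^[0-9]+$ ]]; then
      echo "$index other"          # unparsable value, e.g. "[N/A]"
    elif (( major == 6 )); then
      echo "$index pascal"         # Pascal: use this repository's build
    elif (( major >= 7 )); then
      echo "$index official"       # Volta and newer: use official vLLM
    else
      echo "$index other"          # pre-Pascal: not covered by this README
    fi
  done
}
```

For example, piping `nvidia-smi --query-gpu=index,compute_cap --format=csv,noheader` into `classify_gpus` would list, per GPU index, whether that GPU belongs to a Pascal instance or an official-build instance.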
# Pull the vLLM image
docker pull ghcr.io/sasha0552/vllm:v0.10.0 # omit the version specifier to install the nightly version
# You can now follow the official vLLM documentation.
# Replace the official image with this one.
Warning
As of v0.6.5, the wheels are in a soft-broken state due to PyTorch. To use them, you need to manually patch PyTorch after installing vLLM.
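Patching here means lowering PyTorch's `.major < 7` and `.major >= 7` checks (the GPU's compute-capability major version) to 6 in torch/_inductor/scheduler.py and torch/utils/_triton.py. Below is a minimal sketch, assuming bash and GNU sed, that wraps those substitutions and asks the venv's own interpreter where torch is installed, so the python3.12 path never needs editing. patch_capability_checks, torch_dir, and the PYTHON variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the same two substitutions as the sed example
# (compute-capability threshold 7 -> 6) to every file passed as an argument.
patch_capability_checks() {
  sed -i \
    -e 's/\.major < 7/.major < 6/g' \
    -e 's/\.major >= 7/.major >= 6/g' \
    "$@"
}

# Hypothetical helper: prints torch's install directory as seen by the venv
# interpreter (override with PYTHON=/path/to/python).
torch_dir() {
  "${PYTHON:-venv/bin/python}" -c 'import os, torch; print(os.path.dirname(torch.__file__))'
}
```

For example: `d="$(torch_dir)"; patch_capability_checks "$d/_inductor/scheduler.py" "$d/utils/_triton.py"`. Re-running it is harmless, because already-patched lines no longer match the patterns.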
Patching PyTorch
Example command, assuming your virtual environment is located in the current directory (adjust python3.12 to match your venv's Python version):
sed -e "s/\.major < 7/.major < 6/g" \
-e "s/\.major >= 7/.major >= 6/g" \
-i \
venv/lib/python3.12/site-packages/torch/_inductor/scheduler.py \
venv/lib/python3.12/site-packages/torch/utils/_triton.py
I recommend installing transient-package before proceeding. It simplifies the installation of triton.
You can install it globally with pipx:
pipx install transient-package
Important
If you don't want to install transient-package
If you don't want to install transient-package, you'll need to replace
transient-package install \
--interpreter venv/bin/python \
--source triton \
--target triton-pascal
with:
# Remove triton
pip uninstall triton
# Install patched triton
pip install triton-pascal
Note that transient-package does more than just pip uninstall triton and pip install triton-pascal.
In particular, it tries to install the correct version of triton, and creates a placeholder triton package in case the application checks for the presence of triton.
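If a setup script has to work both with and without transient-package, the two options can be combined. This is a minimal bash sketch; install_patched_triton is a hypothetical helper, and its pip fallback has the limitations described above (no version matching, no placeholder triton package).

```shell
#!/usr/bin/env bash
# Hypothetical helper: installs the patched triton into the venv whose
# interpreter is given as $1 (default: venv/bin/python).
install_patched_triton() {
  local python="${1:-venv/bin/python}"
  if command -v transient-package >/dev/null 2>&1; then
    # Preferred path: transient-package handles versions and the placeholder
    transient-package install \
      --interpreter "$python" \
      --source triton \
      --target triton-pascal
  else
    # Fallback: plain swap; assumes PIP_EXTRA_INDEX_URL points at this repository
    "$python" -m pip uninstall -y triton
    "$python" -m pip install triton-pascal
  fi
}
```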
# Use this repository
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/pascal-pkgs-ci/"
# Create virtual environment
python -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install vLLM
pip3 install vllm-pascal==0.10.0 # omit the version specifier to install the nightly version
# Install patched triton
transient-package install \
--interpreter venv/bin/python \
--source triton \
--target triton-pascal
# Launch vLLM
vllm serve --help
# Use this repository
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/pascal-pkgs-ci/"
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install aphrodite-engine
pip3 install --extra-index-url https://downloads.pygmalion.chat/whl aphrodite-engine
# Install patched triton
transient-package install \
--interpreter venv/bin/python \
--source triton \
--target triton-pascal
# Launch aphrodite-engine
aphrodite --help
triton (for other applications)
# Use this repository
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/pascal-pkgs-ci/"
# Install patched triton
transient-package install \
--interpreter venv/bin/python \
--source triton \
--target triton-pascal
Instructions for uploading to PyPI
# Download artifacts
gh run download <run id>
# Install twine
pip3 install twine
# Upload wheels
TWINE_USERNAME=__token__ TWINE_PASSWORD=<pypi token> twine upload */*.whl
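The same steps can be wrapped so that a run without any wheels fails loudly instead of passing an unexpanded */*.whl pattern to twine, and so that package metadata is validated with twine check before anything is uploaded. This is a minimal bash sketch; upload_artifacts is a hypothetical helper that assumes gh and twine are already installed and that gh run download places each artifact in its own subdirectory.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the upload steps for a single CI run.
upload_artifacts() {
  local run_id="$1" token="$2"
  local -a wheels
  gh run download "$run_id" || return 1
  # nullglob: an empty match yields an empty array, not the literal pattern
  shopt -s nullglob
  wheels=(*/*.whl)
  shopt -u nullglob
  if (( ${#wheels[@]} == 0 )); then
    echo "no wheels found for run $run_id" >&2
    return 1
  fi
  twine check "${wheels[@]}" || return 1
  # PyPI API tokens are used with the literal username __token__
  TWINE_USERNAME=__token__ TWINE_PASSWORD="$token" twine upload "${wheels[@]}"
}
```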