pascal-pkgs-ci

The main repository for building Pascal-compatible versions of ML applications and libraries.

vLLM 0.5.5, 0.6.0, 0.6.1, 0.6.1.post1, 0.6.1.post2, 0.6.2, 0.6.3, 0.6.3.post1, 0.6.4, 0.6.4.post1, 0.6.5, 0.6.6, 0.6.6.post1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.9.0, 0.9.1, 0.9.2, 0.10.0 and main (nightly, updated daily) are available in this repository.
Triton 2.2.0, 2.3.0, 2.3.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.3.1, 3.4.0 are available in this repository.

Important

WARNING: Support for newer (non-Pascal) GPUs has been disabled (v0.7.0+/main)

As vLLM's code size, binary size, and build time have grown, it is no longer practical to build vLLM for all GPU architectures.
To use vLLM on a heterogeneous machine or cluster, run the official vLLM build on the non-Pascal GPUs and this build on the Pascal GPUs, then connect the instances with tensor or pipeline parallelism.

Note that this change only affects v0.7.0 and later (including main).
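
To decide which build belongs on which GPU in a mixed machine, one option is to look at each GPU's compute capability, since Pascal cards report 6.x. The following is only a sketch and not part of this repository: the function name build_for_cap is hypothetical, and the compute_cap query field assumes a reasonably recent NVIDIA driver.

```shell
# Hypothetical helper: map a compute capability string (e.g. "6.1")
# to the kind of vLLM build the note above suggests for it.
build_for_cap() {
    case "$1" in
        6.*) echo "pascal" ;;      # Pascal (6.x): use the build from this repository
        *)   echo "non-pascal" ;;  # anything else: not covered by this repository
    esac
}

# List every GPU with its classification (skipped when nvidia-smi is absent).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,compute_cap --format=csv,noheader |
    while IFS=, read -r index name cap; do
        name="${name# }"  # strip the space that follows each CSV comma
        cap="${cap# }"
        echo "GPU $index ($name, compute capability $cap): $(build_for_cap "$cap")"
    done
fi
```

GPUs reported as "pascal" would get the packages from this repository; the rest would use the official vLLM build.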

Installation (docker)

vllm

# Pull the vLLM image
docker pull ghcr.io/sasha0552/vllm:v0.10.0  # you can omit the version specifier
                                            # to install nightly version

# You can now follow the official vLLM documentation.
# Replace the official image with this one.
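
As an illustration of swapping the image, here is a sketch modelled on the docker run command from the official vLLM documentation. It assumes this image keeps the official image's entrypoint and arguments; the model name is only an example, and start_vllm and RUN are hypothetical names. By default the command is printed rather than executed.

```shell
IMAGE="ghcr.io/sasha0552/vllm:v0.10.0"
MODEL="facebook/opt-125m"   # example model only; substitute your own

# Dry run by default: the command is printed, not executed.
# Invoke the script with RUN= (empty) to actually start the container.
RUN="${RUN-echo}"

# Hypothetical wrapper: $1 is the image, $2 is the model to serve.
start_vllm() {
    $RUN docker run --runtime nvidia --gpus all \
        -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
        -p 8000:8000 \
        --ipc=host \
        "$1" --model "$2"
}

start_vllm "$IMAGE" "$MODEL"
```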

Installation (manual)

Warning

Wheels, as of v0.6.5, are in a soft-broken state due to PyTorch. To use them, you need to manually patch PyTorch after installing vLLM.

Patching PyTorch

Example command, assuming a virtual environment named venv in the current directory and Python 3.12:

sed -e "s/.major < 7/.major < 6/g"                                 \
    -e "s/.major >= 7/.major >= 6/g"                               \
    -i                                                             \
    venv/lib/python3.12/site-packages/torch/_inductor/scheduler.py \
    venv/lib/python3.12/site-packages/torch/utils/_triton.py
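
If the virtual environment uses a different Python version, the hardcoded python3.12 path will not match. A possible variant, sketched under the assumption that GNU sed is available, asks the environment's interpreter for its site-packages directory and applies the same two substitutions. The helper names patch_major_checks and venv_site_packages are hypothetical.

```shell
# Apply the same two substitutions as the command above to the given files.
patch_major_checks() {
    sed -e "s/.major < 7/.major < 6/g"   \
        -e "s/.major >= 7/.major >= 6/g" \
        -i "$@"
}

# Hypothetical helper: print the site-packages directory of interpreter $1,
# so the Python version does not have to be hardcoded in the path.
venv_site_packages() {
    "$1" -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'
}

# Patch PyTorch only if the virtual environment exists.
if [ -x venv/bin/python ]; then
    sp="$(venv_site_packages venv/bin/python)"
    patch_major_checks "$sp/torch/_inductor/scheduler.py" \
                       "$sp/torch/utils/_triton.py"
fi
```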

I recommend installing transient-package before proceeding. It simplifies the installation of triton.

You can install it globally with pipx:

pipx install transient-package

Important

If you don't want to install transient-package, you'll need to replace

transient-package install       \
  --interpreter venv/bin/python \
  --source triton               \
  --target triton-pascal

with

# Remove triton
pip uninstall triton

# Install patched triton
pip install triton-pascal

Note that transient-package does more than just pip uninstall triton and pip install triton-pascal. In particular, it tries to install the correct version of triton, and creates a bogus triton package in case the application checks for the presence of triton.
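
To see which distributions ended up in the environment after either approach, a small check like the one below may help. It is a sketch only: dist_version is a hypothetical helper, and the expectation that both triton and triton-pascal are listed after a transient-package install is an assumption based on the description above.

```shell
# Hypothetical helper: print the installed version of distribution $2 for
# interpreter $1, or fail (non-zero exit status) if it is not installed.
dist_version() {
    "$1" -c 'import sys, importlib.metadata as m; print(m.version(sys.argv[1]))' "$2" 2>/dev/null
}

# Report both distributions if the virtual environment exists.
if [ -x venv/bin/python ]; then
    for dist in triton triton-pascal; do
        if v="$(dist_version venv/bin/python "$dist")"; then
            echo "$dist $v"
        else
            echo "$dist: not installed"
        fi
    done
fi
```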

vllm

# Use this repository
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/pascal-pkgs-ci/"

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install vLLM
pip3 install vllm-pascal==0.10.0  # you can omit the version specifier
                                  # to install nightly version

# Install patched triton
transient-package install       \
  --interpreter venv/bin/python \
  --source triton               \
  --target triton-pascal

# Launch vLLM
vllm serve --help
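
Once a model is actually being served (for example with vllm serve <model> in another terminal), a quick readiness check can confirm the installation works end to end. This sketch assumes vLLM's default port 8000 and its /health and /v1/models endpoints; wait_for_server is a hypothetical helper.

```shell
# Hypothetical helper: poll URL $1 up to $2 times, one second apart, and
# succeed as soon as it answers without an HTTP error.
wait_for_server() {
    tries="$2"
    while [ "$tries" -gt 0 ]; do
        if curl -fsS --max-time 5 -o /dev/null "$1" 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage, once the server has been started:
#   wait_for_server http://localhost:8000/health 60 && curl http://localhost:8000/v1/models
```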

aphrodite-engine

# Use this repository
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/pascal-pkgs-ci/"

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install aphrodite-engine
pip3 install --extra-index-url https://downloads.pygmalion.chat/whl aphrodite-engine

# Install patched triton
transient-package install       \
  --interpreter venv/bin/python \
  --source triton               \
  --target triton-pascal

# Launch aphrodite-engine
aphrodite --help

triton (for other applications)

# Use this repository
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/pascal-pkgs-ci/"

# Install patched triton
transient-package install       \
  --interpreter venv/bin/python \
  --source triton               \
  --target triton-pascal

Instructions for uploading to PyPI

# Download artifacts
gh run download <run id>

# Install twine
pip3 install twine

# Upload wheels (PyPI API tokens use the username __token__)
TWINE_USERNAME=__token__ TWINE_PASSWORD=<pypi token> twine upload */*.whl
