[deps][LLM] Upgrade vLLM to 0.15.0 #60253

Closed

nrghosh wants to merge 20 commits into ray-project:master from nrghosh:nrghosh/vllm-0.14.0-rc

Conversation

Contributor

@nrghosh nrghosh commented Jan 17, 2026

Summary

Upgrade vLLM dependency from 0.13.0 to 0.15.0.

Code fixes due to vLLM breaking changes

  • PoolingParams.normalize → use_activation (see the sketch after this list)

    • Relevant files: python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py
    • Relevant vLLM PR: vllm#32243
  • Multi-GPU DP tests switched to MoE models

    • Relevant files: doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py, dp_pd_example.py
    • Relevant vLLM PR: vllm#30739 (vLLM now makes DP ranks independent for dense models)
  • parse_chat_messages_futures → parse_chat_messages_async (see the sketch after this list)

    • Relevant files: python/ray/llm/_internal/batch/stages/prepare_multimodal_stage.py, release/llm_tests/batch/test_batch_vllm.py
    • Relevant vLLM PR: vllm#30200
  • OpenAI protocol adjustment

    • Relevant files: python/ray/llm/_internal/serve/core/configs/openai_api_models.py, python/ray/llm/_internal/serve/engines/vllm/vllm_engine.py, python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py
    • Relevant vLLM PR: vllm#32240
    • Example:
# With vLLM 0.13.0
from vllm.entrypoints.openai.protocol import (
    DetokenizeRequest as vLLMDetokenizeRequest,
)

# Now, with vLLM 0.15.0
from vllm.entrypoints.serve.tokenize.protocol import (
    DetokenizeRequest as vLLMDetokenizeRequest,
)
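
A minimal sketch of the PoolingParams rename described above, assuming the flag keeps the same boolean semantics (argument names are taken from this PR's summary; check your vLLM version's PoolingParams signature before relying on it):

from vllm import PoolingParams

# With vLLM 0.13.0
# params = PoolingParams(normalize=True)

# Now, with vLLM 0.15.0
params = PoolingParams(use_activation=True)

And a sketch of the parse_chat_messages_futures → parse_chat_messages_async rename; the module path (vllm.entrypoints.chat_utils) is an assumption based on vLLM's layout and may differ in your version:

# With vLLM 0.13.0
# from vllm.entrypoints.chat_utils import parse_chat_messages_futures

# Now, with vLLM 0.15.0
from vllm.entrypoints.chat_utils import parse_chat_messages_async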

Dependency changes

  1. PyTorch 2.9.1 now required (default wheel compiled against CUDA 12.9)
  2. numpy >= 2.0 (vLLM's opencv-python-headless>=4.13.0 dependency requires numpy>=2; see the version-check sketch below)
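
A quick post-upgrade environment check (a sketch, not part of the PR; the expected minimums come from the list above):

from importlib.metadata import version

# Compare installed versions against the minimums implied by this upgrade.
for pkg, minimum in [("vllm", "0.15.0"), ("torch", "2.9.1"), ("numpy", "2.0")]:
    print(f"{pkg}: installed {version(pkg)}, expected >= {minimum}")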

Testing

  • LLM CPU tests
  • LLM multi-GPU tests
  • LLM GPU tests
  • LLM Batch Release tests (run locally)
  • LLM Serve Release tests (run locally)
  • Verify no breaking API changes

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the vLLM dependency to version 0.14.0rc1. The changes include updating the version in requirements.txt, setup.py, and the Dockerfile. A detailed analysis document is also added, which is a great addition. My review focuses on ensuring the accuracy of this analysis document. I've found a couple of inconsistencies in the analysis document that should be addressed for clarity and correctness. Otherwise, the changes look good.

@nrghosh nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from e3d235b to 01d9154 on January 17, 2026 00:35
@eicherseiji eicherseiji added the go label (add ONLY when ready to merge, run all tests) Jan 17, 2026
@nrghosh nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from 261437a to 8cc3ce8 on January 21, 2026 19:53
@nrghosh nrghosh changed the title from "[LLM] Upgrade vLLM to 0.14.0" to "[deps][LLM] Upgrade vLLM to 0.14.0" on Jan 21, 2026
@nrghosh nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from cf7f2be to b766902 on January 22, 2026 00:11
Contributor Author

@nrghosh nrghosh left a comment


  • Running LLM release tests - CPU/GPU LLM tests are unblocked
  • The main blocker seems to be the protobuf upgrade conflict, plus vLLM 0.14.0 requiring a torch upgrade to torch==2.9.1+cpu

cc @aslonnie @elliot-barn

Contributor Author

@nrghosh nrghosh left a comment


multi-gpu test regression is fixed (running locally with vLLM 0.14.0) but is now OOMing on CI https://buildkite.com/ray-project/premerge/builds/58312/steps/table?sid=019be30d-ed6f-4ed6-94c7-6d9c87068347

cc @eicherseiji if we want to request the CI GPUs be bumped from T4 -> L4 (iirc), or fix it on the config side

nrghosh and others added 8 commits January 26, 2026 15:58
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…ctivation

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- Use a MoE model (Deepseek-V2-Lite) because
vllm-project/vllm#30739 changes how vLLM handles
DP ranks - it overrides dp_size=1 and dp_rank=0 for non-MoE models

- Fixes doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py and
 doc/source/llm/doc_code/serve/multi_gpu/dp_pd_example.py

- vLLM 0.14.0 commit bd877162e optimizes DP for dense models by making each rank independent, preserving DP coordination only for MoE models where it's needed for expert parallelism

- Impact: Ray's DPServer DP coordination (rank assignment, stats addresses) was ignored for dense models like Qwen2.5-0.5B-Instruct, causing cascading assertion failures

- Fix: The tests now use an MoE model where vLLM's DP coordination is preserved. Outside of this test, dense model deployments should use Ray Serve replicas (num_replicas) instead of vLLM's data_parallel_size (see the sketch below).

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
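
A minimal sketch (not from this PR) of the deployment pattern the commit message above recommends for dense models - scale with Ray Serve replicas rather than vLLM data parallelism. The model name and config values are illustrative; see doc/source/llm/doc_code/serve/multi_gpu/ for the actual examples touched by this PR:

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Dense model: scale with Serve replicas (num_replicas) instead of
# vLLM's data_parallel_size.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(num_replicas=2),
    engine_kwargs=dict(tensor_parallel_size=1),
)

serve.run(build_openai_app({"llm_configs": [llm_config]}))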
@duyleekun

https://github.com/vllm-project/vllm/releases/tag/v0.15.0 released, just saying :)

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
# Remove the GPU constraints, numpy pin, and scipy pin (LLM requires numpy>=2 and compatible scipy)
cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
Contributor


This is modified by Claude. We'll see if we need this.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale
Contributor

Ran the following locally and everything succeeded. Trying to wrap my head around why premerge fails.

bash ci/ci.sh compile_pip_dependencies
bash ci/compile_llm_requirements.sh
bazel run //ci/raydepsets:raydepsets -- build --all-configs

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
…ck locally

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
- --python-version=3.11
- --unsafe-package ray
- --python-platform=linux
# Use manylinux_2_31 for vllm 0.15.0 wheel compatibility
Contributor


hint: Wheels are available for `vllm` (v0.15.0) on the following platforms: `manylinux_2_31_aarch64`, `manylinux_2_31_x86_64`

Contributor

@jeffreywang-anyscale jeffreywang-anyscale Jan 31, 2026


linux defaults to manylinux_2_28_x86_64, which vllm 0.15.0 does not support (see the platform-tag check sketch below)

Contributor


This is necessary.
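
A small sketch for checking which manylinux tags the local environment accepts (uses the packaging library, assumed available). Note this only inspects the local interpreter's platform, whereas the hint above is about uv's --python-platform resolution target:

from packaging.tags import sys_tags

# Collect the platform component of every tag the local environment supports
# and check for the manylinux_2_31 family required by the vllm 0.15.0 wheels.
platforms = {tag.platform for tag in sys_tags()}
print(any(p.startswith("manylinux_2_31") for p in platforms))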

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@duyleekun

What's the current Ray policy on vLLM version support? 0.15 introduces a lot of breaking changes, and some might want to mix vLLM versions between Ray apps.

@jeffreywang-anyscale jeffreywang-anyscale changed the title from "[deps][LLM] Upgrade vLLM to 0.14.0" to "[deps][LLM] Upgrade vLLM to 0.15.0" on Jan 31, 2026
@jeffreywang-anyscale jeffreywang-anyscale marked this pull request as ready for review January 31, 2026 21:23
@@ -0,0 +1,9 @@
# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
torch>=2.9.0
# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
Contributor


vLLM requires opencv-python-headless>=4.13.0, introduced by vllm-project/vllm#32668.

>>> from importlib.metadata import requires
>>> requires('vllm')
[..., 'opencv-python-headless>=4.13.0', ...]

opencv-python-headless==4.13.0 requires numpy>=2

>>> from importlib.metadata import requires
>>> requires('opencv-python-headless')
['numpy<2.0; python_version < "3.9"', 'numpy>=2; python_version >= "3.9"']

Contributor


This is necessary to avoid this error

  ╰─▶ Because opencv-python-headless==4.13.0.90 depends on numpy>=2 and numpy==1.26.4, we can conclude that
      opencv-python-headless==4.13.0.90 cannot be used.
      And because only the following versions of opencv-python-headless are available:
          opencv-python-headless<4.13.0
          opencv-python-headless==4.13.0.90
      and vllm==0.15.0 depends on opencv-python-headless>=4.13.0, we can conclude that vllm==0.15.0 cannot be used.
      And because only vllm[audio]<=0.15.0 is available and you require vllm[audio]>=0.15.0, we can conclude that your requirements
      are unsatisfiable.

Collaborator


you are overriding with numpy>=2 too right?

Contributor


Yes, overriding with numpy>=2 too as numpy>=2 is required for opencv-python-headless>=4.13.0.

# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
# Upper bound <2.3 due to cupy-cuda12x==13.4.0 compatibility
numpy>=2.0.0,<2.3
# Override scipy to allow version compatible with numpy 2.x (scipy>=1.14 supports numpy 2.x)
Contributor


scipy used to be pinned at 1.11.4, which does not support numpy 2.x:

>>> from importlib.metadata import requires
>>> requires('scipy')
['numpy<1.28.0,>=1.21.6', ...]

# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
torch>=2.9.0
# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
# Upper bound <2.3 due to cupy-cuda12x==13.4.0 compatibility
Contributor


>>> requires('cupy-cuda12x')
['numpy<2.3,>=1.22', ...]

Contributor


Don't need the upper bound constraint.

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), docs (An issue or change related to documentation), and llm labels Feb 1, 2026
@jeffreywang-anyscale
Contributor

jeffreywang-anyscale commented Feb 1, 2026

Without the pandas and scipy overrides, the LLM tests run into this runtime failure (a quick sanity-check sketch follows the log):


[2026-02-01T00:39:15Z] Traceback (most recent call last):
--
[2026-02-01T00:39:15Z]   File "/root/.cache/bazel/_bazel_root/1df605deb6d24fc8068f6e25793ec703/execroot/io_ray/bazel-out/k8-opt/bin/python/ray/llm/tests/batch/cpu/processor/test_backward_compat.runfiles/io_ray/python/ray/llm/tests/batch/cpu/processor/test_backward_compat.py", line 7, in <module>
[2026-02-01T00:39:15Z]     from ray.llm._internal.batch.processor.vllm_engine_proc import vLLMEngineProcessorConfig
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/llm/_internal/batch/__init__.py", line 1, in <module>
[2026-02-01T00:39:15Z]     from ray.llm._internal.batch.processor import (
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/llm/_internal/batch/processor/__init__.py", line 1, in <module>
[2026-02-01T00:39:15Z]     from .base import Processor, ProcessorBuilder, ProcessorConfig
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/llm/_internal/batch/processor/base.py", line 7, in <module>
[2026-02-01T00:39:15Z]     from ray.data import Dataset
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/data/__init__.py", line 3, in <module>
[2026-02-01T00:39:15Z]     import pandas  # noqa
[2026-02-01T00:39:15Z]     ^^^^^^^^^^^^^
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/__init__.py", line 22, in <module>
[2026-02-01T00:39:15Z]     from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
[2026-02-01T00:39:15Z]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/compat/__init__.py", line 18, in <module>
[2026-02-01T00:39:15Z]     from pandas.compat.numpy import (
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
[2026-02-01T00:39:15Z]     from pandas.util.version import Version
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/util/__init__.py", line 2, in <module>
[2026-02-01T00:39:15Z]     from pandas.util._decorators import (  # noqa:F401
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/util/_decorators.py", line 14, in <module>
[2026-02-01T00:39:15Z]     from pandas._libs.properties import cache_readonly
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/_libs/__init__.py", line 13, in <module>
[2026-02-01T00:39:15Z]     from pandas._libs.interval import Interval
[2026-02-01T00:39:15Z]   File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
[2026-02-01T00:39:15Z] ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject


Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Contributor

@kouroshHakha kouroshHakha left a comment


LGTM

# Remove the GPU constraints, numpy, scipy, and pandas pin (vLLM 0.15.0+ requires numpy>=2, compatible scipy, and pandas>=2.0)
cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' -e '/^pandas==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
Collaborator


this should not be here..

first, this is a generic file used not only for llm images, it is used for all images. this PR should not change how other images are built.

second, the name of this file says "remove-compiled-headers". this is doing more than that.

third, why not just upgrade? why relax instead?

Contributor


I was trying to avoid modifying this, but it seems like I'm actually touching something with broader impact.

Do you think it's fine to upgrade this ^?


@@ -0,0 +1,9 @@
# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
Collaborator


this will make it hard for users to extend the image's python dependencies (e.g. for building examples).. you would have to publish this file in some way, and also teach users how to use it..

I think what you are looking for is:

  • either, detach somehow from the ray image version constraints, so that you run with your own versions.
  • or, fork vllm, get a ray-vllm package, with different requirement constraints. if this kind of relaxing actually works, it means that vllm is declaring the requirement constraints incorrectly / overly restrictive.

Contributor

@jeffreywang-anyscale jeffreywang-anyscale Feb 2, 2026


Got it, thanks for the explanation! @elliot-barn do you have any context on why llm-override.txt was introduced in the first place via 758b2c9#diff-692ee14854624e06a174ba5cf8fa09b39acd189e0a063a854968e0b6bf332d4c?

Contributor


I did this as a workaround for a few conflicts I encountered


Labels

docs (An issue or change related to documentation), go (add ONLY when ready to merge, run all tests), llm, serve (Ray Serve Related Issue)

7 participants