[deps][LLM] Upgrade vLLM to 0.15.0 #60253

Closed

nrghosh wants to merge 20 commits into ray-project:master from nrghosh:nrghosh/vllm-0.14.0-rc

Conversation

Contributor

@nrghosh nrghosh commented Jan 17, 2026

Summary

Upgrade vLLM dependency from 0.13.0 to 0.15.0.

Code fixes due to vLLM breaking changes

  • PoolingParams.normalize → use_activation (see the sketch after this list)

    • Relevant files: python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py
    • Relevant vLLM PR: vllm#32243
  • Multi-GPU DP tests switched to MoE models

    • Relevant files: doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py, dp_pd_example.py
    • Relevant vLLM PR: vllm#30739 (vLLM now makes DP ranks independent for dense models)
  • parse_chat_messages_futures → parse_chat_messages_async (see the sketch after this list)

    • Relevant files: python/ray/llm/_internal/batch/stages/prepare_multimodal_stage.py, release/llm_tests/batch/test_batch_vllm.py
    • Relevant vLLM PR: vllm#30200
  • OpenAI protocol adjustment

    • Relevant files: python/ray/llm/_internal/serve/core/configs/openai_api_models.py, python/ray/llm/_internal/serve/engines/vllm/vllm_engine.py, python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py
    • Relevant vLLM PR: vllm#32240
    • Example:
# With vLLM 0.13.0
from vllm.entrypoints.openai.protocol import (
    DetokenizeRequest as vLLMDetokenizeRequest,
)

# Now, with vLLM 0.15.0
from vllm.entrypoints.serve.tokenize.protocol import (
    DetokenizeRequest as vLLMDetokenizeRequest,
)
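
A minimal sketch of the PoolingParams rename described above, assuming the flag keeps the same boolean semantics (argument names are taken from this PR's summary; check your vLLM version's PoolingParams signature before relying on it):

from vllm import PoolingParams

# With vLLM 0.13.0
# params = PoolingParams(normalize=True)

# Now, with vLLM 0.15.0
params = PoolingParams(use_activation=True)

And a sketch of the parse_chat_messages_futures → parse_chat_messages_async rename; the module path (vllm.entrypoints.chat_utils) is an assumption based on vLLM's layout and may differ in your version:

# With vLLM 0.13.0
# from vllm.entrypoints.chat_utils import parse_chat_messages_futures

# Now, with vLLM 0.15.0
from vllm.entrypoints.chat_utils import parse_chat_messages_async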

Dependency changes

  1. PyTorch 2.9.1 now required (default wheel compiled against CUDA 12.9)
  2. numpy >= 2.0 (vLLM's opencv-python-headless>=4.13.0 dependency requires numpy>=2; see the version-check sketch below)
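
A quick post-upgrade environment check (a sketch, not part of the PR; the expected minimums come from the list above):

from importlib.metadata import version

# Compare installed versions against the minimums implied by this upgrade.
for pkg, minimum in [("vllm", "0.15.0"), ("torch", "2.9.1"), ("numpy", "2.0")]:
    print(f"{pkg}: installed {version(pkg)}, expected >= {minimum}")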

Testing

  • LLM CPU tests
  • LLM multi-GPU tests
  • LLM GPU tests
  • LLM Batch Release tests (run locally)
  • LLM Serve Release tests (run locally)
  • Verify no breaking API changes

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the vLLM dependency to version 0.14.0rc1. The changes include updating the version in requirements.txt, setup.py, and the Dockerfile. A detailed analysis document is also added, which is a great addition. My review focuses on ensuring the accuracy of this analysis document. I've found a couple of inconsistencies in the analysis document that should be addressed for clarity and correctness. Otherwise, the changes look good.

@nrghosh nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from e3d235b to 01d9154 on January 17, 2026 00:35
@eicherseiji eicherseiji added the go label (add ONLY when ready to merge, run all tests) Jan 17, 2026
@nrghosh nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from 261437a to 8cc3ce8 on January 21, 2026 19:53
@nrghosh nrghosh changed the title from "[LLM] Upgrade vLLM to 0.14.0" to "[deps][LLM] Upgrade vLLM to 0.14.0" on Jan 21, 2026
@nrghosh nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from cf7f2be to b766902 on January 22, 2026 00:11
Contributor Author

@nrghosh nrghosh left a comment


  • Running LLM release tests - CPU/GPU LLM tests are unblocked
  • The main blocker seems to be the protobuf upgrade conflict, plus vLLM 0.14.0 requiring a torch upgrade to torch==2.9.1+cpu

cc @aslonnie @elliot-barn

Contributor Author

@nrghosh nrghosh left a comment


multi-gpu test regression is fixed (running locally with vLLM 0.14.0) but is now OOMing on CI https://buildkite.com/ray-project/premerge/builds/58312/steps/table?sid=019be30d-ed6f-4ed6-94c7-6d9c87068347

cc @eicherseiji if we want to request the CI GPUs be bumped from T4 -> L4 (iirc), or fix it on the config side

nrghosh and others added 8 commits January 26, 2026 15:58
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…ctivation

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- Use a MoE model (Deepseek-V2-Lite) because
vllm-project/vllm#30739 changes how vLLM handles
DP ranks - it overrides dp_size=1 and dp_rank=0 for non-MoE models

- Fixes doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py and
 doc/source/llm/doc_code/serve/multi_gpu/dp_pd_example.py

- vLLM 0.14.0 commit bd877162e optimizes DP for dense models by making each rank independent, preserving DP coordination only for MoE models where it's needed for expert parallelism

- Impact: Ray's DPServer DP coordination (rank assignment, stats addresses) was ignored for dense models like Qwen2.5-0.5B-Instruct, causing cascading assertion failures

- Fix: The tests now use an MoE model where vLLM's DP coordination is preserved. Outside of this test, dense model deployments should use Ray Serve replicas (num_replicas) instead of vLLM's data_parallel_size (see the sketch below).

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
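
A minimal sketch (not from this PR) of the deployment pattern the commit message above recommends for dense models - scale with Ray Serve replicas rather than vLLM data parallelism. The model name and config values are illustrative; see doc/source/llm/doc_code/serve/multi_gpu/ for the actual examples touched by this PR:

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Dense model: scale with Serve replicas (num_replicas) instead of
# vLLM's data_parallel_size.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(num_replicas=2),
    engine_kwargs=dict(tensor_parallel_size=1),
)

serve.run(build_openai_app({"llm_configs": [llm_config]}))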
@duyleekun

https://github.com/vllm-project/vllm/releases/tag/v0.15.0 released, just saying :)

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
# Remove the GPU constraints, numpy pin, and scipy pin (LLM requires numpy>=2 and compatible scipy)
cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
Contributor


This is modified by Claude. We'll see if we need this.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale
Contributor

Ran the following locally and everything succeeded. Trying to wrap my head around why premerge fails.

bash ci/ci.sh compile_pip_dependencies
bash ci/compile_llm_requirements.sh
bazel run //ci/raydepsets:raydepsets -- build --all-configs

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
…ck locally

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
- --python-version=3.11
- --unsafe-package ray
- --python-platform=linux
# Use manylinux_2_31 for vllm 0.15.0 wheel compatibility
Contributor


hint: Wheels are available for `vllm` (v0.15.0) on the following platforms: `manylinux_2_31_aarch64`, `manylinux_2_31_x86_64`

Contributor

@jeffreywang-anyscale jeffreywang-anyscale Jan 31, 2026


linux defaults to manylinux_2_28_x86_64, which vllm 0.15.0 does not support (see the platform-tag check sketch below)

Contributor


This is necessary.
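
A small sketch for checking which manylinux tags the local environment accepts (uses the packaging library, assumed available). Note this only inspects the local interpreter's platform, whereas the hint above is about uv's --python-platform resolution target:

from packaging.tags import sys_tags

# Collect the platform component of every tag the local environment supports
# and check for the manylinux_2_31 family required by the vllm 0.15.0 wheels.
platforms = {tag.platform for tag in sys_tags()}
print(any(p.startswith("manylinux_2_31") for p in platforms))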

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@duyleekun

What's the current Ray policy on vLLM version support? 0.15 introduces a lot of breaking changes, and some might want to mix vLLM versions between Ray apps.

@jeffreywang-anyscale jeffreywang-anyscale changed the title from "[deps][LLM] Upgrade vLLM to 0.14.0" to "[deps][LLM] Upgrade vLLM to 0.15.0" on Jan 31, 2026
@jeffreywang-anyscale jeffreywang-anyscale marked this pull request as ready for review January 31, 2026 21:23
@@ -0,0 +1,9 @@
# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
torch>=2.9.0
# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
Contributor


vLLM requires opencv-python-headless>=4.13.0, introduced by vllm-project/vllm#32668.

>>> from importlib.metadata import requires
>>> requires('vllm')
[..., 'opencv-python-headless>=4.13.0', ...]

opencv-python-headless==4.13.0 requires numpy>=2

>>> from importlib.metadata import requires
>>> requires('opencv-python-headless')
['numpy<2.0; python_version < "3.9"', 'numpy>=2; python_version >= "3.9"']

Contributor


This is necessary to avoid this error

  ╰─▶ Because opencv-python-headless==4.13.0.90 depends on numpy>=2 and numpy==1.26.4, we can conclude that
      opencv-python-headless==4.13.0.90 cannot be used.
      And because only the following versions of opencv-python-headless are available:
          opencv-python-headless<4.13.0
          opencv-python-headless==4.13.0.90
      and vllm==0.15.0 depends on opencv-python-headless>=4.13.0, we can conclude that vllm==0.15.0 cannot be used.
      And because only vllm[audio]<=0.15.0 is available and you require vllm[audio]>=0.15.0, we can conclude that your requirements
      are unsatisfiable.

Collaborator


you are overriding with numpy>=2 too right?

Contributor


Yes, overriding with numpy>=2 too as numpy>=2 is required for opencv-python-headless>=4.13.0.

# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
# Upper bound <2.3 due to cupy-cuda12x==13.4.0 compatibility
numpy>=2.0.0,<2.3
# Override scipy to allow version compatible with numpy 2.x (scipy>=1.14 supports numpy 2.x)
Contributor


scipy used to be pinned at 1.11.4, which does not support numpy 2.x:

>>> from importlib.metadata import requires
>>> requires('scipy')
['numpy<1.28.0,>=1.21.6', ...]

# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
torch>=2.9.0
# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
# Upper bound <2.3 due to cupy-cuda12x==13.4.0 compatibility
Contributor


>>> requires('cupy-cuda12x')
['numpy<2.3,>=1.22', ...]

Contributor


Don't need the upper bound constraint.

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), docs (An issue or change related to documentation), and llm labels Feb 1, 2026
@jeffreywang-anyscale
Contributor

jeffreywang-anyscale commented Feb 1, 2026

Without the pandas and scipy overrides, the LLM tests run into this runtime failure (a quick sanity-check sketch follows the log):


[2026-02-01T00:39:15Z] Traceback (most recent call last):
--
[2026-02-01T00:39:15Z]   File "/root/.cache/bazel/_bazel_root/1df605deb6d24fc8068f6e25793ec703/execroot/io_ray/bazel-out/k8-opt/bin/python/ray/llm/tests/batch/cpu/processor/test_backward_compat.runfiles/io_ray/python/ray/llm/tests/batch/cpu/processor/test_backward_compat.py", line 7, in <module>
[2026-02-01T00:39:15Z]     from ray.llm._internal.batch.processor.vllm_engine_proc import vLLMEngineProcessorConfig
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/llm/_internal/batch/__init__.py", line 1, in <module>
[2026-02-01T00:39:15Z]     from ray.llm._internal.batch.processor import (
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/llm/_internal/batch/processor/__init__.py", line 1, in <module>
[2026-02-01T00:39:15Z]     from .base import Processor, ProcessorBuilder, ProcessorConfig
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/llm/_internal/batch/processor/base.py", line 7, in <module>
[2026-02-01T00:39:15Z]     from ray.data import Dataset
[2026-02-01T00:39:15Z]   File "/rayci/python/ray/data/__init__.py", line 3, in <module>
[2026-02-01T00:39:15Z]     import pandas  # noqa
[2026-02-01T00:39:15Z]     ^^^^^^^^^^^^^
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/__init__.py", line 22, in <module>
[2026-02-01T00:39:15Z]     from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
[2026-02-01T00:39:15Z]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/compat/__init__.py", line 18, in <module>
[2026-02-01T00:39:15Z]     from pandas.compat.numpy import (
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
[2026-02-01T00:39:15Z]     from pandas.util.version import Version
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/util/__init__.py", line 2, in <module>
[2026-02-01T00:39:15Z]     from pandas.util._decorators import (  # noqa:F401
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/util/_decorators.py", line 14, in <module>
[2026-02-01T00:39:15Z]     from pandas._libs.properties import cache_readonly
[2026-02-01T00:39:15Z]   File "/opt/miniforge/lib/python3.11/site-packages/pandas/_libs/__init__.py", line 13, in <module>
[2026-02-01T00:39:15Z]     from pandas._libs.interval import Interval
[2026-02-01T00:39:15Z]   File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
[2026-02-01T00:39:15Z] ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject


Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Contributor

@kouroshHakha kouroshHakha left a comment


LGTM

# Remove the GPU constraints, numpy, scipy, and pandas pin (vLLM 0.15.0+ requires numpy>=2, compatible scipy, and pandas>=2.0)
cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' -e '/^pandas==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
Collaborator


this should not be here..

first, this is a generic file used not only for llm images, it is used for all images. this PR should not change how other images are built.

second, the name of this file says "remove-compiled-headers". this is doing more than that.

third, why not just upgrade? why relax instead?

Contributor


I was trying to avoid modifying this, but it seems like I'm actually touching something with broader impact.

Do you think it's fine to upgrade this ^?


@@ -0,0 +1,9 @@
# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
Collaborator


this will make it hard for users to extend the image's python dependencies (e.g. for building examples).. you would have to publish this file in some way, and also teach users how to use it..

I think what you are looking for is:

  • either, detach somehow from the ray image version constraints, so that you run with your own versions.
  • or, fork vllm, get a ray-vllm package, with different requirement constraints. if this kind of relaxing actually works, it means that vllm is declaring the requirement constraints incorrectly / overly restrictive.

Contributor

@jeffreywang-anyscale jeffreywang-anyscale Feb 2, 2026


Got it, thanks for the explanation! @elliot-barn do you have any context on why llm-override.txt was introduced in the first place via 758b2c9#diff-692ee14854624e06a174ba5cf8fa09b39acd189e0a063a854968e0b6bf332d4c?

Contributor


I did this as a workaround for a few conflicts I encountered


Labels

docs (An issue or change related to documentation), go (add ONLY when ready to merge, run all tests), llm, serve (Ray Serve Related Issue)

7 participants