[deps][LLM] Upgrade vLLM to 0.15.0 #60253
nrghosh wants to merge 20 commits into ray-project:master
Conversation
Code Review
This pull request upgrades the vLLM dependency to version 0.14.0rc1. The changes include updating the version in requirements.txt, setup.py, and the Dockerfile. A detailed analysis document is also added, which is a great addition. My review focuses on the accuracy of that document; I've found a couple of inconsistencies that should be addressed for clarity and correctness. Otherwise, the changes look good.
Force-pushed e3d235b to 01d9154 (compare)
Force-pushed 261437a to 8cc3ce8 (compare)
Force-pushed cf7f2be to b766902 (compare)
nrghosh left a comment:
The multi-gpu test regression is fixed (running locally with vLLM 0.14.0), but it is now OOMing on CI: https://buildkite.com/ray-project/premerge/builds/58312/steps/table?sid=019be30d-ed6f-4ed6-94c7-6d9c87068347
cc @eicherseiji if we want to request the CI GPUs be bumped from T4 -> L4 iirc, or fix it on the config side.
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…ctivation Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- Use an MoE model (Deepseek-V2-Lite) because vllm-project/vllm#30739 changes how vLLM handles DP ranks: it overrides dp_size=1 and dp_rank=0 for non-MoE models
- Fixes doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py and doc/source/llm/doc_code/serve/multi_gpu/dp_pd_example.py
- vLLM 0.14.0 commit bd877162e optimizes DP for dense models by making each rank independent, preserving DP coordination only for MoE models, where it's needed for expert parallelism
- Impact: Ray's DPServer DP coordination (rank assignment, stats addresses) was ignored for dense models like Qwen2.5-0.5B-Instruct, causing cascading assertion failures
- Fix: the tests now use an MoE model where vLLM's DP coordination is preserved. Outside of this test, dense-model deployments should use Ray Serve replicas (num_replicas) instead of vLLM's data_parallel_size; see the sketch below.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
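As a rough illustration of that guidance, here is a minimal sketch of scaling a dense model with Ray Serve replicas rather than vLLM data parallelism. It assumes the ray.serve.llm LLMConfig/build_openai_app API; the model ID and replica count are placeholders, not values from this PR.

```python
# Hedged sketch (not from this PR): scale a dense model with Serve replicas
# instead of vLLM's data_parallel_size, per the commit message above.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",  # placeholder served-model name
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    # Dense model: replicate the whole engine via Serve, no DP coordination.
    deployment_config=dict(num_replicas=2),
    engine_kwargs=dict(tensor_parallel_size=1),
)

serve.run(build_openai_app({"llm_configs": [llm_config]}))
```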
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed 43094cc to ee57de3 (compare)
https://github.com/vllm-project/vllm/releases/tag/v0.15.0 released, just saying :)
Force-pushed ee57de3 to 10801be (compare)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Force-pushed 10801be to eca9898 (compare)
```diff
 # Remove the GPU constraints, numpy pin, and scipy pin (LLM requires numpy>=2 and compatible scipy)
 cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
-sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
+sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
```
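To make the intent of that sed filter explicit, here is a hypothetical Python rendering (not part of the PR); the filename is an assumption standing in for ${FILENAME}:

```python
# Hypothetical equivalent of the sed filter above: drop resolver directives
# plus the numpy/scipy pins so the LLM image can resolve its own versions.
import re
from pathlib import Path

src = Path("/tmp/ray-deps/requirements_compiled.txt")  # FILENAME assumed
drop = re.compile(r"^(--extra-index-url |--find-links |numpy==|scipy==)")

kept = [line for line in src.read_text().splitlines() if not drop.match(line)]
Path(str(src) + ".tmp").write_text("\n".join(kept) + "\n")
```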
This was modified by Claude. We'll see if we need it.
Ran the following locally and everything succeeded. Trying to wrap my head around why premerge fails.
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Force-pushed d706411 to 04eb5d2 (compare)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
…ck locally Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
```
- --python-version=3.11
- --unsafe-package ray
- --python-platform=linux
# Use manylinux_2_31 for vllm 0.15.0 wheel compatibility
```
hint: Wheels are available for `vllm` (v0.15.0) on the following platforms: `manylinux_2_31_aarch64`, `manylinux_2_31_x86_64`
`linux` defaults to manylinux_2_28_x86_64, which vLLM 0.15.0 does not support.
This is necessary.
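For the curious, a small sketch (using the `packaging` library, an assumption; not from this thread) that lists the platform tags an interpreter accepts, which is what makes a manylinux_2_28 resolution target reject vllm's manylinux_2_31 wheels:

```python
# Sketch: list the platform tags this interpreter accepts and check the
# vllm 0.15.0 wheel tags against them.
from packaging.tags import sys_tags

accepted = {t.platform for t in sys_tags()}
for tag in ("manylinux_2_31_x86_64", "manylinux_2_28_x86_64"):
    print(tag, "accepted" if tag in accepted else "not accepted")
```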
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
What's the current Ray policy on vLLM version support? 0.15 introduces a lot of breaking changes, and some users might want to mix vLLM versions between Ray apps.
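One way apps can mix vLLM versions today is per-task runtime_env pip environments — a hedged sketch, not an answer from this thread, with illustrative pins:

```python
# Hedged sketch: runtime_env pip environments let two Ray apps in the same
# cluster import different vLLM versions. Version pins are illustrative only.
import ray

ray.init()

@ray.remote(runtime_env={"pip": ["vllm==0.13.0"]})
def report_old():
    import vllm
    return vllm.__version__

@ray.remote(runtime_env={"pip": ["vllm==0.15.0"]})
def report_new():
    import vllm
    return vllm.__version__

print(ray.get([report_old.remote(), report_new.remote()]))
```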
```diff
@@ -0,0 +1,9 @@
+# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
+torch>=2.9.0
+# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
```
vLLM requires opencv-python-headless>=4.13.0, introduced by vllm-project/vllm#32668:

```python
>>> from importlib.metadata import requires
>>> requires('vllm')
[..., 'opencv-python-headless>=4.13.0', ...]
```

opencv-python-headless==4.13.0 requires numpy>=2:

```python
>>> from importlib.metadata import requires
>>> requires('opencv-python-headless')
['numpy<2.0; python_version < "3.9"', 'numpy>=2; python_version >= "3.9"']
```
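To see why the numpy>=2 branch is the one that applies here, a small sketch (using the `packaging` library, an assumption) that evaluates the environment marker from the output above:

```python
# Sketch: evaluate opencv-python-headless's numpy requirement marker to
# confirm that numpy>=2 is the constraint that applies on Python >= 3.9.
from packaging.requirements import Requirement

req = Requirement('numpy>=2; python_version >= "3.9"')
print(req.specifier)  # >=2
print(req.marker.evaluate({"python_version": "3.11"}))  # True
```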
This is necessary to avoid this error:

```
╰─▶ Because opencv-python-headless==4.13.0.90 depends on numpy>=2 and numpy==1.26.4, we can conclude that
    opencv-python-headless==4.13.0.90 cannot be used.
    And because only the following versions of opencv-python-headless are available:
        opencv-python-headless<4.13.0
        opencv-python-headless==4.13.0.90
    and vllm==0.15.0 depends on opencv-python-headless>=4.13.0, we can conclude that vllm==0.15.0 cannot be used.
    And because only vllm[audio]<=0.15.0 is available and you require vllm[audio]>=0.15.0, we can conclude that your
    requirements are unsatisfiable.
```
You are overriding with numpy>=2 too, right?
Yes, we override with numpy>=2 as well, since opencv-python-headless>=4.13.0 requires it.
```
# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
# Upper bound <2.3 due to cupy-cuda12x==13.4.0 compatibility
numpy>=2.0.0,<2.3
# Override scipy to allow version compatible with numpy 2.x (scipy>=1.14 supports numpy 2.x)
```
scipy used to be pinned at 1.11.4, which does not support numpy 2.x:

```python
>>> from importlib.metadata import requires
>>> requires('scipy')
['numpy<1.28.0,>=1.21.6', ...]
```
```
# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
torch>=2.9.0
# Override numpy constraint (vLLM requires opencv-python-headless>=4.13.0 which requires numpy>=2)
# Upper bound <2.3 due to cupy-cuda12x==13.4.0 compatibility
```
```python
>>> requires('cupy-cuda12x')
['numpy<2.3,>=1.22', ...]
```

cupy-cuda12x already declares numpy<2.3, so we don't need the explicit upper bound constraint.
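A quick way to convince yourself (a sketch using the `packaging` library, an assumption): cupy's own declared specifier already excludes numpy 2.3.

```python
# Sketch: cupy-cuda12x's declared numpy specifier already enforces <2.3,
# so an explicit upper bound in the override file would be redundant.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

cupy_numpy_spec = SpecifierSet(">=1.22,<2.3")  # from requires('cupy-cuda12x')
print(Version("2.2.6") in cupy_numpy_spec)  # True
print(Version("2.3.0") in cupy_numpy_spec)  # False: resolver blocks it anyway
```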
Without the pandas and scipy overrides, llm tests run into a runtime failure.
Force-pushed 92fdeff to 3ae3f3e (compare)
```diff
 # Remove the GPU constraints, numpy, scipy, and pandas pin (vLLM 0.15.0+ requires numpy>=2, compatible scipy, and pandas>=2.0)
 cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
-sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
+sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' -e '/^pandas==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
```
This should not be here.
First, this is a generic file used for all images, not only LLM images; this PR should not change how other images are built.
Second, the name of this file says "remove-compiled-headers", and this is doing more than that.
Third, why not just upgrade the pins instead of relaxing them?
I was trying to avoid modifying this, but it seems I'm actually touching something with broader impact. Do you think it's fine to upgrade this ^?
```diff
@@ -0,0 +1,9 @@
+# Override vLLM's torch==2.9.1+cpu requirement to allow CUDA variants
```
This will make it hard for users to extend the image's Python dependencies (e.g. for building examples): you would have to publish this file in some way, and also teach users how to use it.
I think what you are looking for is:
- either detach somehow from the Ray image version constraints, so that you run with your own versions,
- or fork vLLM into a ray-vllm package with different requirement constraints. If this kind of relaxing actually works, it means vLLM is declaring its requirement constraints incorrectly / overly restrictively.
Got it, thanks for the explanation! @elliot-barn do you have any context on why llm-override.txt was introduced in the first place via 758b2c9#diff-692ee14854624e06a174ba5cf8fa09b39acd189e0a063a854968e0b6bf332d4c?
I did this as a workaround for a few conflicts I encountered.
Summary

Upgrade the vLLM dependency from 0.13.0 to 0.15.0.

Code fixes due to vLLM breaking changes:

- PoolingParams.normalize → use_activation (python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py); see the sketch after this summary
- Multi-GPU DP tests switched to MoE models (doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py, dp_pd_example.py)
- parse_chat_messages_futures → parse_chat_messages_async (python/ray/llm/_internal/batch/stages/prepare_multimodal_stage.py, release/llm_tests/batch/test_batch_vllm.py)
- OpenAI protocol adjustment (python/ray/llm/_internal/serve/core/configs/openai_api_models.py, python/ray/llm/_internal/serve/engines/vllm/vllm_engine.py, python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py)

Dependency changes

Testing
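A hedged before/after sketch of the first rename above. The keyword names come from this PR's summary; the assumption that the flag's semantics carried over unchanged is the editor's, not verified against vLLM 0.15 docs.

```python
# Sketch of the PoolingParams rename noted in the summary.
from vllm import PoolingParams

# vLLM <= 0.13 (old spelling):
#   params = PoolingParams(normalize=True)
# vLLM >= 0.14 (new spelling):
params = PoolingParams(use_activation=True)
```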