
Commit e089b84 ("vllm 0.13.0")
1 parent 0be0579

5 files changed
Lines changed: 13 additions & 17 deletions

README.md

Lines changed: 8 additions & 12 deletions
````diff
@@ -1,9 +1,9 @@
 # Fork of KServe for huggingfaceserver CVE fixes
 
-This is a fork of kserve that serves to document how we built the image:
+This is a fork of kserve that serves to document how we built the images:
 
 ```
-*******782.dkr.ecr.us-east-1.amazonaws.com/library/kserve-huggingfaceserver:v0.16.0
+*******782.dkr.ecr.us-east-1.amazonaws.com/library/kserve-huggingfaceserver:v0.16.0*
 ```
 
 The official image released by kserve had several high and critical CVEs. To build our version, use the `python/huggingface_server.Dockerfile` dockerfile.
@@ -37,22 +37,18 @@ curl -v http://0.0.0.0:8080/openai/v1/chat/completions -H "Content-Type: applica
 
 The `reasoning_effort` is not available for all models.
 
-## SHA256 fix
+# Updating vLLM version
 
-The image:
+To update the vLLM version, edit the following files:
 
 ```
-**********782.dkr.ecr.us-east-1.amazonaws.com/library/kserve-huggingfaceserver:v0.16.0.sha256.1
+python/huggingface_server.Dockerfile # (VLLM_VERSION arg)
+python/huggingfaceserver/pyproject.toml
+python/kserve/pyproject.toml
 ```
 
-is a temporary workaround to allow vLLM to work in FIPS constrained environments, where `hashlib.md5` is disabled. This image was made by first building the one above, and then exec-ing into it and running the following commands:
+Make sure you test your builds before deploying them after updating vLLM's version. The vLLM project is known to sometimes shuffle stuff internally and that can break kserve's vllm usage patterns.
 
-```bash
-$ cd /kserve-workspace/prod_venv/lib64/python3.12/site-packages/vllm/
-$ find . -type f -exec sed -i 's/hashlib\.md5/hashlib.sha256/g' {} +
-```
-
-This replaces all `hashlib.md5` calls with `hashlib.sha256`. Once that change is made inside the container, that running image is committed so the changes persist.
 
 # KServe
 [![go.dev reference](https://img.shields.io/badge/go.dev-reference-007d9c?logo=go&logoColor=white)](https://pkg.go.dev/github.com/kserve/kserve)
````

python/huggingface_server.Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -52,7 +52,7 @@ WORKDIR ${WORKSPACE_DIR}
 FROM base AS build
 
 ARG WORKSPACE_DIR
-ARG VLLM_VERSION=0.12.0
+ARG VLLM_VERSION=0.13.0
 ARG LMCACHE_VERSION=0.3.0
 ARG BITSANDBYTES_VERSION=0.46.1
 ARG FLASHINFER_VERSION=0.2.6.post1
```
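Because `VLLM_VERSION` is a Dockerfile `ARG`, the pin can also be overridden at build time for one-off test builds without editing the file (a sketch; the local tag and the repo-root build context are assumptions, not values from this commit):

```shell
# Test build with an overridden vLLM version; the -t tag and the
# trailing build-context path (".") are illustrative assumptions.
docker build \
  -f python/huggingface_server.Dockerfile \
  --build-arg VLLM_VERSION=0.13.0 \
  -t kserve-huggingfaceserver:local \
  .
```

Note that per the README change above, a permanent version bump still needs the `pyproject.toml` pins updated as well.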

python/huggingfaceserver/huggingfaceserver/vllm/vllm_model.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -174,7 +174,7 @@ async def start_engine(self):
 chat_template=resolved_chat_template,
 chat_template_content_format=self.args.chat_template_content_format,
 )
-if self.model_config.task == "embed"
+if self.model_config.runner_type == "embed"
 else None
 )
 
@@ -184,7 +184,7 @@ async def start_engine(self):
 self.openai_serving_models,
 request_logger=self.request_logger,
 )
-if self.model_config.task == "classify"
+if self.model_config.runner_type == "classify"
 else None
 )
 
```

python/huggingfaceserver/pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,7 +11,7 @@ dependencies = [
 "accelerate<2.0.0,>=1.6.0",
 "torch>=2.7.0",
 "triton>=3.2.0",
-"vllm==0.12.0",
+"vllm==0.13.0",
 "bitsandbytes>=0.45.3",
 "modelscope<2.0.0,>=1.16.0",
 "setuptools>=70.0.0",
```

python/kserve/pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -64,7 +64,7 @@ ray = [
 "ray[serve]>=2.43.0",
 ]
 llm = [
-"vllm==0.12.0",
+"vllm==0.13.0",
 ]
 
 [dependency-groups]
```
