Commit 7bc99f0

Updates vLLM CPU image (#220) (#221)

This change switches the vLLM CPU image from the AWS ECR repository to a Quay.io repository. It also removes the GPU tolerations to force the Qwen pod to be scheduled on Intel CPU nodes.

1 parent 6be3da0 commit 7bc99f0
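As a sketch of why removing the tolerations matters: a toleration only *permits* a pod to land on a node carrying a matching taint, it does not require it. Assuming the GPU nodes in this cluster are tainted `nvidia.com/gpu` with effect `NoSchedule` (the usual convention, not confirmed in this commit), the removed block looked roughly like this:

```yaml
# Hypothetical illustration of the removed toleration.
# With this present, the scheduler MAY place the pod on a GPU-tainted node;
# with it removed, only untainted (CPU) nodes remain eligible.
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```

Dropping the toleration therefore guarantees the Qwen pod cannot be scheduled onto a tainted GPU node, leaving the Intel CPU nodes as the only candidates.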

File tree

2 files changed: +1 −5 lines changed

bootstrap/ic-shared-llm/base/inference-service-qwen-modelcar.yaml

Lines changed: 0 additions & 4 deletions

```diff
@@ -37,7 +37,3 @@ spec:
           memory: 5Gi
       runtime: vllm-cpu
       storageUri: oci://quay.io/rh-aiservices-bu/qwen2.5-0.5b-quantized.w8a8-modelcar:0.0.1
-      tolerations:
-        - effect: NoSchedule
-          key: nvidia.com/gpu
-          operator: Exists
```

bootstrap/ic-shared-llm/base/serving-runtime-vllm-cpu-qwen-modelcar.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -19,7 +19,7 @@ spec:
         - python
         - '-m'
         - vllm.entrypoints.openai.api_server
-      image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.9.1
+      image: quay.io/rh-aiservices-bu/rhoai-lab-insurance-claim-vllm-cpu:v0.9.1
       env:
         - name: VLLM_CPU_KVCACHE_SPACE
           value: "2"
```

0 commit comments