Commit 8ab0461

Boost vllm inference performance in Intel Arc B60 (#2324)
Signed-off-by: Yongbozzz <[email protected]>
Parent: e089294

File tree

1 file changed (+1, -2 lines)

EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm_b60.yaml

Lines changed: 1 addition & 2 deletions
@@ -157,7 +157,7 @@ services:
       DP: ${DP:-1}
     entrypoint:
       /bin/bash -c "
-      cd /workspace/vllm/models &&
+      cd /workspace/vllm/models && source /opt/intel/oneapi/setvars.sh --force &&
       VLLM_OFFLOAD_WEIGHTS_BEFORE_QUANT=1 \
       TORCH_LLM_ALLREDUCE=1 \
       VLLM_USE_V1=1 \
@@ -178,7 +178,6 @@ services:
       --max-model-len $${MAX_MODEL_LEN} \
       --block-size $${BLOCK_SIZE} \
       --quantization $${QUANTIZATION} \
-      --distributed-executor-backend mp \
       -tp=$${TP} \
       -dp=$${DP}"
     networks:
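A note on the `$${VAR}` syntax in this entrypoint: Docker Compose treats `$$` as an escaped literal `$`, so Compose does not interpolate these variables itself; the container's bash expands them at runtime. A minimal sketch of the expansion the container shell performs (the `TP` value is an example, not from the commit):

```shell
# Compose passes "-tp=$${TP}" into the container as the literal string "-tp=${TP}";
# bash inside the container then expands it from the environment.
unset DP                              # simulate DP not being set
TP=2                                  # example value, not from the commit
launch_args="-tp=${TP} -dp=${DP:-1}"  # DP falls back to 1, matching "DP: ${DP:-1}" in the compose file
echo "$launch_args"                   # → -tp=2 -dp=1
```

This is why the compose file can set `DP: ${DP:-1}` at the Compose level while still deferring `-tp`/`-dp` expansion to the container.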
