
[XPU] Add XPU Dockerfile and related docs #1162

Open

yma11 wants to merge 3 commits into vllm-project:main from yma11:docker_xpu

Conversation

Contributor

@yma11 commented Feb 3, 2026

Purpose

This PR adds an XPU Dockerfile and updates the related docs.
Since no vLLM XPU image is published yet, we build the omni image directly on top of the vLLM Dockerfile.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@yma11 requested a review from hsliuustc0106 as a code owner February 3, 2026 02:24

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e035f4ede

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 26 to 28
FROM final AS vllm-openai
RUN cd ${COMMON_WORKDIR}/vllm-omni
ENTRYPOINT ["vllm", "serve", "--omni"]


P1: Clear inherited CMD for vllm-openai image

The vllm-openai stage sets an ENTRYPOINT but does not override the CMD inherited from the final stage (CMD ["/bin/bash"] above). When users run the image without extra args, Docker passes the inherited CMD to the entrypoint, effectively executing vllm serve --omni /bin/bash, which vllm serve will treat as a positional argument and fail. To avoid this, explicitly set CMD [] (or a valid default argument list) in this stage.
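For reference, a minimal sketch of that fix under the stage names in the snippet above (illustrative only, not the exact change applied in this PR):

FROM final AS vllm-openai
ENTRYPOINT ["vllm", "serve", "--omni"]
# Explicitly clear the CMD ["/bin/bash"] inherited from the "final" stage so it is not
# appended to the entrypoint when the container runs without extra arguments.
CMD []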

Useful? React with 👍 / 👎.

@yma11 marked this pull request as draft February 4, 2026 02:29
@@ -0,0 +1,28 @@
ARG BASE_IMAGE=gar-registry.caas.intel.com/pytorch/pytorch-ipex-spr:multi-bmg_release_2601_py2.10_ww2605.5


Should we use the multi-arc release, or is this a temporary setting?


Could we base this on an upstream vLLM XPU image? From my side, doing this will create technical debt for us, and debt always has to be paid back with interest.

Contributor Author


I am planning to build images based on upstream commits. For example, if the upcoming vLLM-omni release is based on vLLM v0.15.0, I will build the image using https://github.com/vllm-project/vllm/blob/releases/v0.15.0/docker/Dockerfile.xpu.
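For illustration, once an upstream vLLM XPU image built that way is available, the layering could look roughly like this (the tag vllm-xpu:v0.15.0 and the stage name are hypothetical, not defined in this PR):

# Hypothetical base image built from upstream docker/Dockerfile.xpu at the matching release tag.
ARG BASE_IMAGE=vllm-xpu:v0.15.0
FROM ${BASE_IMAGE} AS vllm-omni-xpu
# vllm-omni installation steps would follow here.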

Contributor Author


Updated. @yao-matrix @xuechendi please take another look.

yma11 added 2 commits February 6, 2026 05:51
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
@yma11 marked this pull request as ready for review February 6, 2026 06:13

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74cc42df39



ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

ENTRYPOINT ["vllm", "serve", "--omni"]


P2: Source oneAPI env for vllm-openai entrypoint

The image installs oneAPI/oneCCL and only wires its environment into /root/.bashrc (via setvars.sh), but the vllm-openai stage runs vllm serve --omni directly. Non-interactive entrypoints do not source .bashrc, so the oneAPI environment (e.g., library paths for libccl.so) is missing when users run the container with the default entrypoint, which can lead to runtime linker errors or CCL not being found. Consider exporting the needed env vars with ENV or wrapping the entrypoint in bash -lc 'source /root/.bashrc && …' so the runtime environment matches what you prepared.
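For reference, a minimal sketch of the wrapper variant Codex describes (entrypoint and path taken from the comment above; illustrative only, not the exact change applied in this PR):

ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
# Run the server through a shell that sources /root/.bashrc (which in turn sources oneAPI's
# setvars.sh), so the oneCCL/oneAPI library paths are set for the vllm process.
ENTRYPOINT ["bash", "-lc", "source /root/.bashrc && exec vllm serve --omni \"$@\"", "--"]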

Useful? React with 👍 / 👎.


@xuechendi left a comment


Verified with Intel GPUs (1500 and B60); both are working.

@xuechendi

@ywang96, could you help review and merge?


--8<-- "docs/getting_started/installation/gpu/rocm.inc.md:requirements"

=== "Intel XPU"
Collaborator


Is XPU categorized as a type of GPU?


Yes, XPU stands for Intel GPU.

@@ -0,0 +1,110 @@
FROM intel/deep-learning-essentials:2025.2.2-0-devel-ubuntu24.04 AS vllm-base
Collaborator


How long does it take to build an image in the CI environment?


The first build takes about 10 minutes due to the vLLM docker build. We will provide our Intel GPU node to Buildkite, so we can keep the vLLM base image cached and speed up CI docker build time.

Collaborator


For the CI test, please check #400 and the follow-up design RFC for details. We expect the e2e CI time on every other platform to not exceed the CUDA e2e time.


Got it, thanks for the info!
Is it OK to get this PR in first, and then accelerate docker build time in the next CI PR?

Contributor


@xuechendi are there plans to ship an XPU docker image from the vllm main repo?

ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=bind,source=.git,target=.git \


This bind mount is not needed since we are git cloning from vllm, and .git is for vllm-omni.

Contributor Author


Makes sense. Removed this line.
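For illustration, with the bind mount dropped the instruction would look roughly like this (the install command below is a placeholder; the real command is not shown in the diff above):

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -v .  # placeholder install step, not the actual line from this Dockerfile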

Signed-off-by: Yan Ma <yan.ma@intel.com>