update: add env variable for main container(vllm) when DRA is enabled for Intel-xe by zdtsw · Pull Request #189 · llm-d-incubation/llm-d-modelservice

zdtsw · 2026-01-27T14:46:17Z

Description

Currently, when DRA is enabled, the chart skips adding the env variable VLLM_WORKER_MULTIPROC_METHOD for intel-xe.
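For context, VLLM_WORKER_MULTIPROC_METHOD selects the multiprocessing start method that vLLM uses for its worker processes. The sketch below is not vLLM's code; it only illustrates, using the Python standard library, what choosing `spawn` through an environment variable amounts to. The helper name `resolve_start_method` and its fallback default are assumptions made for illustration.

```python
import multiprocessing as mp
import os


def resolve_start_method(environ=None, default="spawn"):
    """Hypothetical helper: read the start method from the environment.

    Not vLLM's implementation; only the variable name comes from this PR.
    """
    environ = os.environ if environ is None else environ
    method = environ.get("VLLM_WORKER_MULTIPROC_METHOD", default)
    # Reject values the current platform does not support.
    if method not in mp.get_all_start_methods():
        raise ValueError(f"unsupported start method: {method!r}")
    return method


def _square(x):
    # Must be a top-level function so a spawned child can import it.
    return x * x


if __name__ == "__main__":
    # Build a context with the chosen start method and run a tiny pool.
    ctx = mp.get_context(resolve_start_method())
    with ctx.Pool(processes=2) as pool:
        print(pool.map(_square, [1, 2, 3]))
```

With `spawn`, each worker starts from a fresh interpreter instead of inheriting the parent's state through `fork`, which is why the worker function has to be importable at module level.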

kalantar · 2026-01-27T18:38:50Z

@yuanwu2017 do you have any comments on the suggestion to add this environment variable as a default? I believe you contributed these originally. I am not familiar with which default env variables make sense.

kalantar · 2026-01-27T18:45:50Z

@zdtsw I don't think this is the root cause of llm-d/llm-d#620. Is modelservice being used? Remove any gpu resource request/limits from the values file and try without them (and set accelerator.type appropriately).

yuanwu2017 · 2026-01-28T05:42:05Z

> @zdtsw I don't think this is the root cause of llm-d/llm-d#620. Is modelservice being used? Remove any gpu resource request/limits from the values file and try without them (and set accelerator.type appropriately).

PR518 and PR380 can fix it. Issue 620 was caused by the llm-d-modelservice upgrade in llm-d: the accelerator type changed from "intel" to "intel-i915". DRA is being enabled in llm-d-modelservice, so enabling DRA can also fix issue 620.
@zdtsw did discover a potential problem: using DRA caused the previous default environment variables to stop working, which could prevent vLLM from starting. Therefore, I added these environment variables back in PR380.

> @yuanwu2017 do you have any comments on the suggestion to add this environment variable as a default? I believe you contributed these originally. I am not familiar with which default env variables make sense.

I think it is OK to add default envs for a specific device. But I do not yet understand how this patch fixes it. If the DRA device is enabled in values.yaml, these env values should not take effect. @zdtsw @poussa

zdtsw · 2026-01-28T07:38:40Z

@kalantar Sorry for the poor description in this PR.
Originally, I saw the open issue llm-d/llm-d#620 and started looking into the code in llm-d. I later found that the bug had already been fixed in llm-d PR518, but the env variable is still missing because of the earlier change that split the old value Intel into Intel-xe and Intel-i915.
This PR is mainly to add the env variable that vLLM prefers for Intel-xe.
Thanks @yuanwu2017 for checking this.

poussa · 2026-01-28T07:53:43Z

The env object is only for device plugins (accelerator object). In DRA, we don't have the env concept at all. Wait, that is true for the original DRA implementation. Since @kalantar's DRA refactoring was merged yesterday, I am not sure anymore, but I think that is still the case.

Anyway, 1) this PR is against the old DRA implementation and 2) is not correct since it touches the device plugins, not DRA.

zdtsw · 2026-01-28T08:58:17Z

As for whether the env VLLM_WORKER_MULTIPROC_METHOD is needed or not:
https://vllm-dev.slack.com/archives/C07QP347J4D/p1769590188432399

kalantar · 2026-01-28T14:18:51Z

> The env object is only for device plugins (accelerator object). In DRA, we don't have the env concept at all. Wait, that is true for the original DRA implementation. Since @kalantar's DRA refactoring was merged yesterday, I am not sure anymore, but I think that is still the case.
>
> Anyway, 1) this PR is against the old DRA implementation and 2) is not correct since it touches the device plugins, not DRA.

It should apply to DRA now too, as long as we are using the same keys.
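One way to picture the point about keys: if default env variables are looked up by accelerator type, a type with no entry gets no defaults, whether devices come from a device plugin or from DRA. The sketch below is a hypothetical Python model of that lookup, not the chart's Helm template. The table name `DEFAULT_ENV`, the function `default_env_for`, and the rule that user values override defaults are assumptions; only the `intel-xe` entry reflects what this PR adds.

```python
# Hypothetical table of per-accelerator default env variables.
# Only the intel-xe entry reflects what this PR adds.
DEFAULT_ENV = {
    "intel-xe": {"VLLM_WORKER_MULTIPROC_METHOD": "spawn"},
}


def default_env_for(accelerator_type, user_env=None):
    """Merge defaults for the accelerator type with user-supplied env.

    User-supplied values win over defaults (an assumption of this sketch).
    """
    merged = dict(DEFAULT_ENV.get(accelerator_type, {}))
    merged.update(user_env or {})
    return merged


if __name__ == "__main__":
    print(default_env_for("intel-xe"))
    # The old key has no entry in the table, so no defaults are applied.
    print(default_env_for("intel"))
```

This mirrors the situation described earlier in the thread: once the old "intel" type was split, a type without its own entry silently lost its default env variables.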

kalantar · 2026-01-28T14:21:23Z

charts/llm-d-modelservice/values.yaml

+  # @schema
+  # additionalProperties: true
+  # @schema


Why is this annotation needed?

zdtsw · 2026-01-28

Good catch, it is left over from my original change before the DRA refactor. Let me remove it.

kalantar · 2026-01-28T14:22:55Z

@zdtsw please also bump chart version in Chart.yaml and run make pre-commit-run to update the schema and make generate to update all the samples.

zdtsw · 2026-01-28T15:21:35Z

> @zdtsw please also bump chart version in Chart.yaml and run make pre-commit-run to update the schema and make generate to update all the samples.

updated

kalantar · 2026-01-28T20:29:13Z

This looks good, let's resolve the conflict and we can merge.

zdtsw · 2026-01-29T09:36:50Z

> This looks good, let's resolve the conflict and we can merge.

thanks, it is rebased.

zdtsw force-pushed the chore_dra_intel branch from af1ab6f to 5b87b65 Compare January 27, 2026 15:55

zdtsw changed the title ~~update: add env variable for main container(vllm) when DRA is enabled for Intel~~ update: add env variable for main container(vllm) when DRA is enabled for Intel-xe Jan 27, 2026

kalantar reviewed Jan 28, 2026

kalantar mentioned this pull request Jan 28, 2026

[DRA] Unify Intel resource claim templates #191

Merged

zdtsw force-pushed the chore_dra_intel branch from 5b87b65 to 096a804 Compare January 28, 2026 15:21

zdtsw force-pushed the chore_dra_intel branch from 620fb82 to efa9f28 Compare January 29, 2026 09:32

fix: add intel-xe accelerator environment variable support

ad7c928

- add VLLM_WORKER_MULTIPROC_METHOD: spawn for Intel-xe
- bump version

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

zdtsw force-pushed the chore_dra_intel branch from efa9f28 to eda8b88 Compare January 29, 2026 09:34

update: fix schema generation

eda8b88

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

kalantar approved these changes Jan 29, 2026

kalantar merged commit c97db00 into llm-d-incubation:main Jan 29, 2026
4 checks passed


4 participants
