
Fix: Add enable-sleep-mode flag to enable sleep mode for vllm server #376

Merged

aavarghese merged 1 commit into llm-d-incubation:main from aavarghese:enablesleepmode on Mar 24, 2026

Conversation

@aavarghese
Collaborator

No description provided.

@aavarghese aavarghese requested a review from waltforme March 24, 2026 00:16
@osswangxining
Member

@aavarghese For the sleep mode provided by the vLLM engine, it seems that the VLLM_SERVER_DEV_MODE=1 environment variable is required.

@osswangxining osswangxining self-requested a review March 24, 2026 09:35
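(For context: when VLLM_SERVER_DEV_MODE=1 is set, vLLM's API server exposes development endpoints such as POST /sleep, POST /wake_up, and GET /is_sleeping. Below is a minimal sketch of poking them; the base URL and port are assumptions for illustration, not taken from this PR.)

```python
import json
import urllib.request

BASE = "http://localhost:8005"  # assumed vLLM server address, not from the PR

def sleep_url(base: str, level: int = 1) -> str:
    # POST /sleep?level=1 offloads weights to CPU; level=2 discards them.
    return f"{base}/sleep?level={level}"

def is_sleeping(base: str) -> bool:
    # GET /is_sleeping returns a JSON body like {"is_sleeping": false}.
    with urllib.request.urlopen(f"{base}/is_sleeping") as resp:
        return json.load(resp)["is_sleeping"]
```

Without VLLM_SERVER_DEV_MODE=1, these endpoints are not registered and the server returns 404 for them.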
@waltforme
Collaborator

I see from the test log that vLLM complains Sleep mode is not supported on current platform if we specify the --enable-sleep-mode flag. More interestingly, it looks like for the test-launcher, the is_sleeping API endpoint of vLLM behaves as expected if we specify neither the flag nor the equivalent environment variable.

@aavarghese
Collaborator Author

> @aavarghese For the sleep mode provided by the vLLM engine, it seems that the VLLM_SERVER_DEV_MODE=1 environment variable is required.

We have that specified in the ISC env vars today: https://github.com/llm-d-incubation/llm-d-fast-model-actuation/pull/376/changes#diff-732f788854a845a2920edb3005249135a9852282bf15fe838180ac3cb03b0bf0L516

@aavarghese
Collaborator Author

aavarghese commented Mar 24, 2026

> I see from the test log that vLLM complains Sleep mode is not supported on current platform if we specify the --enable-sleep-mode flag. More interestingly, it looks like for the test-launcher, the is_sleeping API endpoint of vLLM behaves as expected if we specify neither the flag nor the equivalent environment variable.

Very interesting. So we should not set it for our tests on Kind, but should only have it for our e2e test on OpenShift.

vLLM CPU-mode log from the Kind test, FYI:

Handling connection for 18001
[2026-03-24 00:22:34] INFO launcher.py:586: VLLM process (PID: 26) started.
INFO 03-24 00:22:34 [utils.py:325] 
INFO 03-24 00:22:34 [utils.py:325]        █     █     █▄   ▄█
INFO 03-24 00:22:34 [utils.py:325]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.15.1
INFO 03-24 00:22:34 [utils.py:325]   █▄█▀ █     █     █     █  model   HuggingFaceTB/SmolLM2-360M-Instruct
INFO 03-24 00:22:34 [utils.py:325]    ▀▀  █▀▀▀▀ ▀▀▀▀▀ ▀     ▀
INFO 03-24 00:22:34 [utils.py:325] 
INFO 03-24 00:22:34 [utils.py:261] non-default args: {'port': 8005, 'model': 'HuggingFaceTB/SmolLM2-360M-Instruct', 'enable_sleep_mode': True}
(APIServer pid=26) Process Process-1:
(APIServer pid=26) Traceback (most recent call last):
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(APIServer pid=26)     self.run()
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
(APIServer pid=26)     self._target(*self._args, **self._kwargs)
(APIServer pid=26)   File "/app/launcher.py", line 602, in vllm_kickoff
(APIServer pid=26)     uvloop.run(run_server(args))
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=26)     return __asyncio.run(
(APIServer pid=26)            ^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=26)     return runner.run(main)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=26)     return self._loop.run_until_complete(task)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=26)     return await main
(APIServer pid=26)            ^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=26)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=26)     async with build_async_engine_client(
(APIServer pid=26)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=26)     return await anext(self.gen)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=26)     async with build_async_engine_client_from_engine_args(
(APIServer pid=26)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=26)     return await anext(self.gen)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 173, in build_async_engine_client_from_engine_args
(APIServer pid=26)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=26)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1374, in create_engine_config
(APIServer pid=26)     model_config = self.create_model_config()
(APIServer pid=26)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1228, in create_model_config
(APIServer pid=26)     return ModelConfig(
(APIServer pid=26)            ^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=26)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=26) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=26)   Value error, Sleep mode is not supported on current platform. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
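To make the takeaway concrete: since --enable-sleep-mode aborts vLLM startup on platforms without sleep-mode support (as in the CPU-only Kind run above), the flag has to be appended conditionally. A hypothetical helper sketching that idea; none of these names come from the PR's actual code:

```python
def build_vllm_options(model: str, platform_supports_sleep: bool) -> str:
    # --enable-sleep-mode makes ModelConfig validation fail on
    # unsupported platforms (e.g. CPU-only Kind clusters), so we
    # only append it where it is known to work (e.g. GPU on OpenShift).
    opts = f"--model {model}"
    if platform_supports_sleep:
        opts += " --enable-sleep-mode"
    return opts
```

On Kind this would yield just the --model flag, while the OpenShift e2e path would also get --enable-sleep-mode.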

Signed-off-by: aavarghese <avarghese@us.ibm.com>

```diff
 modelServerConfig:
   port: 8005
-  options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+  options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --enable-sleep-mode"
```
Collaborator


As I said on Slack, @dumb0002 is working on this PR so he will add all the different modalities in there.

Collaborator

@waltforme waltforme left a comment


Makes sense, and I think this PR and #332 are complementary, so LGTM.

@aavarghese aavarghese merged commit 29babde into llm-d-incubation:main Mar 24, 2026
24 checks passed

4 participants