
Fix: Add enable-sleep-mode flag to enable sleep mode for vllm server #376

Merged

aavarghese merged 1 commit into llm-d-incubation:main from aavarghese:enablesleepmode on Mar 24, 2026

Conversation

@aavarghese
Collaborator

No description provided.

@aavarghese aavarghese requested a review from waltforme March 24, 2026 00:16
@osswangxining
Member

@aavarghese For the sleep mode provided by the vLLM engine, it seems that the VLLM_SERVER_DEV_MODE=1 environment variable is required.

@osswangxining osswangxining self-requested a review March 24, 2026 09:35
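(For context: when VLLM_SERVER_DEV_MODE=1 is set, vLLM's API server exposes development endpoints such as POST /sleep, POST /wake_up, and GET /is_sleeping. Below is a minimal sketch of poking them; the base URL and port are assumptions for illustration, not taken from this PR.)

```python
import json
import urllib.request

BASE = "http://localhost:8005"  # assumed vLLM server address, not from the PR

def sleep_url(base: str, level: int = 1) -> str:
    # POST /sleep?level=1 offloads weights to CPU; level=2 discards them.
    return f"{base}/sleep?level={level}"

def is_sleeping(base: str) -> bool:
    # GET /is_sleeping returns a JSON body like {"is_sleeping": false}.
    with urllib.request.urlopen(f"{base}/is_sleeping") as resp:
        return json.load(resp)["is_sleeping"]
```

Without VLLM_SERVER_DEV_MODE=1, these endpoints are not registered and the server returns 404 for them.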
@waltforme
Collaborator

I see from the test log that vLLM complains Sleep mode is not supported on current platform if we specify the --enable-sleep-mode flag. More interestingly, it looks like for the test-launcher, the is_sleeping API endpoint of vLLM behaves as expected if we specify neither the flag nor the equivalent environment variable.

@aavarghese
Collaborator Author

> @aavarghese For the sleep mode provided by the vLLM engine, it seems that the VLLM_SERVER_DEV_MODE=1 environment variable is required.

We have that specified in the ISC env vars today: https://github.com/llm-d-incubation/llm-d-fast-model-actuation/pull/376/changes#diff-732f788854a845a2920edb3005249135a9852282bf15fe838180ac3cb03b0bf0L516

@aavarghese
Collaborator Author

aavarghese commented Mar 24, 2026

> I see from the test log that vLLM complains Sleep mode is not supported on current platform if we specify the --enable-sleep-mode flag. More interestingly, it looks like for the test-launcher, the is_sleeping API endpoint of vLLM behaves as expected if we specify neither the flag nor the equivalent environment variable.

Very interesting. So we should not set it for our tests on Kind, but should only have it for our e2e test on OpenShift.

vLLM CPU-mode log from the Kind test, FYI:

Handling connection for 18001
[2026-03-24 00:22:34] INFO launcher.py:586: VLLM process (PID: 26) started.
INFO 03-24 00:22:34 [utils.py:325] 
INFO 03-24 00:22:34 [utils.py:325]        █     █     █▄   ▄█
INFO 03-24 00:22:34 [utils.py:325]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.15.1
INFO 03-24 00:22:34 [utils.py:325]   █▄█▀ █     █     █     █  model   HuggingFaceTB/SmolLM2-360M-Instruct
INFO 03-24 00:22:34 [utils.py:325]    ▀▀  █▀▀▀▀ ▀▀▀▀▀ ▀     ▀
INFO 03-24 00:22:34 [utils.py:325] 
INFO 03-24 00:22:34 [utils.py:261] non-default args: {'port': 8005, 'model': 'HuggingFaceTB/SmolLM2-360M-Instruct', 'enable_sleep_mode': True}
(APIServer pid=26) Process Process-1:
(APIServer pid=26) Traceback (most recent call last):
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(APIServer pid=26)     self.run()
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
(APIServer pid=26)     self._target(*self._args, **self._kwargs)
(APIServer pid=26)   File "/app/launcher.py", line 602, in vllm_kickoff
(APIServer pid=26)     uvloop.run(run_server(args))
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=26)     return __asyncio.run(
(APIServer pid=26)            ^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=26)     return runner.run(main)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=26)     return self._loop.run_until_complete(task)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=26)     return await main
(APIServer pid=26)            ^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=26)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=26)     async with build_async_engine_client(
(APIServer pid=26)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=26)     return await anext(self.gen)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=26)     async with build_async_engine_client_from_engine_args(
(APIServer pid=26)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/uv/python/cpython-3.12.12-linux-aarch64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=26)     return await anext(self.gen)
(APIServer pid=26)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 173, in build_async_engine_client_from_engine_args
(APIServer pid=26)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=26)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1374, in create_engine_config
(APIServer pid=26)     model_config = self.create_model_config()
(APIServer pid=26)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1228, in create_model_config
(APIServer pid=26)     return ModelConfig(
(APIServer pid=26)            ^^^^^^^^^^^^
(APIServer pid=26)   File "/opt/venv/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=26)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=26) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=26)   Value error, Sleep mode is not supported on current platform. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
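To make the takeaway concrete: since --enable-sleep-mode aborts vLLM startup on platforms without sleep-mode support (as in the CPU-only Kind run above), the flag has to be appended conditionally. A hypothetical helper sketching that idea; none of these names come from the PR's actual code:

```python
def build_vllm_options(model: str, platform_supports_sleep: bool) -> str:
    # --enable-sleep-mode makes ModelConfig validation fail on
    # unsupported platforms (e.g. CPU-only Kind clusters), so we
    # only append it where it is known to work (e.g. GPU on OpenShift).
    opts = f"--model {model}"
    if platform_supports_sleep:
        opts += " --enable-sleep-mode"
    return opts
```

On Kind this would yield just the --model flag, while the OpenShift e2e path would also get --enable-sleep-mode.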

Signed-off-by: aavarghese <avarghese@us.ibm.com>

```diff
 modelServerConfig:
   port: 8005
-  options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+  options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --enable-sleep-mode"
```
Collaborator


As I said on Slack, @dumb0002 is working on this PR so he will add all the different modalities in there.

Collaborator

@waltforme waltforme left a comment


Makes sense, and I think this PR and #332 are complementary, so LGTM.

@aavarghese aavarghese merged commit 29babde into llm-d-incubation:main Mar 24, 2026
24 checks passed

4 participants