Fix: Add enable-sleep-mode flag to enable sleep mode for vllm server #376
@aavarghese For the sleep mode provided by the vLLM engine, it seems that the VLLM_SERVER_DEV_MODE=1 environment variable is required.
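For context, here is a minimal sketch of how the sleep endpoints could be exercised once the server is started with both `--enable-sleep-mode` and `VLLM_SERVER_DEV_MODE=1`. It assumes the development endpoints `/sleep`, `/wake_up`, and `/is_sleeping` described in the vLLM sleep mode documentation. The helper names and the base URL are illustrative and not part of this PR; port 8005 is borrowed from the `modelServerConfig` in the diff.

```python
import urllib.parse
import urllib.request

# Assumption: the server listens locally on the port from modelServerConfig.
BASE_URL = "http://localhost:8005"


def sleep_url(base: str, level: int = 1) -> str:
    """URL that asks the engine to go to sleep at the given level."""
    return f"{base}/sleep?{urllib.parse.urlencode({'level': level})}"


def wake_up_url(base: str) -> str:
    """URL that asks the engine to wake up again."""
    return f"{base}/wake_up"


def is_sleeping_url(base: str) -> str:
    """URL that reports whether the engine is currently sleeping."""
    return f"{base}/is_sleeping"


def call(url: str, method: str = "POST", timeout: float = 30.0) -> str:
    """Send a body-less HTTP request and return the response text."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sleep_wake_cycle(base: str = BASE_URL) -> str:
    """Put the server to sleep, read its state, then wake it up.

    Needs a running server; the endpoints are expected to be absent
    unless VLLM_SERVER_DEV_MODE=1 is set in the server environment.
    """
    call(sleep_url(base, level=1))
    state = call(is_sleeping_url(base), method="GET")
    call(wake_up_url(base))
    return state
```

Calling `sleep_wake_cycle()` against a server started without the environment variable would be one way to check whether the requirement mentioned above actually applies to the vLLM version under test.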
I see from the test log that vLLM complains
We have that specified in the ISC env vars today: https://github.com/llm-d-incubation/llm-d-fast-model-actuation/pull/376/changes#diff-732f788854a845a2920edb3005249135a9852282bf15fe838180ac3cb03b0bf0L516
Very interesting. So we should not set it for our tests on Kind, and should only have it for our e2e test on OpenShift. vLLM in CPU mode log from the Kind test, FYI:
Signed-off-by: aavarghese <avarghese@us.ibm.com>
cc15e85 to 0638b99
 modelServerConfig:
   port: 8005
-  options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+  options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --enable-sleep-mode"
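The discussion above suggests that the flag may only be appropriate on some targets (the OpenShift e2e test rather than the CPU-mode Kind tests). A rough sketch of building the `options` string conditionally might look like the following. The function name and the `sleep_capable` parameter are hypothetical, and which environments should actually receive the flag is an assumption drawn from the conversation, not something this diff confirms.

```python
SLEEP_FLAG = "--enable-sleep-mode"


def model_server_options(model: str, sleep_capable: bool) -> str:
    """Build the vLLM options string for modelServerConfig.

    sleep_capable is a hypothetical switch: True where sleep mode is
    expected to work (assumed: the OpenShift e2e test), False elsewhere
    (assumed: CPU-mode tests on Kind).
    """
    parts = ["--model", model]
    if sleep_capable:
        parts.append(SLEEP_FLAG)
    return " ".join(parts)


MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
kind_options = model_server_options(MODEL, sleep_capable=False)
openshift_options = model_server_options(MODEL, sleep_capable=True)
```

With the model from the diff, `kind_options` matches the removed `options` line and `openshift_options` matches the added one.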