Bug Description
The /v1/omni/sleep and /v1/omni/wakeup endpoints fail with 'State' object has no attribute 'sleeping_stages' when running in pure diffusion mode (e.g., serving FLUX.2-klein-9B or Z-Image-Turbo).
Root Cause
In omni_init_app_state() (api_server.py), the pure diffusion code path returns early at line ~637 and never reaches line 952, where state.sleeping_stages = set() is initialized. The sleep/wakeup handler at line 2934 then accesses the attribute, which was never set on the State object, and raises AttributeError.
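The failure pattern can be sketched in isolation. This is a simplified stand-in, not the real api_server.py code: the State class, init_app_state(), sleep_handler(), and the diffusion_engine attribute are hypothetical names used only to show how an early return skips the attribute initialization.

```python
class State:
    """Hypothetical stand-in for the application state object."""


def init_app_state(state, pure_diffusion):
    # Assumed shape of the init function: the diffusion branch returns early.
    if pure_diffusion:
        state.diffusion_engine = "engine"  # placeholder value
        return  # early return: the line below never runs in this mode
    state.sleeping_stages = set()


def sleep_handler(state, stage_ids):
    # Assumes sleeping_stages was initialized during startup.
    state.sleeping_stages.update(stage_ids)
    return sorted(state.sleeping_stages)


state = State()
init_app_state(state, pure_diffusion=True)

try:
    sleep_handler(state, [0])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

In this sketch the handler raises AttributeError because the diffusion branch never sets sleeping_stages, which is the same shape as the reported 500 error.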
Steps to Reproduce
vllm-omni serve Tongyi-MAI/Z-Image-Turbo --omni --port 8092 --host 0.0.0.0 --enforce-eager
# After server starts:
curl -X POST http://localhost:8092/v1/omni/sleep -H 'Content-Type: application/json' -d '{"level": 1, "stage_ids": [0]}'
Error:
{"error":{"message":"'State' object has no attribute 'sleeping_stages'","type":"InternalServerError","param":null,"code":500}}
Proposed Fix
Initialize state.sleeping_stages = set() in the pure diffusion code path (before the early return), and implement proper sleep/wakeup support for diffusion models (offloading model weights from GPU to CPU).
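A minimal sketch of the first half of the fix, assuming the initialization can simply be moved ahead of the mode-specific branch. The names State, init_app_state(), sleep_handler(), and the mode attribute are hypothetical and do not reflect the actual api_server.py code; the weight-offloading part of the proposal is not covered here.

```python
class State:
    """Hypothetical stand-in for the application state object."""


def init_app_state(state, pure_diffusion):
    # Set up the sleep bookkeeping before any branch can return early,
    # so every serving mode ends up with the attribute.
    state.sleeping_stages = set()

    if pure_diffusion:
        state.mode = "diffusion"  # placeholder for diffusion-only setup
        return

    state.mode = "multi-stage"  # placeholder for the remaining setup


def sleep_handler(state, stage_ids):
    # Record which stages have been put to sleep.
    state.sleeping_stages.update(stage_ids)
    return sorted(state.sleeping_stages)


state = State()
init_app_state(state, pure_diffusion=True)
sleeping = sleep_handler(state, [0])
print(sleeping)
```

Initializing the attribute once at the top, rather than duplicating the assignment in each branch, keeps any later early returns from reintroducing the same bug.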
Environment
- vLLM-Omni 0.20.2
- WSL2 Ubuntu 24.04
- NVIDIA RTX 4090 (24GB)
- Models: FLUX.2-klein-9B, Tongyi-MAI/Z-Image-Turbo