[Bug] Sleep/wakeup endpoints crash for pure diffusion models #3823

@yossiovadia

Bug Description

The /v1/omni/sleep and /v1/omni/wakeup endpoints return HTTP 500 with the message "'State' object has no attribute 'sleeping_stages'" when the server runs in pure diffusion mode (e.g., serving FLUX.2-klein-9B or Z-Image-Turbo).

Root Cause

In omni_init_app_state() (api_server.py), the pure diffusion code path returns early around line 637 and never reaches line 952, where state.sleeping_stages = set() is initialized. The sleep/wakeup handler at line 2934 then reads this attribute, which was never set on the state object.
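
To illustrate the failure mode, here is a minimal standalone sketch of the pattern described above. It is not the actual api_server.py code: the State class, init_app_state(), sleep_handler(), and the diffusion_ready attribute are hypothetical stand-ins, and only the sleeping_stages name comes from this report.

```python
class State:
    """Hypothetical stand-in for the app state; attributes are set dynamically."""


def init_app_state(state: State, pure_diffusion: bool) -> None:
    # Simplified, assumed shape of the initialization function.
    if pure_diffusion:
        state.diffusion_ready = True  # hypothetical attribute
        return  # early return: the assignment below never runs
    state.sleeping_stages = set()


def sleep_handler(state: State, stage_ids: list[int]) -> set[int]:
    # Hypothetical handler that assumes sleeping_stages always exists.
    state.sleeping_stages.update(stage_ids)
    return state.sleeping_stages


error_message = None
state = State()
init_app_state(state, pure_diffusion=True)
try:
    sleep_handler(state, [0])
except AttributeError as exc:
    # Same message text as the one reported by the endpoint.
    error_message = str(exc)
    print(error_message)
```

With pure_diffusion=False the same handler call succeeds, which matches the observation that only the pure diffusion path is affected.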

Steps to Reproduce

vllm-omni serve Tongyi-MAI/Z-Image-Turbo --omni --port 8092 --host 0.0.0.0 --enforce-eager

# After server starts:
curl -X POST http://localhost:8092/v1/omni/sleep -H 'Content-Type: application/json' -d '{"level": 1, "stage_ids": [0]}'

Error:

{"error":{"message":"'State' object has no attribute 'sleeping_stages'","type":"InternalServerError","param":null,"code":500}}
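
For scripted reproduction, the curl call above can be approximated in Python using only the standard library. This is a sketch: build_sleep_request() and send() are helper names introduced here for illustration, and the host, port, and payload are copied from the steps above.

```python
import json
import urllib.error
import urllib.request


def build_sleep_request(base_url: str, level: int, stage_ids: list[int]) -> urllib.request.Request:
    """Build the POST request for /v1/omni/sleep with a JSON body."""
    body = json.dumps({"level": level, "stage_ids": stage_ids}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/omni/sleep",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> tuple[int, dict]:
    """Send the request and return (status code, parsed JSON body)."""
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Error responses (such as the 500 above) still carry a JSON body.
        return err.code, json.loads(err.read())


# Usage, with the server from the steps above already running:
#   status, payload = send(build_sleep_request("http://localhost:8092", 1, [0]))
#   print(status, payload["error"]["message"])
```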

Proposed Fix

Initialize state.sleeping_stages = set() in the pure diffusion code path (before the early return), and implement proper sleep/wakeup support for diffusion models (offloading model weights from GPU to CPU).
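
A rough sketch of the first half of this fix, under the assumption that the state object accepts dynamic attributes. All function names here are hypothetical and do not mirror the real api_server.py. It sets sleeping_stages before any early return and also guards the handler side. Actual weight offloading for diffusion models is not shown.

```python
class State:
    """Hypothetical stand-in for the app state object."""


def init_app_state_fixed(state: State, pure_diffusion: bool) -> None:
    # Initialize up front so every code path, including early returns, sees it.
    state.sleeping_stages = set()
    if pure_diffusion:
        return
    # ... remaining (non-diffusion) setup would go here


def _sleeping(state: State) -> set:
    # Defensive accessor: create the set if some path skipped initialization.
    stages = getattr(state, "sleeping_stages", None)
    if stages is None:
        stages = set()
        state.sleeping_stages = stages
    return stages


def sleep_stages(state: State, stage_ids: list[int]) -> list[int]:
    """Mark stages as sleeping and return the sorted sleeping set."""
    _sleeping(state).update(stage_ids)
    return sorted(_sleeping(state))


def wake_stages(state: State, stage_ids: list[int]) -> list[int]:
    """Mark stages as awake and return the sorted sleeping set."""
    _sleeping(state).difference_update(stage_ids)
    return sorted(_sleeping(state))
```

The defensive accessor is optional once initialization is moved, but it keeps the handlers from returning a 500 if another early-return path is added later.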

Environment

  • vLLM-Omni 0.20.2
  • WSL2 Ubuntu 24.04
  • NVIDIA RTX 4090 (24GB)
  • Models: FLUX.2-klein-9B, Tongyi-MAI/Z-Image-Turbo
