[Bug] Sleep/wakeup endpoints crash for pure diffusion models

## Bug Description

The `/v1/omni/sleep` and `/v1/omni/wakeup` endpoints return HTTP 500 with `'State' object has no attribute 'sleeping_stages'` when the server runs in pure diffusion mode (e.g., serving FLUX.2-klein-9B or Z-Image-Turbo).

## Root Cause

In `omni_init_app_state()` (`api_server.py`), the pure diffusion code path returns early (around line 637) and never reaches line 952, where `state.sleeping_stages = set()` is initialized. The sleep/wakeup handler (line 2934) then reads this attribute, which was never set, and raises `AttributeError`.
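
The failure can be modelled without starting a server. The sketch below is a hypothetical simplification, not the actual vLLM-Omni code: `State`, `init_app_state()` and `handle_sleep()` stand in for the app's state object, `omni_init_app_state()` and the sleep handler, and the `mode` attribute is invented for illustration.

```python
class State:
    """Plain attribute bag standing in for the app's `State` object (hypothetical)."""


def init_app_state(state: State, pure_diffusion: bool) -> None:
    """Hypothetical simplification of omni_init_app_state()."""
    if pure_diffusion:
        state.mode = "diffusion"
        return  # early return: the initialization below is skipped
    state.mode = "multi-stage"
    state.sleeping_stages = set()  # only reached on the non-diffusion path


def handle_sleep(state: State, stage_ids: list[int]) -> set[int]:
    """Hypothetical sleep handler: reads the attribute unconditionally."""
    state.sleeping_stages.update(stage_ids)
    return state.sleeping_stages


if __name__ == "__main__":
    state = State()
    init_app_state(state, pure_diffusion=True)
    try:
        handle_sleep(state, [0])
    except AttributeError as exc:
        print(exc)  # 'State' object has no attribute 'sleeping_stages'
```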

## Steps to Reproduce

```bash
vllm-omni serve Tongyi-MAI/Z-Image-Turbo --omni --port 8092 --host 0.0.0.0 --enforce-eager

# After the server starts:
curl -X POST http://localhost:8092/v1/omni/sleep -H 'Content-Type: application/json' -d '{"level": 1, "stage_ids": [0]}'
```

**Error:**
```json
{"error":{"message":"'State' object has no attribute 'sleeping_stages'","type":"InternalServerError","param":null,"code":500}}
```
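
For a scripted regression check, the same request can be sent with Python's standard library. This is a sketch: the helper names `build_sleep_request()`, `send_json()` and `hits_sleeping_stages_bug()` are hypothetical, and the check only matches the error message shown above.

```python
import json
import urllib.error
import urllib.request


def build_sleep_request(base_url: str, level: int = 1, stage_ids=(0,)) -> urllib.request.Request:
    """Build the same POST that the curl command above sends."""
    body = json.dumps({"level": level, "stage_ids": list(stage_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/omni/sleep",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(req: urllib.request.Request, timeout: float = 30.0) -> tuple[int, dict]:
    """Send the request; return (HTTP status, decoded JSON body), also for 4xx/5xx."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def hits_sleeping_stages_bug(status: int, payload: dict) -> bool:
    """True if a response matches the server error reported in this issue."""
    message = payload.get("error", {}).get("message", "")
    return status >= 500 and "sleeping_stages" in message
```

With the server from the previous step running, `status, payload = send_json(build_sleep_request("http://localhost:8092"))` followed by `hits_sleeping_stages_bug(status, payload)` should return `True` on an affected build.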

## Proposed Fix

Initialize `state.sleeping_stages = set()` in the pure diffusion code path (before the early return), and implement proper sleep/wakeup support for diffusion models (offloading model weights from GPU to CPU).
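
A minimal sketch of both parts of the proposed fix, under stated assumptions: `State`, `init_app_state()`, `handle_sleep()`, `handle_wakeup()` and `DiffusionSleeper` are hypothetical stand-ins rather than vLLM-Omni APIs; the diffusion model is assumed to expose a `torch.nn.Module`-style `.to(device)`; and pure diffusion mode is assumed to have a single stage, so any sleep request offloads the whole model.

```python
from typing import Any


class State:
    """Plain attribute bag standing in for the app's `State` object (hypothetical)."""


def release_cuda_cache() -> None:
    """Release PyTorch's cached, unused GPU blocks if torch with CUDA is present."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


class DiffusionSleeper:
    """Hypothetical helper: keeps a diffusion model's weights on CPU while sleeping."""

    def __init__(self, model: Any, device: str = "cuda") -> None:
        self.model = model  # assumed to have an nn.Module-style .to(device)
        self.device = device
        self.sleeping = False

    def sleep(self) -> None:
        if self.sleeping:
            return  # repeated sleep calls are no-ops
        self.model = self.model.to("cpu")  # offload weights from GPU to CPU
        release_cuda_cache()
        self.sleeping = True

    def wake_up(self) -> None:
        if not self.sleeping:
            return
        self.model = self.model.to(self.device)  # reload weights onto the GPU
        self.sleeping = False


def init_app_state(state: State, pure_diffusion: bool, model: Any = None) -> None:
    """Hypothetical simplification of omni_init_app_state() with the fix applied."""
    state.sleeping_stages = set()  # initialized before any early return
    if pure_diffusion:
        state.diffusion_sleeper = DiffusionSleeper(model)
        return
    # ... multi-stage initialization would continue here ...


def handle_sleep(state: State, stage_ids: list[int]) -> set[int]:
    sleeper = getattr(state, "diffusion_sleeper", None)
    if sleeper is not None:
        sleeper.sleep()  # single-stage assumption: offload the whole model
    state.sleeping_stages.update(stage_ids)
    return state.sleeping_stages


def handle_wakeup(state: State, stage_ids: list[int]) -> set[int]:
    sleeper = getattr(state, "diffusion_sleeper", None)
    if sleeper is not None:
        sleeper.wake_up()
    state.sleeping_stages.difference_update(stage_ids)
    return state.sleeping_stages
```

Initializing `sleeping_stages` before the branch keeps every code path consistent, and `getattr(..., None)` lets the same handlers serve both modes. Moving a module with `.to("cpu")` returns its GPU tensors to PyTorch's caching allocator once nothing else references them; `torch.cuda.empty_cache()` then hands those cached blocks back so other processes can use the memory.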

## Environment
- vLLM-Omni 0.20.2
- WSL2 Ubuntu 24.04
- NVIDIA RTX 4090 (24GB)
- Models: FLUX.2-klein-9B, Tongyi-MAI/Z-Image-Turbo

[Bug] Sleep/wakeup endpoints crash for pure diffusion models #3823

Bug Description

Root Cause

Steps to Reproduce

Proposed Fix

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug] Sleep/wakeup endpoints crash for pure diffusion models #3823

Description

Bug Description

Root Cause

Steps to Reproduce

Proposed Fix

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions