Add sleep/wake support for diffusion engine by MikukuOvO · Pull Request #22659 · sgl-project/sglang

MikukuOvO · 2026-04-13T04:28:52Z

Motivation

This PR is inherited from and rebased on top of #19152.

It adds coarse-grained sleep/wake support for the multimodal diffusion engine so the server can temporarily release GPU memory without restarting the process.

Modifications

Add /release_memory_occupation and /resume_memory_occupation for diffusion.
Add scheduler/worker support to move active pipeline modules between GPU and CPU.
Reject generation requests while the server is sleeping.
Add tests and documentation for the sleep/wake flow.
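
A minimal sketch of the sleep/wake control flow described above, for readers who want to see the shape of the state machine. This is not the code in this PR: the class name `MemoryOccupationController`, the `ServerSleepingError` type, and the `Movable` protocol are hypothetical names chosen for illustration. The only thing assumed about a pipeline module is that it exposes a `.to(device)` method that returns the moved module, as `torch.nn.Module.to` does.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class Movable(Protocol):
    """Anything with a torch-style .to(device) method (assumption)."""

    def to(self, device: str) -> "Movable": ...


class ServerSleepingError(RuntimeError):
    """Hypothetical error raised when generation is requested while asleep."""

    status_code = 400


@dataclass
class MemoryOccupationController:
    """Hypothetical controller that moves pipeline modules between devices."""

    modules: Dict[str, Movable]
    gpu_device: str = "cuda"
    sleeping: bool = False

    def release(self) -> None:
        """Sleep: move every active module to CPU. Idempotent."""
        if self.sleeping:
            return
        for name, module in self.modules.items():
            self.modules[name] = module.to("cpu")
        # With real PyTorch modules, torch.cuda.empty_cache() would typically
        # follow here so that cached blocks are returned to the driver.
        self.sleeping = True

    def resume(self) -> None:
        """Wake: move every module back to the GPU device. Idempotent."""
        if not self.sleeping:
            return
        for name, module in self.modules.items():
            self.modules[name] = module.to(self.gpu_device)
        self.sleeping = False

    def check_can_generate(self) -> None:
        """Reject generation requests while the server is sleeping."""
        if self.sleeping:
            raise ServerSleepingError("server is sleeping; wake it before generating")
```

In this sketch the sleeping flag is flipped only after all modules have moved, so a failure partway through a move does not leave the flag claiming a state the modules are not in. Rollback of partially moved modules is not handled here.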

Accuracy Tests

This PR does not change model math or kernel behavior.

Functional checks were run for:

Qwen/Qwen-Image on 1 GPU
Qwen/Qwen-Image with TP=2
Tongyi-MAI/Z-Image
Tongyi-MAI/Z-Image-Turbo

Verified behavior:

sleep releases GPU memory
generation while sleeping returns 400
wake restores service and generation works again
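
The three verified behaviors above could be exercised with a small smoke check along these lines. This is a sketch under stated assumptions, not the test added in this PR: the two sleep/wake paths come from the PR description, but the generation path, the request bodies, and the expectation that sleep and wake return 200 are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request
from typing import Callable

# Sleep/wake paths are taken from the PR description.
RELEASE_PATH = "/release_memory_occupation"
RESUME_PATH = "/resume_memory_occupation"
# Hypothetical generation path; adjust to the endpoint the server exposes.
GENERATE_PATH = "/v1/images/generations"


def http_post(url: str, payload: dict) -> int:
    """POST a JSON body and return the HTTP status code, including 4xx/5xx."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the code is what we want to check.
        return err.code


def check_sleep_wake(
    base_url: str, post: Callable[[str, dict], int] = http_post
) -> None:
    """Run the sleep -> rejected generation -> wake -> generation sequence."""
    prompt = {"prompt": "a red bicycle"}  # hypothetical request body
    assert post(base_url + RELEASE_PATH, {}) == 200, "sleep request failed"
    assert post(base_url + GENERATE_PATH, prompt) == 400, "expected 400 while sleeping"
    assert post(base_url + RESUME_PATH, {}) == 200, "wake request failed"
    assert post(base_url + GENERATE_PATH, prompt) == 200, "generation failed after wake"
```

Against a running server this would be called as `check_sleep_wake("http://127.0.0.1:<port>")`. The `post` parameter is injectable so the sequence can also be run against a stub. This check does not measure GPU memory; confirming that sleep actually releases memory needs a separate probe such as `nvidia-smi`.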

Speed Tests and Profiling

This PR is not intended as an inference speed optimization.

Checklist

Format your code according to the [Format code with pre-commit](https://docs.sglang.io/developer_guide/contribution_guide.html#format-code-with-pre-commit).
Add unit tests according to the [Run and add unit tests](https://docs.sglang.io/developer_guide/contribution_guide.html#run-and-add-unit-tests).
Update documentation according to [Write documentations](https://docs.sglang.io/developer_guide/contribution_guide.html#write-documentations).
Provide accuracy and speed benchmark results according to [Test the accuracy](https://docs.sglang.io/developer_guide/contribution_guide.html#test-the-accuracy) and [Benchmark the speed](https://docs.sglang.io/developer_guide/contribution_guide.html#benchmark-the-speed).
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the [PR Merge Process](https://github.com/sgl-project/sglang/blob/main/.github/MAINTAINER.md#pull-request-merge-process).
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist · 2026-04-13T04:28:57Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

klhhhhh and others added 30 commits April 12, 2026 01:54

add release and resume handle in scheduler

0ab77ba

implement release and resume memory in gpu worker

5e0d5aa

update release and resume api in http server

8a42a36

pre-commit lint

97145fb

remove tags for diffusion models and update io_struct in post training

db77216

adjust tags in function call

5ce36fd

implement new wake sleep func directly use get updated module in pipeline

d28118f

run lint

abcc12f

Implement new wake and sleep also add sanitize moving part for modules

e78b252

refactor weight api

bc1ed2a

Return correct status code call generation when sleeping

db57344

update comment

8d4820a

update lint

4b90d58

refactor all the code

5106056

refactor all the code

ebc544f

refactor gpu_worker and utils in openai entrypoint

94e45bf

fix bugs in utils

7d7f665

fix bugs in utils

12d2190

fix bugs in weight api

e2aae15

fix comment in wake func

55b0789

refactor wake func

a24b08f

add test wake sleep in ci

2c50a65

adds pytest entry

4ef1c1d

fix race condition

704309d

refactor process generation batch

d3480ad

fix bugs:access output using details

a93fe56

change test name

ad5d455

avoid worker execution failed and keep consistent self._sleeping

a27fe01

refactor gpu worker

d114525

add roll out function

c9819d2

zhaochenyang20 and others added 8 commits April 12, 2026 02:07

add todo for rollback exception

07087b2

self fixing comments

6fcd9c0

refactor: pass the request instance instead of class type

a418d46

Co-Authored-By: shuwenn <47200617+alphabetc1@users.noreply.github.com>

move RL related tests to post-training dir

a9e364d

fix untouched unit test in CI

c5b7e7c

extract _get_module_device into utility helper & add TODO for io_struct.py

6fe5ef0

Co-Authored-By: zhaochenyang20 <zhaochen20@outlook.com>
Co-Authored-By: shuwenn <47200617+alphabetc1@users.noreply.github.com>

remove redundant unit tests

8ccda32

move test

888c15e

MikukuOvO requested review from mickqian, ping1jing2 and yhyang201 as code owners April 13, 2026 04:28

github-actions bot added the documentation (Improvements or additions to documentation) and diffusion (SGLang Diffusion) labels Apr 13, 2026

MikukuOvO added 8 commits April 13, 2026 04:39

refactor: reduce sleep/wake diff noise

cff8755

refactor: simplify sleep/wake error handling

32271e7

refactor: simplify sleep/wake worker control flow

8258d79

refactor: use structured sleeping error type

11cd364

refactor: drop test and utils changes

9498a80

Merge remote-tracking branch 'upstream/main' into dev/pr-19152-sleep-wake-rebased

84c2fdd

Conflicts: python/sglang/multimodal_gen/runtime/managers/scheduler.py

refactor: decouple timer logging from fastapi

c504629

refactor: simplify generation error mapping

6c611ae

MikukuOvO marked this pull request as draft April 13, 2026 05:07

MikukuOvO added 7 commits April 13, 2026 05:14

refactor: trim defensive sleep/wake checks

9a2298b

refactor: derive sleep state in worker responses

1dbf16c

refactor: extract memory occupation controller

83c5f02

refactor: replace error type with status code

e24b85f

refactor: drop sleep wake docs and todos

408bc0d

refactor: inline memory occupation scheduler handlers

5aff54f

fix: restore gc import in gpu worker

89b26ac

MikukuOvO wants to merge 66 commits into sgl-project:main from MikukuOvO:dev/pr-19152-sleep-wake-rebased

4 participants
