
[diffusion] fix --warmup silently downgrading server-based warmup to request mode#29514

Merged: mickqian merged 3 commits into main from mick/fix-warmup-flag-deadzone on Jun 28, 2026

@mickqian (Collaborator) commented Jun 27, 2026

Motivation

sglang serve --warmup is meant to use server-based warmup: the serve entrypoint injects a default warmup_mode="server" ("for production"). Instead, it silently downgrades to request-based warmup, and when combined with --warmup-resolutions it disables warmup entirely. The --warmup flag can thus effectively turn warmup off, leaving the first real request to pay the full eager cold-start cost (observed as ~2× latency in diffusion benchmarking).

Root cause

serve injects warmup_mode="server" via default_args (runtime/entrypoints/cli/serve.py), but default_args are not recorded in _explicit_arg_names. So in _adjust_warmup (runtime/server_args.py), when --warmup is explicit (legacy_explicit=True) but --warmup-mode is not (mode_explicit=False), the guard `if mode_explicit or not legacy_explicit:` evaluates to False. The injected warmup_mode="server" is therefore never applied, and server_warmup falls back to its dataclass default of False (i.e. request mode).
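To make the explicit-vs-injected distinction concrete, here is a minimal sketch of how the guard ends up False. It is a toy model, not SGLang's code: `toy_parse` and the plain dicts are hypothetical stand-ins, and only the names `warmup`, `warmup_mode`, `legacy_explicit`, and `mode_explicit` come from the description above.

```python
from typing import Dict, Set, Tuple


def toy_parse(cli_args: Dict[str, object],
              default_args: Dict[str, object]) -> Tuple[Dict[str, object], Set[str]]:
    """Hypothetical parser: merge defaults with CLI values.

    Only names the user actually typed are recorded as explicit, which
    mirrors the described behaviour of default_args not being recorded.
    """
    explicit_names = set(cli_args)
    values = {**default_args, **cli_args}  # CLI values win over defaults
    return values, explicit_names


# Models `serve --warmup`: the flag is typed, the mode is only injected.
values, explicit_names = toy_parse(
    cli_args={"warmup": True},
    default_args={"warmup_mode": "server"},
)

legacy_explicit = "warmup" in explicit_names
mode_explicit = "warmup_mode" in explicit_names

# The guard from the description: it is False in exactly this combination,
# so the resolved mode would never be applied.
guard = mode_explicit or not legacy_explicit
print(values["warmup_mode"], legacy_explicit, mode_explicit, guard)
```

In this model the resolved value of `warmup_mode` is "server", yet the guard is False, so any code gated on it leaves `server_warmup` at its default.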

--warmup --warmup-resolutions WxH then becomes a dead zone:

  • synthetic server warmup requires server_warmup=True → never runs;
  • request-based warmup bails out when warmup_resolutions is not None → never runs.

→ no warmup runs at all.
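The dead zone can be illustrated with a small sketch of the two gates listed above. This is an assumed simplification: `toy_warmup_paths` is a hypothetical helper, and treating request-based warmup as the fallback when `server_warmup` is False is an assumption made for illustration.

```python
from typing import List, Optional


def toy_warmup_paths(warmup: bool,
                     server_warmup: bool,
                     warmup_resolutions: Optional[List[str]]) -> List[str]:
    """Hypothetical model: return the warmup paths that would actually run."""
    ran: List[str] = []
    if not warmup:
        return ran
    if server_warmup:
        # Synthetic server warmup requires server_warmup=True.
        ran.append("server")
    elif warmup_resolutions is None:
        # Request-based warmup bails out when resolutions are given.
        ran.append("request")
    return ran


# Pre-fix state for `--warmup --warmup-resolutions WxH`:
# server_warmup stayed False and resolutions were set, so neither gate opens.
dead_zone = toy_warmup_paths(True, False, ["1024x1024"])
print(dead_zone)
```

With `server_warmup=True` the same call reports the server path, which is the state the fix is meant to reach.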

Fix

In _adjust_warmup, when a legacy --warmup/--server-warmup flag enabled warmup without an explicit --warmup-mode, honor the resolved warmup_mode (e.g. serve's "server" default) for the server-vs-request split, instead of silently staying request-based. --warmup false keeps warmup off and is untouched.
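The before/after behaviour can be sketched as follows. This is not the actual patch: `toy_resolve_server_warmup` and its `apply_fix` switch are hypothetical, introduced only to contrast the two behaviours described in this PR.

```python
from typing import Optional


def toy_resolve_server_warmup(warmup_enabled: bool,
                              resolved_mode: Optional[str],
                              legacy_explicit: bool,
                              mode_explicit: bool,
                              apply_fix: bool) -> bool:
    """Hypothetical model of the server-vs-request split."""
    if not warmup_enabled:
        # `--warmup false` keeps warmup off in both versions.
        return False
    server_warmup = False  # models the dataclass default (request mode)
    if mode_explicit or not legacy_explicit:
        # Path that already worked before the fix.
        server_warmup = resolved_mode == "server"
    elif apply_fix:
        # Legacy flag enabled warmup without an explicit mode:
        # honor the resolved mode instead of staying request-based.
        server_warmup = resolved_mode == "server"
    return server_warmup


# `serve --warmup`: legacy flag explicit, mode injected as "server".
before = toy_resolve_server_warmup(True, "server", True, False, apply_fix=False)
after = toy_resolve_server_warmup(True, "server", True, False, apply_fix=True)
print(before, after)
```

The switch exists only so both behaviours fit in one function; the real change lives inside _adjust_warmup and may be structured differently.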

Test

Adds unit tests in TestWarmupModeNormalization:

  • legacy --warmup with mode defaulted to server → resolves to server_warmup=True;
  • the --warmup --warmup-resolutions dead-zone regression → server-based warmup runs.

Existing warmup tests (--warmup false, explicit --warmup-mode, disagg, bare serve) still pass.
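For readers who want a feel for the cases covered, the following is an illustrative sketch in unittest style. It does not reproduce the tests added in TestWarmupModeNormalization: `toy_warmup_path` and `ToyWarmupModeTests` are hypothetical and exercise a toy model of the post-fix behaviour rather than the real server args.

```python
import unittest
from typing import Optional


def toy_warmup_path(warmup: bool,
                    resolved_mode: Optional[str],
                    warmup_resolutions: Optional[str]) -> Optional[str]:
    """Hypothetical post-fix model: which warmup path runs, if any."""
    if not warmup:
        return None
    if resolved_mode == "server":
        return "server"
    # Request-based warmup bails out when resolutions are given.
    return "request" if warmup_resolutions is None else None


class ToyWarmupModeTests(unittest.TestCase):
    def test_legacy_warmup_with_server_default(self):
        # Legacy --warmup with the mode defaulted to "server".
        self.assertEqual(toy_warmup_path(True, "server", None), "server")

    def test_dead_zone_regression(self):
        # --warmup --warmup-resolutions must still run server-based warmup.
        self.assertEqual(toy_warmup_path(True, "server", "1024x1024"), "server")

    def test_warmup_false_stays_off(self):
        self.assertIsNone(toy_warmup_path(False, "server", None))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyWarmupModeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```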

Verification (H100)

With this fix, serve --warmup runs server-based warmup, and a diffusion benchmark's measured latencies are close to the prior client-side-warmup baseline instead of showing the ~2× cold-start regression seen when warmup silently didn't run. For example:

  • Z-Image-Turbo: 0.74s vs 0.65s baseline (was 13.46s with no warmup);
  • FLUX.1-dev: 4.80s vs 4.68s baseline;
  • Ideogram-4: 5.26s vs 5.19s baseline.
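As a quick arithmetic check on the figures quoted above, the sketch below computes each model's overhead relative to its baseline and the no-warmup ratio for Z-Image-Turbo. The numbers are copied from this description; the variable and function names are illustrative only.

```python
# (measured latency with the fix, baseline latency), both in seconds
latencies = {
    "Z-Image-Turbo": (0.74, 0.65),
    "FLUX.1-dev": (4.80, 4.68),
    "Ideogram-4": (5.26, 5.19),
}


def overhead_pct(measured: float, baseline: float) -> float:
    """Relative slowdown of `measured` over `baseline`, in percent."""
    return (measured / baseline - 1.0) * 100.0


overheads = {name: overhead_pct(m, b) for name, (m, b) in latencies.items()}

# Z-Image-Turbo without any warmup (13.46s) relative to its baseline (0.65s).
no_warmup_ratio = 13.46 / 0.65

for name, pct in overheads.items():
    print(f"{name}: {pct:.1f}% over baseline")
print(f"Z-Image-Turbo without warmup: {no_warmup_ratio:.1f}x baseline")
```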

🤖 Generated with Claude Code


CI States

Latest PR Test (Base): ✅ Run #28293800395
Latest PR Test (Extra): ❌ Run #28293800347

…t mode

`sglang serve --warmup` was meant to use server-based warmup (serve injects a
default warmup_mode="server"), but because --warmup is "explicit" while the
injected warmup_mode is not, _adjust_warmup's `mode_explicit or not
legacy_explicit` guard skipped applying the mode and left server_warmup=False
(request mode). Combined with --warmup-resolutions (on which request-based
warmup bails out), that left NO warmup running at all — the warmup flag
effectively disabled warmup.

Fix: when legacy --warmup/--server-warmup turned warmup on without an explicit
--warmup-mode, honor the resolved warmup_mode (e.g. serve's "server" default)
for the server-vs-request split instead of silently staying request-based.
`--warmup false` is untouched. Adds unit tests for the legacy-on->server path
and the --warmup-resolutions dead-zone.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request updates the server argument resolution logic to correctly handle legacy --warmup and --server-warmup flags when no explicit --warmup-mode is provided. It ensures that the resolved warmup_mode (such as the default 'server' mode) is honored instead of silently downgrading to request-based warmup. Additionally, unit tests have been added to verify this behavior and prevent regressions. There are no review comments, and I have no additional feedback to provide.


@mickqian (Collaborator, Author) commented

/tag-and-rerun-ci

@mickqian mickqian merged commit 4a76699 into main Jun 28, 2026
248 of 275 checks passed
@mickqian mickqian deleted the mick/fix-warmup-flag-deadzone branch June 28, 2026 03:46

Labels

diffusion SGLang Diffusion run-ci
