[diffusion] fix --warmup silently downgrading server-based warmup to request mode#29514
Conversation
…t mode `sglang serve --warmup` was meant to use server-based warmup (serve injects a default warmup_mode="server"), but because --warmup is "explicit" while the injected warmup_mode is not, _adjust_warmup's `mode_explicit or not legacy_explicit` guard skipped applying the mode and left server_warmup=False (request mode). Combined with --warmup-resolutions (on which request-based warmup bails out), that left NO warmup running at all — the warmup flag effectively disabled warmup. Fix: when legacy --warmup/--server-warmup turned warmup on without an explicit --warmup-mode, honor the resolved warmup_mode (e.g. serve's "server" default) for the server-vs-request split instead of silently staying request-based. `--warmup false` is untouched. Adds unit tests for the legacy-on->server path and the --warmup-resolutions dead-zone. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the server argument resolution logic to correctly handle legacy --warmup and --server-warmup flags when no explicit --warmup-mode is provided. It ensures that the resolved warmup_mode (such as the default 'server' mode) is honored instead of silently downgrading to request-based warmup. Additionally, unit tests have been added to verify this behavior and prevent regressions. There are no review comments, and I have no additional feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
/tag-and-rerun-ci |
Motivation
sglang serve --warmupis meant to use server-based warmup — theserveentrypoint injects a defaultwarmup_mode="server""for production". But it silently downgrades to request-based warmup, and combined with--warmup-resolutionsit disables warmup entirely. The--warmupflag can thus effectively turn warmup off, leaving the first real request to pay the full eager cold-start (observed as ~2× latency in diffusion benchmarking).Root cause
serveinjectswarmup_mode="server"viadefault_args(runtime/entrypoints/cli/serve.py), butdefault_argsare not recorded in_explicit_arg_names. So in_adjust_warmup(runtime/server_args.py), when--warmupis explicit (legacy_explicit=True) but--warmup-modeis not (mode_explicit=False), the guardif mode_explicit or not legacy_explicit:evaluates toFalse— the injectedwarmup_mode="server"is never applied, andserver_warmupfalls back to its dataclass defaultFalse(i.e. request mode).--warmup --warmup-resolutions WxHthen becomes a dead zone:server_warmup=True→ never runs;warmup_resolutions is not None→ never runs.→ no warmup runs at all.
Fix
In
_adjust_warmup, when a legacy--warmup/--server-warmupenabled warmup without an explicit--warmup-mode, honor the resolvedwarmup_mode(e.g.serve's"server"default) for the server-vs-request split, instead of silently staying request-based.--warmup falsekeeps warmup off and is untouched.Test
Adds unit tests in
TestWarmupModeNormalization:--warmupwith mode defaulted toserver→ resolves toserver_warmup=True;--warmup --warmup-resolutionsdead-zone regression → server-based warmup runs.Existing warmup tests (
--warmup false, explicit--warmup-mode, disagg, bareserve) still pass.Verification (H100)
With this fix,
serve --warmupruns server-based warmup, and a diffusion benchmark's measured latencies match the prior client-side-warmup baseline instead of the ~2× cold-start regression seen when warmup silently didn't run — e.g. Z-Image-Turbo 0.74s vs 0.65s baseline (was 13.46s with no warmup), FLUX.1-dev 4.80s vs 4.68s, Ideogram-4 5.26s vs 5.19s.🤖 Generated with Claude Code
CI States
Latest PR Test (Base): ✅ Run #28293800395
Latest PR Test (Extra): ❌ Run #28293800347