
[diffusion] warmup: default to model sampling resolution (declare Z-Image default)#29519

Open: mickqian wants to merge 2 commits into main from mick/diffusion-warmup-default-resolution

@mickqian mickqian commented Jun 27, 2026

Collaborator

Motivation

Server-based image warmup defaulted to an area-capped "representative" resolution (SERVER_WARMUP_IMAGE_MAX_AREA = 768×768). For larger real requests (e.g. 1024×1024) the first request still paid first-shape kernel autotuning — a ~0.1s residual measured on H100, even though warmup ran.

Fix

_resolve_default_warmup_resolution: for image warmup, prefer the model's sampling_defaults width/height (the most likely real request shape) instead of the area-capped representative, so kernels are specialized for the actual shape. Video keeps the area/frame caps (a full-resolution video warmup is far costlier). Representative selection remains the fallback when a model declares no default width/height.
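The selection rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `SamplingDefaults`, `resolve_warmup_resolution`, and the `representative` argument are hypothetical names, and the representative (area/frame-capped) selection is passed in rather than reimplemented, since its logic is not shown here. Only the condition itself mirrors the snippet under review.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SamplingDefaults:
    """Hypothetical stand-in for a model's sampling_defaults."""
    width: Optional[int] = None
    height: Optional[int] = None


def resolve_warmup_resolution(
    defaults: SamplingDefaults,
    is_image_gen: bool,
    server_based_warmup: bool,
    representative: Tuple[int, int],
) -> Tuple[int, int]:
    """Pick the warmup resolution (illustrative sketch only).

    `representative` is the capped fallback computed elsewhere; it is an
    assumption of this sketch that it arrives as a ready-made tuple.
    """
    has_default = defaults.width is not None and defaults.height is not None
    # Use the declared default unless this is a server-based warmup for a
    # non-image (e.g. video) task, which keeps the capped resolution.
    if has_default and (not server_based_warmup or is_image_gen):
        return defaults.width, defaults.height
    return representative


# Image model with a declared default: warm at the default.
print(resolve_warmup_resolution(SamplingDefaults(1024, 1024), True, True, (768, 768)))
# Video model under server-based warmup: keep the capped resolution.
print(resolve_warmup_resolution(SamplingDefaults(1280, 720), False, True, (768, 768)))
# No declared default: fall back to the representative resolution.
print(resolve_warmup_resolution(SamplingDefaults(), True, True, (512, 512)))
```

Passing the fallback in keeps the sketch honest about what it does not know: the real representative selection also applies frame caps for video.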

Z-Image declared no default resolution (it accepts arbitrary /16 resolutions, so width/height were left None), and therefore fell back to the cap. Declare its official default 1024×1024 (supported_resolutions stays None = all allowed, so other resolutions still work without spurious warnings).
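As a rough sketch of why declaring a default does not restrict other resolutions: the default width/height and the allow-list are separate fields, and a `None` allow-list means "no restriction". The class name `ZImageSamplingDefaults` and the helper `is_resolution_allowed` are hypothetical, and reading "/16 resolutions" as "both sides are multiples of 16" is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ZImageSamplingDefaults:
    """Hypothetical container; field names follow the PR description."""
    width: Optional[int] = 1024   # declared default, used for warmup
    height: Optional[int] = 1024
    # None means every resolution is allowed (no warning for non-defaults).
    supported_resolutions: Optional[List[Tuple[int, int]]] = None


def is_resolution_allowed(
    params: ZImageSamplingDefaults, width: int, height: int, multiple: int = 16
) -> bool:
    """Illustrative check: alignment first, then the optional allow-list."""
    if width % multiple != 0 or height % multiple != 0:
        return False
    if params.supported_resolutions is None:
        return True
    return (width, height) in params.supported_resolutions


params = ZImageSamplingDefaults()
print((params.width, params.height))              # the declared default
print(is_resolution_allowed(params, 768, 1280))   # aligned, non-default
print(is_resolution_allowed(params, 1000, 1000))  # not a multiple of 16
```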

Verification (H100)

serve --warmup with no explicit --warmup-resolutions now warms at the model default and matches the client-side-warmup baseline:

| case | before (area cap) | after (model default) | baseline |
| --- | --- | --- | --- |
| FLUX.1-dev | 4.80s (warm @ ≤768²) | 4.70s (warm @ 1024) | 4.68 |
| Ideogram-4 | 5.26s | 5.21s (warm @ 1024) | 5.19 |
| Z-Image-Turbo | 512 fallback | warms @ 1024 | 0.65 |
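For reference, the gap to the baseline can be worked out directly from the two rows that report all three timings. The snippet below is only arithmetic on the numbers in the table; the helper name `residuals` is made up for this example.

```python
# (before, after, baseline) in seconds, copied from the table above.
timings = {
    "FLUX.1-dev": (4.80, 4.70, 4.68),
    "Ideogram-4": (5.26, 5.21, 5.19),
}


def residuals(before: float, after: float, baseline: float) -> tuple:
    """Return (before - baseline, after - baseline), rounded to 10 ms."""
    return round(before - baseline, 2), round(after - baseline, 2)


for case, (before, after, baseline) in timings.items():
    gap_before, gap_after = residuals(before, after, baseline)
    print(f"{case}: {gap_before:.2f}s over baseline before, {gap_after:.2f}s after")
```

With the area cap the gap is 0.12s and 0.07s respectively, and with the model default it drops to 0.02s in both cases.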

Relationship

Complements the --warmup dead-zone fix (so server-based warmup actually runs) — that fix is tracked separately. This PR is purely about which resolution the default warmup uses.

🤖 Generated with Claude Code


CI States

Latest PR Test (Base): ⏳ Run #28311301454
Latest PR Test (Extra): ❌ Run #28311301354

…-Image default

Server-based image warmup shrank to an area cap (SERVER_WARMUP_IMAGE_MAX_AREA,
768x768), leaving a ~0.1s first-request cold-start when the real request is
larger (e.g. 1024x1024 paid first-shape kernel autotuning). Default to the
model's sampling_defaults resolution so warmup specializes kernels for the
likely real shape — verified on H100: flux1/ideogram warm at 1024 and match the
client-warmup baseline (4.70s/5.21s vs 4.68s/5.19s) instead of ~0.1s slower.
Video keeps the area/frame caps (a full-resolution video warmup is far costlier).

Z-Image declared no default resolution (it accepts arbitrary /16 resolutions),
so it fell back to the area cap; declare its official default 1024x1024
(supported_resolutions stays None = all allowed) so it benefits too.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the default resolution for Z-Image sampling parameters to 1024x1024 and adjusts the warmup request builder to use the model's default resolution for image generation tasks during server-based warmup, avoiding cold-start autotuning overhead. Feedback suggests evaluating the image generation check lazily to prevent potential attribute errors when server arguments are partially initialized or mocked in tests.

Comment on lines +82 to +87
    is_image_gen = server_args.pipeline_config.task_type.is_image_gen()
    if (
        width is not None
        and height is not None
        and (not server_based_warmup or is_image_gen)
    ):

medium

The is_image_gen variable is evaluated unconditionally, even when server_based_warmup is False. When server_based_warmup is False, the condition not server_based_warmup is True, meaning is_image_gen is not needed to determine the outcome.

Evaluating this unconditionally can lead to unnecessary attribute access and potential AttributeError or NoneType errors if server_args is mocked or partially initialized (e.g., in unit tests).

We can leverage Python's short-circuit evaluation to lazily evaluate is_image_gen only when server_based_warmup is True.

    if (
        width is not None
        and height is not None
        and (not server_based_warmup or server_args.pipeline_config.task_type.is_image_gen())
    ):
References
  1. Enforce defensive programming by ensuring appropriate guards exist before object property accesses, especially when inputs might be partially initialized or mocked in tests.
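The difference between the two forms can be illustrated with a small self-contained sketch. `PartialServerArgs`, `use_default_eager`, and `use_default_lazy` are hypothetical names for this example only; the stub simulates a partially initialized `server_args` whose `pipeline_config` cannot be read.

```python
class PartialServerArgs:
    """Hypothetical stub: accessing pipeline_config always fails."""

    @property
    def pipeline_config(self):
        raise AttributeError("pipeline_config is not initialized")


def use_default_eager(width, height, server_based_warmup, server_args):
    # Mirrors the reviewed code: the attribute chain runs unconditionally.
    is_image_gen = server_args.pipeline_config.task_type.is_image_gen()
    return (
        width is not None
        and height is not None
        and (not server_based_warmup or is_image_gen)
    )


def use_default_lazy(width, height, server_based_warmup, server_args):
    # Mirrors the suggestion: `or` only evaluates its right operand when
    # server_based_warmup is True, so the stub is never touched otherwise.
    return (
        width is not None
        and height is not None
        and (not server_based_warmup or server_args.pipeline_config.task_type.is_image_gen())
    )


args = PartialServerArgs()
print(use_default_lazy(1024, 1024, False, args))  # stub is not accessed
try:
    use_default_eager(1024, 1024, False, args)
except AttributeError as exc:
    print(f"eager form failed: {exc}")
```

Both forms return the same result whenever `server_args` is fully initialized; the lazy form only changes which inputs are read when `server_based_warmup` is False.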

@mickqian

Collaborator Author

/tag-and-rerun-ci

Labels

diffusion SGLang Diffusion run-ci
