
[feature]: support flux2.klein cache_dit #1209

Open
nuclearwu wants to merge 3 commits into vllm-project:main from nuclearwu:klein

Conversation


@nuclearwu (Contributor) commented on Feb 5, 2026

Signed-off-by: wuzhongjian wuzhongjian_yewu@cmss.chinamobile.com


Purpose

Support cache_dit acceleration for the FLUX.2-klein-4B pipeline.

Test Plan

python examples/offline_inference/text_to_image/text_to_image.py \
  --model /workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-klein-4B/ \
  --prompt "A cat holding a sign that says hello world" \
  --seed 42 \
  --tensor_parallel_size 1 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --guidance_scale 4.0 \
  --cache_backend cache_dit \
  --height 1024 \
  --width 1024 \
  --output outputs/flux-klein.png

Test Result

vLLM-Omni, reproduced on 4×A800:

| Model / TP | TP=1 | TP=2 | TP=4 | cache_dit |
| --- | --- | --- | --- | --- |
| FLUX.2-klein-4B | e4cbf16128869c185032123b560f913a | e4cbf16128869c185032123b560f913a | e4cbf16128869c185032123b560f913a | |
| Time | 14.9125 s/img | 12.9658 s/img | 10.8902 s/img | no |
| Time | 10.3670 s/img | 9.2590 s/img | 7.5931 s/img | yes |
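
The per-image times above correspond to a speedup of roughly 1.4× at every TP setting. A quick check (pure arithmetic, values copied from the table):

```python
# Speedup implied by the Test Result table; values copied verbatim.
baseline = {1: 14.9125, 2: 12.9658, 4: 10.8902}  # s/img, cache_dit off
cached = {1: 10.3670, 2: 9.2590, 4: 7.5931}      # s/img, cache_dit on

for tp in (1, 2, 4):
    print(f"TP={tp}: {baseline[tp] / cached[tp]:.2f}x speedup")
# TP=1: 1.44x  TP=2: 1.40x  TP=4: 1.43x
```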

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as the command(s) used for testing.
  • The test results, such as a before/after comparison or end-to-end results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 656f2b142f


Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu (Contributor, Author) commented:

cc @hsliuustc0106 @ZJY0516

@ZJY0516 (Collaborator) left a comment


LGTM. But also cc @SamitHuang

Copilot AI left a comment


Pull request overview

This pull request adds cache-dit acceleration support for the FLUX.2-klein-4B diffusion model. The implementation follows established patterns in the codebase and yields roughly a 1.4× speedup (per the test results above) when cache_dit is enabled.

Changes:

  • Added an enable_cache_for_flux2_klein function to enable cache-dit for the FLUX.2-klein-4B pipeline with model-specific configuration (Fn_compute_blocks=2, forward patterns Pattern_1 and Pattern_2); a sketch of this shape follows the list
  • Registered the new pipeline in CUSTOM_DIT_ENABLERS dictionary
  • Updated documentation to list FLUX.2-klein as supported by cache-dit acceleration
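
For orientation, here is a minimal sketch of what such an enabler can look like. This is not the PR's actual code: enable_cache, BlockAdapter, and ForwardPattern follow cache-dit's public API (whose exact signature varies between versions), while the pipeline block attribute names and the registry key are illustrative assumptions.

```python
# Hypothetical sketch, not the merged implementation. cache-dit API names
# are used as referenced in this PR; block attribute names are assumptions.
import cache_dit
from cache_dit import BlockAdapter, ForwardPattern

# In cache_dit_backend.py this map already exists; redefined here for context.
CUSTOM_DIT_ENABLERS: dict = {}


def enable_cache_for_flux2_klein(pipe, **cache_kwargs):
    """Enable cache-dit for a FLUX.2-klein-4B pipeline."""
    cache_dit.enable_cache(
        BlockAdapter(
            pipe=pipe,
            transformer=pipe.transformer,
            # Two block lists map to two forward patterns: Pattern_1 for the
            # double-stream blocks, Pattern_2 for the single-stream blocks
            # (attribute names below are guesses for illustration).
            blocks=[
                pipe.transformer.transformer_blocks,
                pipe.transformer.single_transformer_blocks,
            ],
            forward_pattern=[
                ForwardPattern.Pattern_1,
                ForwardPattern.Pattern_2,
            ],
        ),
        # Always recompute the first 2 blocks per step before deciding
        # whether cached residuals can be reused for the remaining blocks.
        Fn_compute_blocks=2,
        **cache_kwargs,
    )


# Register the enabler under the pipeline's class name (key is illustrative).
CUSTOM_DIT_ENABLERS["Flux2KleinPipeline"] = enable_cache_for_flux2_klein
```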

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vllm_omni/diffusion/cache/cache_dit_backend.py | Adds the cache-dit enabler function for FLUX.2-klein with model-specific configuration and registers it in the pipeline map |
| docs/user_guide/diffusion_acceleration.md | Updates the supported-models table to include FLUX.2-klein with cache-dit support |


Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
