[Feat] support TeaCache for Flux2 klein #1234

Open

RuixiangMa wants to merge 2 commits into vllm-project:main from RuixiangMa:supportTeaCache

Conversation

@RuixiangMa (Contributor) commented Feb 6, 2026

Purpose

Add TeaCache support for the Flux2 Klein model, which uses a dual-stream transformer architecture.

Test Plan

Serve the Flux2 Klein model locally and send an image-edit request, comparing latency without TeaCache and with TeaCache at several thresholds:

```bash
curl -s -X POST "http://localhost:8004/v1/images/edits" \
  -F "image=@test.jpg" \
  -F "prompt=Change the sky to orange sunset." \
  -F "guidance_scale=1.0" \
  -F "num_inference_steps=50" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "output_format=png" | jq -r '.data[0].b64_json' | base64 --decode > output.png
```

Test Result

Hardware: 1 × NVIDIA 4090 (24 GB)

|      | No TeaCache  | TeaCache (0.2) | TeaCache (0.4) | TeaCache (0.6) |
| ---- | ------------ | -------------- | -------------- | -------------- |
| Time | 25.479 s/img | 16.043 s/img   | 10.051 s/img   | 7.599 s/img    |

(The original table also showed the origin image and each generated output; the images are omitted here.)

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c52ff57eaf

Signed-off-by: Lancer <maruixiang6688@gmail.com>
Copilot AI (Contributor) left a comment

Pull request overview

This pull request adds TeaCache support for the Flux2 Klein diffusion model, which uses a dual-stream transformer architecture. TeaCache is an adaptive caching technique that speeds up inference by reusing transformer computations when timestep embeddings are similar.
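
As context for the thresholds in the results above, here is a minimal sketch of the per-step decision TeaCache makes (hypothetical helper and state names; the actual logic lives in vllm_omni/diffusion/cache/teacache/hook.py and may differ):

```python
import numpy as np
import torch

# Sketch of TeaCache's skip decision, following the published algorithm:
# the relative L1 change of the timestep-modulated input is rescaled by
# model-specific polynomial coefficients (this PR reuses the FLUX.1 ones
# for Flux2Klein) and accumulated until it crosses the threshold.

def should_skip(state: dict, modulated_inp: torch.Tensor,
                coefficients: list[float], threshold: float) -> bool:
    prev = state.get("prev_modulated_inp")
    if prev is None:
        # First step: always run the full transformer.
        state["accumulated_distance"] = 0.0
        skip = False
    else:
        rel_l1 = ((modulated_inp - prev).abs().mean() / prev.abs().mean()).item()
        state["accumulated_distance"] += np.polyval(coefficients, rel_l1)
        if state["accumulated_distance"] < threshold:
            skip = True  # reuse the cached residual instead of running the blocks
        else:
            state["accumulated_distance"] = 0.0
            skip = False
    state["prev_modulated_inp"] = modulated_inp.detach().clone()
    return skip
```

The 0.2 / 0.4 / 0.6 values in the results table are this threshold: a larger threshold lets more steps reuse the cache, trading quality for speed.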

Changes:

  • Added Flux2Klein extractor function to extract model-specific context for caching decisions
  • Added Flux2Klein-specific logic in the hook to handle the model's dual-stream architecture with additional single_transformer_blocks (see the sketch after this list)
  • Configured polynomial coefficients for Flux2Klein (reusing FLUX.1 coefficients)
  • Registered Flux2KleinPipeline in the TeaCache backend with a custom enabler
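
The dual-stream handling mentioned above, as a rough sketch (assuming FLUX-style module names and block signatures, not the PR's exact code):

```python
import torch

def run_transformer_blocks(model, hidden_states, encoder_hidden_states,
                           temb, image_rotary_emb, state, skip: bool):
    if skip:
        # Reuse the residual cached on the last full forward pass.
        return hidden_states + state["cached_residual"]

    original = hidden_states
    # Dual-stream blocks: image and text tokens flow through separate streams.
    for block in model.transformer_blocks:
        encoder_hidden_states, hidden_states = block(
            hidden_states=hidden_states,
            encoder_hidden_states=encoder_hidden_states,
            temb=temb,
            image_rotary_emb=image_rotary_emb,
        )
    # Single-stream blocks: text and image tokens are concatenated first.
    hidden_states = torch.cat([encoder_hidden_states, hidden_states], dim=1)
    for block in model.single_transformer_blocks:
        hidden_states = block(hidden_states=hidden_states, temb=temb,
                              image_rotary_emb=image_rotary_emb)
    hidden_states = hidden_states[:, encoder_hidden_states.shape[1]:]

    # Cache the residual so skipped steps can replay it.
    state["cached_residual"] = hidden_states - original
    return hidden_states
```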

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| vllm_omni/diffusion/cache/teacache/hook.py | Added special handling for Flux2 models that have both `transformer_blocks` and `single_transformer_blocks` |
| vllm_omni/diffusion/cache/teacache/extractors.py | Implemented the `extract_flux2_klein_context` function with preprocessing, transformer execution, and postprocessing logic |
| vllm_omni/diffusion/cache/teacache/config.py | Added Flux2Klein polynomial coefficients (borrowed from FLUX.1) |
| vllm_omni/diffusion/cache/teacache/backend.py | Added the `enable_flux2_klein_teacache` function and registered it in `CUSTOM_TEACACHE_ENABLERS` |
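
For the backend.py row, the wiring presumably looks roughly like this (the `enable_flux2_klein_teacache` and `CUSTOM_TEACACHE_ENABLERS` names come from the table above; the signature and state layout are assumptions):

```python
def enable_flux2_klein_teacache(pipeline, rel_l1_thresh: float = 0.2):
    """Attach the TeaCache hook to a Flux2KleinPipeline's transformer."""
    transformer = pipeline.transformer
    transformer._teacache_state = {
        "accumulated_distance": 0.0,
        "prev_modulated_inp": None,
        "cached_residual": None,
    }
    transformer._teacache_thresh = rel_l1_thresh
    # ... wrap transformer.forward with the hook from hook.py ...


# Pipelines whose architecture needs custom handling are routed through
# this registry instead of the generic enabler.
CUSTOM_TEACACHE_ENABLERS = {
    "Flux2KleinPipeline": enable_flux2_klein_teacache,
}
```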


Signed-off-by: Lancer <maruixiang6688@gmail.com>
