[Feat] Support TeaCache for Flux2 Klein #1234
RuixiangMa wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Force-pushed from c52ff57 to 979fe87.
Pull request overview
This pull request adds TeaCache support for the Flux2 Klein diffusion model, which uses a dual-stream transformer architecture. TeaCache is an adaptive caching technique that speeds up inference by reusing cached transformer outputs on denoising steps whose timestep embeddings change little from the previous step.
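For orientation, this is a minimal sketch of the general TeaCache decision rule from the upstream TeaCache project, not this PR's exact hook code; `should_reuse_cache`, `state`, and `threshold` are illustrative names:

```python
import numpy as np
import torch

def should_reuse_cache(state: dict, t_emb: torch.Tensor,
                       coefficients: list[float], threshold: float) -> bool:
    """Return True when the cached transformer residual can be reused."""
    prev = state.get("prev_emb")
    state["prev_emb"] = t_emb
    if prev is None:  # first step: always compute fully
        state["accum"] = 0.0
        return False
    # Relative L1 distance between consecutive timestep embeddings,
    # rescaled by a model-specific polynomial fit (this PR reuses the
    # FLUX.1 coefficients for Flux2 Klein).
    rel_l1 = ((t_emb - prev).abs().mean() / prev.abs().mean()).item()
    state["accum"] += float(np.polyval(coefficients, rel_l1))
    if state["accum"] < threshold:
        return True  # close enough: skip the blocks, reuse the residual
    state["accum"] = 0.0  # drifted too far: recompute and reset
    return False
```

When the check returns True, the hook adds the cached residual to the input hidden states instead of running the transformer blocks.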
Changes:
- Added Flux2Klein extractor function to extract model-specific context for caching decisions
- Added Flux2Klein-specific logic in the hook to handle the model's dual-stream architecture with additional single_transformer_blocks
- Configured polynomial coefficients for Flux2Klein (reusing FLUX.1 coefficients)
- Registered Flux2KleinPipeline in the TeaCache backend with a custom enabler (see the registry sketch after this list)
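The registration likely follows a dispatch-by-pipeline-name pattern; a minimal sketch, assuming the registry maps class names to enabler callables. `enable_teacache` and `maybe_enable_teacache` are hypothetical stand-ins for the real wiring in backend.py:

```python
from typing import Any, Callable

# Maps pipeline class names to functions that install TeaCache on them.
CUSTOM_TEACACHE_ENABLERS: dict[str, Callable[..., None]] = {}

def enable_flux2_klein_teacache(pipeline: Any, teacache_config: Any) -> None:
    # Hypothetical installation call; the real hook lives in hook.py and
    # points at extract_flux2_klein_context plus the FLUX.1 coefficients.
    pipeline.transformer.enable_teacache(teacache_config)

CUSTOM_TEACACHE_ENABLERS["Flux2KleinPipeline"] = enable_flux2_klein_teacache

def maybe_enable_teacache(pipeline: Any, teacache_config: Any) -> None:
    # The backend dispatches on the concrete pipeline class name.
    enabler = CUSTOM_TEACACHE_ENABLERS.get(type(pipeline).__name__)
    if enabler is not None:
        enabler(pipeline, teacache_config)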
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_omni/diffusion/cache/teacache/hook.py | Added special handling for Flux2 models that have both transformer_blocks and single_transformer_blocks (sketched after this table) |
| vllm_omni/diffusion/cache/teacache/extractors.py | Implemented extract_flux2_klein_context function with preprocessing, transformer execution, and postprocessing logic |
| vllm_omni/diffusion/cache/teacache/config.py | Added Flux2Klein polynomial coefficients (borrowed from FLUX.1) |
| vllm_omni/diffusion/cache/teacache/backend.py | Added enable_flux2_klein_teacache function and registered it in CUSTOM_TEACACHE_ENABLERS |
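To illustrate the dual-stream handling noted for hook.py: a cached forward must cover both block lists so the reused residual reflects the full transformer. A minimal sketch, assuming diffusers-style Flux block signatures (the actual Flux2 Klein signatures may differ):

```python
import torch

def run_flux2_blocks(model, hidden_states, encoder_hidden_states, temb):
    # Dual-stream blocks keep separate image and text streams
    # (rotary embeddings and other per-block kwargs omitted for brevity).
    for block in model.transformer_blocks:
        encoder_hidden_states, hidden_states = block(
            hidden_states=hidden_states,
            encoder_hidden_states=encoder_hidden_states,
            temb=temb,
        )
    # Single-stream blocks run on the concatenated text+image sequence.
    num_text_tokens = encoder_hidden_states.shape[1]
    hidden_states = torch.cat([encoder_hidden_states, hidden_states], dim=1)
    for block in model.single_transformer_blocks:
        hidden_states = block(hidden_states=hidden_states, temb=temb)
    # Strip the text tokens again; the TeaCache hook can then cache the
    # delta between this output and the pre-block hidden states.
    return hidden_states[:, num_text_tokens:]
```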
Force-pushed from 1cb433d to ca90ff0.
Purpose
Add TeaCache support for the Flux2 Klein model, which uses a dual-stream transformer architecture.
Test Plan
Run an image-edit request through the OpenAI-compatible endpoint with TeaCache enabled and confirm an edited image is returned.
Test Result
Tested on 1 × NVIDIA RTX 4090 (24 GB):

```bash
curl -s -X POST "http://localhost:8004/v1/images/edits" \
  -F "image=@test.jpg" \
  -F "prompt=Change the sky to orange sunset." \
  -F "guidance_scale=1.0" \
  -F "num_inference_steps=50" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "output_format=png" \
  | jq -r '.data[0].b64_json' | base64 --decode > output.png
```