Add SDXL FBCache Support #787
Open
Feat: Implement FBCache (First Block Cache) Optimization for SDXL Pipeline
Overview
This PR introduces the FBCache (First Block Cache) optimization to the Stable Diffusion XL (SDXL) inference pipeline, accelerating inference by skipping redundant computations.
How FBCache Works
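FBCache needs a cheap proxy for how much the U-Net output changes from one denoising step to the next. It uses the output of the U-Net's first down-sampling block (First Down Block) as this proxy: when that output has changed little since the last fully computed step, the remaining blocks are skipped and their cached result is reused.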
The cached U-Net forward pass follows the design of the Hugging Face diffusers library to stay consistent with its pipeline architecture.
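The caching decision described above can be sketched as follows. This is a minimal illustration and not the code in this PR: the `FirstBlockCache` class, the `relative_change` metric, the `threshold` parameter, and the residual-reuse rule are all assumptions made for the example, and plain Python lists stand in for tensors.

```python
from typing import Callable, List, Optional

Vector = List[float]  # stand-in for a tensor


def relative_change(current: Vector, previous: Vector) -> float:
    """Mean absolute difference, normalised by the previous mean magnitude."""
    diff = sum(abs(c - p) for c, p in zip(current, previous)) / len(current)
    scale = sum(abs(p) for p in previous) / len(previous)
    return diff / scale if scale > 0 else float("inf")


class FirstBlockCache:
    """Hypothetical helper: decides per step whether to reuse cached work."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed tunable; value not taken from the PR
        self.prev_first: Optional[Vector] = None       # first-block output at last full step
        self.cached_residual: Optional[Vector] = None  # (full output - first-block output)
        self.skipped_steps = 0

    def step(
        self,
        x: Vector,
        first_block: Callable[[Vector], Vector],
        remaining_blocks: Callable[[Vector], Vector],
    ) -> Vector:
        # The first block always runs; it is the cheap proxy.
        first_out = first_block(x)
        can_reuse = (
            self.prev_first is not None
            and self.cached_residual is not None
            and relative_change(first_out, self.prev_first) < self.threshold
        )
        if can_reuse:
            # Skip the expensive blocks and re-apply the cached residual.
            self.skipped_steps += 1
            return [f + r for f, r in zip(first_out, self.cached_residual)]
        # Otherwise run everything and refresh the cache.
        full_out = remaining_blocks(first_out)
        self.prev_first = first_out
        self.cached_residual = [o - f for o, f in zip(full_out, first_out)]
        return full_out
```

In this sketch the comparison is made against the last fully computed step rather than the immediately preceding one, so small per-step changes cannot accumulate unnoticed. A higher `threshold` skips more steps at the cost of fidelity.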
Performance Improvement (Acceleration)
With this optimization, a 50-step SDXL generation on an NVIDIA A5000 GPU ran considerably faster:
Before (No Cache): $\approx$ 7 seconds
After (FBCache): $\approx$ 2 seconds
This is an approximately 3.5x inference speedup (7 s / 2 s). It indicates that the cache skips most of the U-Net computation in the later timesteps. The attached images show output of the same quality, supporting the optimization's effectiveness.