Implement TaylorSeer Lite for Flux text2image pipeline#3349
l-bat wants to merge 5 commits into openvinotoolkit:master
Pull request overview
This PR implements TaylorSeer Lite caching optimization for the Flux text-to-image pipeline. TaylorSeer Lite uses Taylor series approximation to predict transformer outputs during denoising steps, reducing the number of expensive forward passes required. The implementation caches only the final layer output and its first derivative rather than all transformer layer features, making it memory-efficient while achieving significant speedups.
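The idea of caching only the final output and its first derivative can be sketched as follows. This is a minimal illustration, not the PR's `TaylorSeerState`: the class name `TaylorLiteState`, its methods, and the use of a finite difference between the two most recent computed steps as the derivative are all assumptions made for the example. The prediction is the first-order Taylor estimate f(t) ≈ f(t0) + f'(t0)·(t − t0).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaylorLiteState:
    """Hypothetical stand-in for the PR's TaylorSeerState (names are illustrative)."""

    last_output: Optional[List[float]] = None  # f(t0): output of the last real forward pass
    derivative: Optional[List[float]] = None   # finite-difference estimate of f'(t0)
    last_step: Optional[int] = None            # t0

    def update(self, step: int, output: List[float]) -> None:
        """Record a real forward-pass result and refresh the first derivative."""
        if self.last_output is not None and step != self.last_step:
            dt = step - self.last_step
            self.derivative = [
                (new - old) / dt for new, old in zip(output, self.last_output)
            ]
        self.last_output = list(output)
        self.last_step = step

    def predict(self, step: int) -> List[float]:
        """First-order Taylor estimate: f(t) ~ f(t0) + f'(t0) * (t - t0)."""
        if self.last_output is None:
            raise RuntimeError("no computed output cached yet")
        if self.derivative is None:
            # Only one computed step so far: fall back to reusing it unchanged.
            return list(self.last_output)
        dt = step - self.last_step
        return [
            value + slope * dt
            for value, slope in zip(self.last_output, self.derivative)
        ]
```

Only two vectors the size of the final output are kept, which is where the memory saving over caching every layer's features would come from in a scheme like this.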
Changes:
- Added TaylorSeerCacheConfig class for configuring cache behavior (interval, warmup period, and shutdown period)
- Implemented TaylorSeerState class managing Taylor series factors and prediction logic
- Integrated TaylorSeer caching into FluxPipeline with conditional computation/prediction based on configuration
- Added Python bindings, comprehensive unit tests, sample applications (C++ and Python), and documentation
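How the three configuration values might interact in the denoising loop can be sketched like this. The field names and defaults mirror the sample's arguments, but the decision rule itself is an assumption: the PR's actual semantics (in particular how negative `disable_cache_after_step` values and the interval phase are handled) may differ. `CacheSchedule` and `should_compute` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheSchedule:
    """Hypothetical mirror of the TaylorSeerCacheConfig fields used in the sample."""

    cache_interval: int = 3
    disable_cache_before_step: int = 6
    disable_cache_after_step: int = -2


def should_compute(step: int, num_steps: int, cfg: CacheSchedule) -> bool:
    """Return True when a full transformer forward pass should run at `step`.

    Assumed rule: always compute during warmup and shutdown; in between,
    compute once every `cache_interval` steps and predict otherwise.
    """
    shutdown_start = cfg.disable_cache_after_step
    if shutdown_start < 0:
        # Assumption: negative values count from the end, so -1 is the last step.
        shutdown_start += num_steps
    if step < cfg.disable_cache_before_step or step >= shutdown_start:
        return True
    return (step - cfg.disable_cache_before_step) % cfg.cache_interval == 0


if __name__ == "__main__":
    cfg = CacheSchedule()
    plan = ["C" if should_compute(s, 28, cfg) else "p" for s in range(28)]
    print("".join(plan))  # C = computed step, p = predicted step
```

Under these assumed semantics and the sample's defaults (28 steps, interval 3, warmup 6, shutdown -2), 15 steps run the full forward pass and 13 are predicted.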
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/cpp/include/openvino/genai/taylorseer_config.hpp | Configuration class for TaylorSeer cache parameters |
| src/cpp/src/diffusion_caching/taylorseer_lite.hpp | Core TaylorSeerState implementation managing Taylor factors and predictions |
| src/cpp/src/image_generation/flux_pipeline.hpp | Integration of TaylorSeer caching into Flux pipeline denoising loop |
| src/cpp/src/image_generation/generation_config.cpp | Support for taylorseer_config in ImageGenerationConfig |
| src/cpp/include/openvino/genai/image_generation/generation_config.hpp | Added taylorseer_config optional member to ImageGenerationConfig |
| src/python/py_image_generation_pipelines.cpp | Python bindings for TaylorSeerCacheConfig class |
| src/python/openvino_genai/py_openvino_genai.pyi | Type stubs for Python bindings |
| src/python/openvino_genai/__init__.py | Exported TaylorSeerCacheConfig to Python package |
| tests/cpp/diffusion_caching.cpp | Comprehensive unit tests for config and state classes |
| samples/python/image_generation/taylorseer_text2image.py | Python sample demonstrating usage with performance comparison |
| samples/cpp/image_generation/taylorseer_text2image.cpp | C++ sample demonstrating usage with performance comparison |
| site/docs/concepts/optimization-techniques/diffusion-caching.md | Complete documentation explaining concept, usage, and limitations |
| samples/python/image_generation/README.md | Updated README with TaylorSeer sample information |
| samples/cpp/image_generation/README.md | Updated README with TaylorSeer sample information |
| samples/cpp/image_generation/CMakeLists.txt | Build configuration for C++ sample |
| samples/cpp/image_generation/taylorseer.bmp | Reference image generated with TaylorSeer |
| samples/cpp/image_generation/taylorseer_baseline.bmp | Reference baseline image without TaylorSeer |

Update PR's description

    namespace ov::genai {

    class TaylorSeerCacheConfig {

The TaylorSeerCacheConfig class should be marked with OPENVINO_GENAI_EXPORTS to ensure proper DLL visibility on Windows and consistent API export patterns with other public configuration classes in the codebase. All other public configuration classes like KVCrushConfig, CacheEvictionConfig, ImageGenerationConfig, etc. use this macro.
9c6471e to d9753b5
d9753b5 to b9f4a7f
b9f4a7f to 0b482a7
0b482a7 to 868979b

    #!/usr/bin/env python3
    # Copyright (C) 2026 Intel Corporation
    # SPDX-License-Identifier: Apache-2.0

    import argparse
    import time

    import openvino_genai
    from PIL import Image


    def main():
        parser = argparse.ArgumentParser(description="Text-to-image generation with TaylorSeer caching optimization")
        parser.add_argument("model_dir", help="Path to the converted OpenVINO model directory")
        parser.add_argument("prompt", help="Text prompt for image generation")
        parser.add_argument("--steps", type=int, default=28, help="Number of inference steps")

        ts_group = parser.add_argument_group("TaylorSeer Cache Configurations")
        ts_group.add_argument("--cache-interval", type=int, default=3, help="Cache interval")
        ts_group.add_argument("--disable-before", type=int, default=6, help="Disable caching before this step for warmup")
        ts_group.add_argument(
            "--disable-after", type=int, default=-2, help="Disable caching after this step from end, -1 means last step"
        )
        args = parser.parse_args()

        device = "CPU"  # GPU can be used as well
        pipe = openvino_genai.Text2ImagePipeline(args.model_dir, device)

        def callback(step, num_steps, latent):
            print(f"Step {step + 1}/{num_steps}")
            return False

        # Configure TaylorSeer caching
        taylorseer_config = openvino_genai.TaylorSeerCacheConfig(
            cache_interval=args.cache_interval,
            disable_cache_before_step=args.disable_before,
            disable_cache_after_step=args.disable_after,
        )
        generation_config = pipe.get_generation_config()
        generation_config.taylorseer_config = taylorseer_config
        pipe.set_generation_config(generation_config)

        print(f"TaylorSeer Configuration:")
        print(f"  Cache interval: {args.cache_interval}")
        print(f"  Disable before step: {args.disable_before}")
        print(f"  Disable after step: {args.disable_after}")

        print(f"Generating image with TaylorSeer caching...")
        generate_kwargs = {
            "width": 512,
            "height": 512,
            "num_inference_steps": args.steps,
            "rng_seed": 42,
            "num_images_per_prompt": 1,
            "callback": callback,
        }

        start_time = time.time()
        image_tensor = pipe.generate(args.prompt, **generate_kwargs)
        taylorseer_time = time.time() - start_time
        print(f"TaylorSeer generation completed in {taylorseer_time:.2f}s")

        image_filename = "taylorseer.bmp"
        image = Image.fromarray(image_tensor.data[0])
        image.save(image_filename)
        print(f"Image saved to {image_filename}")

        print("\nGenerating baseline image without caching for comparison...")

        # Disable TaylorSeer by removing the config
        baseline_config = pipe.get_generation_config()
        baseline_config.taylorseer_config = None
        pipe.set_generation_config(baseline_config)

        start_time = time.time()
        baseline_tensor = pipe.generate(args.prompt, **generate_kwargs)
        baseline_time = time.time() - start_time

        print(f"Baseline generation completed in {baseline_time:.2f}s")

        baseline_filename = image_filename.replace(".bmp", "_baseline.bmp")
        baseline_image = Image.fromarray(baseline_tensor.data[0])
        baseline_image.save(baseline_filename)
        print(f"Baseline image saved to {baseline_filename}")

        # Performance comparison
        speedup = baseline_time / taylorseer_time if taylorseer_time > 0 else 0.0
        time_saved = baseline_time - taylorseer_time if baseline_time > 0 else 0.0
        percentage = (baseline_time - taylorseer_time) / baseline_time * 100 if baseline_time > 0 else 0.0

        print(f"\nPerformance Comparison:")
        print(f"  Baseline time: {baseline_time:.2f}s")
        print(f"  TaylorSeer time: {taylorseer_time:.2f}s")
        print(f"  Speedup: {speedup:.2f}x")
        print(f"  Time saved: {time_saved:.2f}s ({percentage:.1f}%)")


    if __name__ == "__main__":
        main()

The new sample taylorseer_text2image should have corresponding tests added. Looking at similar samples like text2image.py which has tests in tests/python_tests/samples/test_text2image.py, this sample should also have tests added to verify it runs correctly. The test should cover both Python and C++ samples and verify they can execute without errors.
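A smoke test along the lines requested here could look roughly like the sketch below. It does not use the repository's existing sample-test helpers, whose names are not shown in this thread; `build_sample_command` and `run_sample` are hypothetical, and the script and model paths are placeholders supplied by the caller. Only the command-line flags are taken from the sample itself.

```python
import subprocess
import sys
from typing import List


def build_sample_command(script: str, model_dir: str, prompt: str, steps: int = 4) -> List[str]:
    """Build the argv for the Python sample, using only flags the sample defines."""
    return [
        sys.executable,
        script,
        model_dir,
        prompt,
        "--steps", str(steps),
        "--cache-interval=2",
        "--disable-before=1",
        "--disable-after=-1",
    ]


def run_sample(command: List[str], timeout_s: int = 600) -> str:
    """Run the sample and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        command, check=True, capture_output=True, text=True, timeout=timeout_s
    )
    return result.stdout
```

A test would then call `run_sample(build_sample_command(...))` against a small converted model and check that both output images exist; the C++ sample could be covered the same way by swapping the interpreter and script for the built executable.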
868979b to 34b633c

Description
Adds the TaylorSeer Lite caching method to speed up inference in the Flux text-to-image generation pipeline. TaylorSeer Lite uses a Taylor series approximation to predict transformer outputs during denoising steps, so the predicted steps skip the full forward pass.
Example:
CVS-181247
Checklist: