[diffusion] chore: improve memory usage on consumer-level GPU#18997
Conversation
Summary of Changes

Hello @mickqian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing memory efficiency and performance, particularly for systems with consumer-grade GPUs. It introduces intelligent offloading mechanisms to prevent out-of-memory issues during model loading and execution, and automates parallelism settings for better resource utilization. Additionally, it improves debugging capabilities by logging GPU memory usage at critical stages.

Highlights
Activity
Code Review
This pull request aims to improve memory usage on consumer-level GPUs by adjusting offloading strategies. The changes in transformer_loader.py and perf_logger.py are beneficial. However, I've identified a critical issue in server_args.py where conflicting settings can be enabled, and a separate maintainability issue due to code duplication. I have provided suggestions to address these points.
if current_platform.get_device_total_memory() / BYTES_PER_GB < 30:
    logger.info("Enabling all offloading for GPU with low device memory")
    if self.dit_cpu_offload is None:
        self.dit_cpu_offload = True
    if self.text_encoder_cpu_offload is None:
        self.text_encoder_cpu_offload = True
    if self.image_encoder_cpu_offload is None:
        self.image_encoder_cpu_offload = True
    if self.vae_cpu_offload is None:
        self.vae_cpu_offload = True
The added logic for low-memory GPUs is nearly identical to the else block on lines 438-447, which introduces code duplication. This can make future modifications more error-prone.
Additionally, the value 30 is a magic number. It would be better to define it as a constant, for example LOW_GPU_MEMORY_THRESHOLD_GB = 30, to improve readability and make it easier to change.
Please consider refactoring the _adjust_offload method to eliminate this duplication.
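A hedged sketch of that refactor might look like the following. The class below is a stand-in for illustration only, not the project's real ServerArgs; the constant name, method signature, and the way total memory is passed in are assumptions.

```python
# Sketch of the suggested refactor: a named threshold constant plus a loop
# over the offload attributes, replacing the duplicated if-blocks.
from dataclasses import dataclass
from typing import Optional

LOW_GPU_MEMORY_THRESHOLD_GB = 30  # replaces the magic number 30
BYTES_PER_GB = 1 << 30

@dataclass
class ServerArgsSketch:
    # None means "not explicitly set by the user".
    dit_cpu_offload: Optional[bool] = None
    text_encoder_cpu_offload: Optional[bool] = None
    image_encoder_cpu_offload: Optional[bool] = None
    vae_cpu_offload: Optional[bool] = None

    def _adjust_offload(self, total_memory_bytes: int) -> None:
        if total_memory_bytes / BYTES_PER_GB >= LOW_GPU_MEMORY_THRESHOLD_GB:
            return
        # Low-memory GPU: enable every offload flag the user left unset.
        for attr in (
            "dit_cpu_offload",
            "text_encoder_cpu_offload",
            "image_encoder_cpu_offload",
            "vae_cpu_offload",
        ):
            if getattr(self, attr) is None:
                setattr(self, attr, True)

# User explicitly disabled VAE offload; everything else is unset.
args = ServerArgsSketch(vae_cpu_offload=False)
args._adjust_offload(total_memory_bytes=24 * BYTES_PER_GB)  # e.g. a 24 GB card
# dit/text/image offload become True; the explicit False is preserved.
```

The loop keeps the "only default when unset" semantics in one place, so adding a new offloadable component means appending one string rather than copying another if-block.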
/tag-and-rerun-ci
/gemini review
Code Review
This pull request introduces a mechanism to automatically enable CPU offloading on GPUs with low memory, which is a good step towards improving usability on consumer-level hardware. My review uncovered a critical issue in the validation logic that could lead to conflicting configurations, and I've also provided a suggestion to refactor the new offloading logic for better maintainability.
  self.use_fsdp_inference = False

- if self.dit_cpu_offload:
+ if self.dit_cpu_offload is None:
This change from if self.dit_cpu_offload: to if self.dit_cpu_offload is None: appears to introduce a bug. When dit_layerwise_offload is enabled, dit_cpu_offload must be disabled to prevent conflicts. However, with this change, if _adjust_offload sets dit_cpu_offload=True on a low-memory GPU, this condition will be false, and dit_cpu_offload will incorrectly remain True, leading to a configuration conflict.
The previous logic correctly handled this by disabling dit_cpu_offload whenever it was enabled. Please revert this change to ensure the conflict is always resolved.
Suggested change:

- if self.dit_cpu_offload is None:
+ if self.dit_cpu_offload:
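As an illustration of the conflict the reviewer describes, here is a minimal sketch of the pre-change behavior. The helper function and its name are hypothetical, not the project's actual code.

```python
# Hypothetical helper illustrating the check the reviewer asks to restore:
# layerwise offload and whole-model CPU offload conflict, so an enabled
# dit_cpu_offload must be forced off whenever dit_layerwise_offload is on.
# A truthy check handles this even when a low-memory heuristic already set
# dit_cpu_offload=True; an `is None` check would leave the conflict in place.
def resolve_dit_offload(dit_layerwise_offload, dit_cpu_offload):
    if dit_layerwise_offload and dit_cpu_offload:
        dit_cpu_offload = False  # always resolve the conflict
    return dit_cpu_offload

# With the truthy check, an explicit or heuristic True is still disabled:
resolve_dit_offload(True, True)   # -> False
resolve_dit_offload(False, True)  # -> True (no layerwise offload, no conflict)
```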
if current_platform.get_device_total_memory() / BYTES_PER_GB < 30:
    logger.info("Enabling all offloading for GPU with low device memory")
    if self.dit_cpu_offload is None:
        self.dit_cpu_offload = True
    if self.text_encoder_cpu_offload is None:
        self.text_encoder_cpu_offload = True
    if self.image_encoder_cpu_offload is None:
        self.image_encoder_cpu_offload = True
    if self.vae_cpu_offload is None:
        self.vae_cpu_offload = True
The hardcoded value 30 for the memory threshold can be extracted into a named constant to improve readability. Additionally, the series of if statements to enable offloading for different components is repetitive. This block can be refactored into a loop to make the code more concise and easier to maintain.
Suggested change:

- if current_platform.get_device_total_memory() / BYTES_PER_GB < 30:
-     logger.info("Enabling all offloading for GPU with low device memory")
-     if self.dit_cpu_offload is None:
-         self.dit_cpu_offload = True
-     if self.text_encoder_cpu_offload is None:
-         self.text_encoder_cpu_offload = True
-     if self.image_encoder_cpu_offload is None:
-         self.image_encoder_cpu_offload = True
-     if self.vae_cpu_offload is None:
-         self.vae_cpu_offload = True
+ if current_platform.get_device_total_memory() / BYTES_PER_GB < 30:
+     logger.info("Enabling all offloading for GPU with low device memory")
+     offload_attrs = [
+         "dit_cpu_offload",
+         "text_encoder_cpu_offload",
+         "image_encoder_cpu_offload",
+         "vae_cpu_offload",
+     ]
+     for attr in offload_attrs:
+         if getattr(self, attr) is None:
+             setattr(self, attr, True)
Motivation

Previously, --dit-cpu-offload was disabled automatically when --dit-layerwise-offload was set, making Flux.1-dev unable to run on a 4090.

Modifications

Set --dit-cpu-offload to false only when it is not explicitly set.
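The "only override when not explicitly set" behavior can be illustrated with a tri-state flag. The argument names mirror the PR, but the parser setup itself is a hypothetical sketch, not the project's actual CLI code.

```python
# Hypothetical illustration of the tri-state flag behavior: a default of
# None distinguishes "user said nothing" from an explicit true/false, so
# only an unset flag may be overridden by the layerwise-offload rule.
import argparse

def str2bool(v: str) -> bool:
    return v.lower() in ("1", "true", "yes")

parser = argparse.ArgumentParser()
parser.add_argument("--dit-cpu-offload", type=str2bool, default=None)
parser.add_argument("--dit-layerwise-offload", type=str2bool, default=False)

args = parser.parse_args(["--dit-layerwise-offload", "true"])
if args.dit_layerwise_offload and args.dit_cpu_offload is None:
    # Only override the default; an explicit --dit-cpu-offload true is kept.
    args.dit_cpu_offload = False
```

With this pattern, passing --dit-layerwise-offload alone disables DiT CPU offload, while a user who explicitly requests both keeps their setting.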
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci