Description
I've noticed that when using Flux models, the model transfer time (the "Moving model(s)" step) keeps getting longer with each new generation.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-644-gde1670a4
Commit hash: de1670a
Launching Web UI with arguments:
Total VRAM 8191 MB, total RAM 16335 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3050 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
D:\AI\Forge\system\python\lib\site-packages\transformers\utils\hub.py:128: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: D:\AI\Forge\webui\models\ControlNetPreprocessor
2025-02-13 20:27:02,860 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'D:\AI\Forge\webui\models\Stable-diffusion\flux1-schnell-bnb-nf4.safetensors', 'hash': '7d3d1873'}, 'additional_modules': ['D:\AI\Forge\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors', 'D:\AI\Forge\webui\models\text_encoder\clip_l.safetensors', 'D:\AI\Forge\webui\models\VAE\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch()
Startup time: 78.8s (prepare environment: 19.2s, launcher: 1.8s, import torch: 37.3s, initialize shared: 1.8s, other imports: 2.0s, setup gfpgan: 0.2s, list SD models: 0.6s, load scripts: 9.0s, create ui: 3.5s, gradio launch: 3.9s).
Model selected: {'checkpoint_info': {'filename': 'D:\AI\Forge\webui\models\Stable-diffusion\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': ['D:\AI\Forge\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors', 'D:\AI\Forge\webui\models\text_encoder\clip_l.safetensors', 'D:\AI\Forge\webui\models\VAE\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Loading Model: {'checkpoint_info': {'filename': 'D:\AI\Forge\webui\models\Stable-diffusion\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': ['D:\AI\Forge\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors', 'D:\AI\Forge\webui\models\text_encoder\clip_l.safetensors', 'D:\AI\Forge\webui\models\VAE\ae.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 22.3s (unload existing model: 0.2s, forge model load: 22.1s).
[LORA] Loaded D:\AI\Forge\webui\models\Lora\Anime_Furry_Style_Flux.safetensors for KModel-UNet with 304 keys at weight 0.7 (skipped 0 keys) with on_the_fly = False
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 7723.54 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 7184.00 MB, Model Require: 5153.49 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 1006.51 MB, All loaded to GPU.
Moving model(s) has taken 24.04 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ... Current free memory is 1911.42 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 7144.03 MB, Model Require: 6246.84 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: -126.81 MB, CPU Swap Loaded (blocked method): 1435.50 MB, GPU Loaded: 4811.34 MB
Moving model(s) has taken 148.36 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [01:12<00:00, 7.28s/it]
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ... Current free memory is 2125.69 MB ... Unload model KModel Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 7134.06 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 5950.19 MB, All loaded to GPU.
Moving model(s) has taken 54.55 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 10/10 [02:20<00:00, 14.03s/it]
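The [Memory Management] lines above follow a simple budget: Remaining = Free GPU − Model Require − Inference Require, and a negative remainder forces part of the model into CPU swap. A minimal sketch of that arithmetic (the function name is mine, not Forge's; note Forge swaps whole blocks, which is why its actual 1435.50 MB swap exceeds the raw 126.81 MB deficit):

```python
def plan_load(free_gpu_mb, model_require_mb, inference_require_mb):
    """Mimic the budgeting shown in the [Memory Management] log lines."""
    remaining = free_gpu_mb - model_require_mb - inference_require_mb
    if remaining >= 0:
        # Everything fits: "All loaded to GPU."
        return {"gpu_loaded_mb": model_require_mb, "cpu_swap_mb": 0.0}
    # Shortfall goes to CPU swap ("blocked method" in the log rounds this
    # up to whole transformer blocks, so real swap is larger).
    cpu_swap = -remaining
    return {"gpu_loaded_mb": model_require_mb - cpu_swap, "cpu_swap_mb": cpu_swap}

# JointTextEncoder line: 7184.00 - 5153.49 - 1024.00 = 1006.51 MB remaining
print(plan_load(7184.00, 5153.49, 1024.00))
# KModel line: 7144.03 - 6246.84 - 1024.00 = -126.81 MB -> swap needed
print(plan_load(7144.03, 6246.84, 1024.00))
```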
Environment vars changed: {'stream': True, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
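The [GPU Setting] lines split total VRAM 87.50% / 12.50% between weight storage and matrix computation, which matches the 8191 MB reported at startup (a sketch of the arithmetic, not Forge's actual code):

```python
total_vram_mb = 8191  # from "Total VRAM 8191 MB" in the startup log

weights_mb = int(total_vram_mb * 0.875)   # 87.50% for loading weights
compute_mb = total_vram_mb - weights_mb   # remainder for matrix computation

print(weights_mb, compute_mb)  # matches the 7167.00 MB / 1024.00 MB in the log
```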
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Current free memory is 6974.41 MB ... Unload model IntegratedAutoencoderKL Done.
[LORA] Loaded D:\AI\Forge\webui\models\Lora\Anime_Furry_Style_Flux.safetensors for KModel-UNet with 304 keys at weight 0.9 (skipped 0 keys) with on_the_fly = False
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 7817.77 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 7135.05 MB, Model Require: 5225.98 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 885.07 MB, All loaded to GPU.
Moving model(s) has taken 244.25 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 9411.08 MB for cuda:0 with 0 models keep loaded ... Current free memory is 1900.58 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 7130.06 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: -140.74 MB, CPU Swap Loaded (blocked method): 1435.50 MB, GPU Loaded: 4811.30 MB
Moving model(s) has taken 687.08 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [01:06<00:00, 6.69s/it]
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ... Current free memory is 2127.72 MB ... Unload model KModel Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 7128.09 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 5944.22 MB, All loaded to GPU.
Moving model(s) has taken 426.07 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 10/10 [09:14<00:00, 55.42s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 10/10 [09:13<00:00, 5.14s/it]
I had a similar problem a few months ago (slow model moving) and solved it with the GPU_For_T5 extension (https://github.com/Juqowel/GPU_For_T5) by assigning T5 to the CPU. After a while the problem resolved itself, and the extension no longer made any difference.
I have now tried the extension again and it helped, but I want to understand what causes such a large difference in speed, as I didn't find a direct answer (or didn't understand it) here: #1591
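One way to quantify the slowdown from the log: the KModel move shifts the same 6246.84 MB of weights (4811.34 MB to GPU plus 1435.50 MB CPU swap) each time, so each move time implies an effective transfer rate. A small helper (purely illustrative, my own code):

```python
def effective_bandwidth_mb_s(moved_mb, seconds):
    """Effective transfer rate implied by a 'Moving model(s)' log line."""
    return moved_mb / seconds

moved = 4811.34 + 1435.50  # GPU Loaded + CPU Swap Loaded, in MB

# First generation: moved in 148.36 s
print(round(effective_bandwidth_mb_s(moved, 148.36), 1))  # ~42 MB/s
# Later generation: same amount in 687.08 s
print(round(effective_bandwidth_mb_s(moved, 687.08), 1))  # ~9 MB/s
```

Rates in the tens of MB/s, let alone single digits, are far below what a PCIe link sustains, which would be consistent with the 16 GB of system RAM filling up and weights paging through the disk, though the log alone can't confirm that reading.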