Adds faster model loading args - RunPod loading takes at least 15-20 minutes without this, and with this it takes 30 seconds #681
Conversation
- Add fast_load parameter to load_transformer function in hunyuan_model/models.py
- When fast_load=True, sets disable_mmap=False to load entire model to RAM without memory mapping
- This speeds up model loading significantly, especially for block swap or LoRA training
- Increases RAM usage but eliminates the slow memory mapping overhead
- Add --fast_load argument to hv_train.py and hv_train_network.py command line parsers
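As a rough sketch of the wiring this commit describes (the names `--fast_load` and `load_transformer` come from the commit message; the actual signatures in hv_train.py, hv_train_network.py, and hunyuan_model/models.py may differ), the flag could be threaded through roughly like this:

```python
# Illustrative only: names are taken from the commit message, not the actual repo code.
import argparse


def setup_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fast_load",
        action="store_true",
        help="read the whole DiT checkpoint into RAM instead of memory-mapping it "
        "(much faster on slow disks, at the cost of extra RAM)",
    )
    return parser


def load_transformer(dit_path: str, fast_load: bool = False):
    if fast_load:
        print("fast_load enabled: loading the full checkpoint into RAM without memory mapping")
    # ...pass the flag down to the safetensors loading helpers from here...
```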
…to load_qwen_image_model and propagate through loading chain
- Add disable_mmap support to MemoryEfficientSafeOpen for fast loading
- Update callers to pass args.fast_load
- Add informative log message when fast_load is enabled
…e load_transformer method in qwen_image_train.py to support fast_load
- Add fast_load log message when enabled
- Pass disable_mmap to load_safetensors and MemoryEfficientSafeOpen based on fast_load flag
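The underlying idea, independent of the repository's own helpers, is to replace the memory-mapped read that safetensors uses by default with a single sequential read of the whole file into RAM. A minimal sketch using the public safetensors API (the file path is hypothetical):

```python
from safetensors.torch import load, load_file

ckpt = "model.safetensors"  # hypothetical path

# Default route: load_file() memory-maps the checkpoint, so tensor data is
# paged in from disk on demand; on slow (e.g. network-backed) storage this
# turns into a very long loading phase.
state_dict_mmap = load_file(ckpt)

# No-mmap route: read the whole file into RAM in one sequential pass, then
# deserialize from the in-memory buffer. Uses more RAM, but avoids the
# random-access penalty of memory mapping.
with open(ckpt, "rb") as f:
    state_dict_ram = load(f.read())
```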
Hey @kohya-ss, please add this as well. I would recommend adding this universally, because it makes a huge speed difference on slow-disk systems with large RAM.
Thank you for this! I think what you're saying is that Numpy's memmap is the slow part. The option name --fast_load doesn't describe what the option actually does, so a more descriptive name would be better. Also, the variable name disable_mmap is ambiguous about whether it refers to Numpy's memmap or to Safetensors' memory mapping.
…load argument to --disable_numpy_memmap to be more descriptive
- Rename variable disable_mmap to disable_numpy_memmap where it controls numpy memmap
- Rename variable to safetensors_disable_mmap where it controls Safetensors library
- Update all log messages to reflect new naming
- Clarifies that this disables numpy memory mapping specifically, not Safetensors memory mapping
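For context on the distinction the rename makes explicit: Numpy's memmap and Safetensors' memory mapping are two separate mechanisms, and disabling one does not affect the other. A small Numpy-only illustration (the file name is hypothetical):

```python
import numpy as np

arr_path = "weights.npy"  # hypothetical file

# With mmap_mode set, np.load() memory-maps the array and pages data in from
# disk lazily, which is painful when the disk is slow.
lazy = np.load(arr_path, mmap_mode="r")

# With the default mmap_mode=None, the whole array is read into RAM in one
# sequential pass, which is the behavior --disable_numpy_memmap opts into.
eager = np.load(arr_path)
```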
@kohya-ss Sure, I just made the changes. I hope it's good now. Edit: I also updated the docs.
…w --disable_numpy_memmap option in qwen_image.md
- Document the new --disable_numpy_memmap option in hunyuan_video.md
- Add English and Japanese descriptions
- Explain benefits for block swap and LoRA training
- Include note about RAM usage increase
Thanks for the update! I'll format the code with ruff after the merge, but I'd appreciate it if you could format it yourself next time.
After merging, I noticed many flaws... It was my mistake. HunyuanVideo doesn't use Numpy's memmap in the first place, so it was inappropriate to apply it there. Also, this argument was ignored in many places. I'll review more carefully next time 😅
I have made some fixes in #687 and merged it, so I would appreciate it if you could check that it works.
Without this, it literally takes 15 to 20 minutes to load the Qwen DiT model.
Unbearable on RunPod: their machines have massive RAM but extremely slow disks.