Release v4.3.5 - boogu-image, prompt2effect, ideogram4 bugfixes, qwen_image compile improvement · bghira/SimpleTuner

Features

Boogu Image v0.1 - a 10B transformer model that trains very quickly (1 it/sec), especially on H100 with FA3 and torch regional compile
- Currently supports only basic training; CREPA, LayerSync, TwinFlow, etc. are not supported
- Comes with high-throughput training when torch regional compile is enabled
Ideogram4 - validated Apple Silicon training support
SDNQ quant levels enabled for training
Qwen Image - removed use of complex tensors for improved torch compile support, especially on H100 with FA3
Prompt2Effect - an implementation of the paper from Snapchat that likely explains how they make video I2V effect LoRAs so readily
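
For readers wondering what "removed use of complex tensors" means in the Qwen Image item, here is a minimal sketch of the general idea, not the code from #2770. Rotary position embeddings are often written as a complex multiplication, but the same rotation can be done with real cos/sin arithmetic, which compilers tend to handle more easily. The function names and the `base=10000.0` frequency schedule are illustrative assumptions, and plain Python stands in for tensor ops.

```python
import cmath
import math


def rope_angles(position, num_pairs, base=10000.0):
    """Hypothetical helper: one rotation angle per (x, y) feature pair."""
    return [position * base ** (-i / num_pairs) for i in range(num_pairs)]


def rotate_complex(pairs, angles):
    """Rotate each (x, y) pair by treating it as the complex number x + iy."""
    out = []
    for (x, y), theta in zip(pairs, angles):
        z = complex(x, y) * cmath.exp(1j * theta)
        out.append((z.real, z.imag))
    return out


def rotate_real(pairs, angles):
    """Same rotation with real arithmetic only (no complex dtype needed)."""
    out = []
    for (x, y), theta in zip(pairs, angles):
        c, s = math.cos(theta), math.sin(theta)
        out.append((x * c - y * s, x * s + y * c))
    return out


pairs = [(1.0, 0.0), (0.5, -2.0), (3.0, 4.0)]
angles = rope_angles(position=7, num_pairs=len(pairs))

complex_result = rotate_complex(pairs, angles)
real_result = rotate_real(pairs, angles)

# Largest element-wise difference between the two formulations.
max_diff = max(
    abs(p - q)
    for u, v in zip(complex_result, real_result)
    for p, q in zip(u, v)
)
print(f"max difference: {max_diff:.2e}")
```

Both formulations agree to floating-point precision, so switching to the real-valued form should not change the numerical result, only the dtypes the compiler has to deal with.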

Bugfixes

Single-file export for SD/SDXL no longer crashes with Transformers v5.6 or greater
Fixed Diffusers' double-shift in the Euler flow matching scheduler (mostly affected "unpopular" models)
WebUI training launch failure resolved (#2772)
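
To illustrate why a double-shift matters, the sketch below assumes the static time-shift formula commonly used for flow-matching sigma schedules, `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`. It is not the patch from #2760, and `shift_sigma` is a hypothetical helper name. Under that assumption, applying the shift twice is equivalent to a single shift of `shift ** 2`, which pushes the schedule much further toward high noise than intended.

```python
def shift_sigma(sigma, shift):
    """Static flow-matching time shift (assumed formula, see note above)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


shift = 3.0
sigmas = [i / 10 for i in range(1, 11)]  # 0.1 .. 1.0

once = [shift_sigma(s, shift) for s in sigmas]          # intended schedule
twice = [shift_sigma(s, shift) for s in once]           # accidental double-shift
squared = [shift_sigma(s, shift ** 2) for s in sigmas]  # single shift of shift**2

for s, a, b in zip(sigmas, once, twice):
    print(f"sigma={s:.1f}  shifted once={a:.4f}  shifted twice={b:.4f}")
```

With `shift = 3.0`, a mid-schedule sigma of 0.5 becomes 0.75 after one shift but 0.9 after two, matching a single shift of 9.0.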

What's Changed

ideogram fp8 fix for flowmap tensor on meta device by @bghira in #2757
Fix single-file SD/SDXL export with flattened CLIPTextModel (transformers >=5.6) by @ArthurZucker in #2759
diffusers: fix sigma bounds after initialising scheduler to prevent double-shift by @bghira in #2760
prompt2effect: train a hypernetwork from a collection of effect LoRAs by @bghira in #2761
merge by @bghira in #2762
boogu-image v0.1 by @bghira in #2763
ideogram4: support MacOS training by @bghira in #2764
ideogram4: document apple-specific quant via int8-sdnq by @bghira in #2766
sdnq: enable int8 training by @bghira in #2765
boogu-image: remove broken fp8 paths and use torchao on-the-fly quant instead by @bghira in #2767
Avoid complex Boogu rotary ops by @bghira in #2769
qwen_image: improved torch compile performance by @bghira in #2770
qwen_image: remove deprecation notice by @bghira in #2771
Fix training launch sanitization import and add regression coverage by @bghira with @Copilot in #2772
merge by @bghira in #2773

Full Changelog: v4.3.4...v4.3.5