Draft
Changes from 46 commits (768 commits in total)

Commits
936d333
Merge pull request #1985 from gesen2egee/pytorch-optimizer
kohya-ss Mar 20, 2025
d151833
docs: update README with recent changes and specify version for pytor…
kohya-ss Mar 20, 2025
16cef81
Refactor sigmas and timesteps
rockerBOO Mar 20, 2025
e8b3254
Add flux_train_utils tests for get get_noisy_model_input_and_timesteps
rockerBOO Mar 20, 2025
8aa1265
Scale sigmoid to default 1.0
rockerBOO Mar 20, 2025
d40f5b1
Revert "Scale sigmoid to default 1.0"
rockerBOO Mar 20, 2025
89f0d27
Set sigmoid_scale to default 1.0
rockerBOO Mar 20, 2025
6364379
Merge branch 'dev' into sd3
kohya-ss Mar 21, 2025
2ba1cc7
Fix max norms not applying to noise
rockerBOO Mar 22, 2025
61f7283
Fix non-cache vae encode
rockerBOO Mar 22, 2025
1481217
Merge pull request #25 from rockerBOO/lumina-fix-non-cache-image-vae-…
sdbds Mar 22, 2025
3000816
Merge pull request #24 from rockerBOO/lumina-fix-max-norms
sdbds Mar 22, 2025
8ebe858
Merge branch 'dev' into sd3
kohya-ss Mar 24, 2025
e64dc05
Supplement the input parameters to correctly convert the flux model t…
laolongboy Mar 24, 2025
182544d
Remove pertubation seed
rockerBOO Mar 26, 2025
0181b7a
Remove progress bar avg norms
rockerBOO Mar 27, 2025
93a4efa
Merge branch 'sd3' into resize-interpolation
kohya-ss Mar 30, 2025
9e9a13a
Merge pull request #1936 from rockerBOO/resize-interpolation
kohya-ss Mar 30, 2025
1f432e2
use PIL for lanczos and box
kohya-ss Mar 30, 2025
96a133c
README.md: update recent updates section to include new interpolation…
kohya-ss Mar 30, 2025
3149b27
Merge pull request #2018 from kohya-ss/resize-interpolation-small-fix
kohya-ss Mar 30, 2025
59d98e4
Merge pull request #1974 from rockerBOO/lora-ggpo
kohya-ss Mar 30, 2025
d0b5c0e
chore: formatting, add TODO comment
kohya-ss Mar 30, 2025
aaa26bb
docs: update README to include LoRA-GGPO details for FLUX.1 training
kohya-ss Mar 30, 2025
b3c56b2
Merge branch 'dev' into sd3
kohya-ss Mar 31, 2025
ede3470
Ensure all size parameters are integers to prevent type errors
LexSong Apr 1, 2025
b822b7e
Fix the interpolation logic error in resize_image()
LexSong Apr 1, 2025
f1423a7
fix: add resize_interpolation parameter to FineTuningDataset constructor
kohya-ss Apr 3, 2025
92845e8
Merge pull request #2026 from kohya-ss/fix-finetune-dataset-resize-in…
kohya-ss Apr 3, 2025
fd36fd1
Fix resize PR link
rockerBOO Apr 3, 2025
606e687
Merge pull request #2022 from LexSong/fix-resize-issue
kohya-ss Apr 5, 2025
ee0f754
Merge pull request #2028 from rockerBOO/patch-5
kohya-ss Apr 5, 2025
00e12ee
update for lost change
sdbds Apr 6, 2025
1a4f1ff
Merge branch 'lumina' of https://github.com/sdbds/sd-scripts into lumina
sdbds Apr 6, 2025
9f1892c
Merge branch 'sd3' into lumina
sdbds Apr 6, 2025
7f93e21
fix typo
sdbds Apr 6, 2025
c56dc90
Merge pull request #1992 from rockerBOO/flux-ip-noise-gamma
kohya-ss Apr 6, 2025
4589262
README.md: Update recent updates section to include IP noise gamma fe…
kohya-ss Apr 6, 2025
5a18a03
Merge branch 'dev' into sd3
kohya-ss Apr 7, 2025
8f5a2eb
Add documentation for LoRA training scripts for SD1/2, SDXL, FLUX.1 a…
kohya-ss Apr 10, 2025
ceb19be
update docs. sdxl is transltaed, flux.1 is corrected
kohya-ss Apr 13, 2025
b1bbd45
doc: update sd3 LoRA, sdxl LoRA advanced
kohya-ss Apr 14, 2025
176baa6
doc: update sd3 and sdxl training guides
kohya-ss Apr 16, 2025
06df037
Merge branch 'sd3' into flux-sample-cfg
kohya-ss Apr 16, 2025
629073c
Add guidance scale for prompt param and flux sampling
kohya-ss Apr 16, 2025
7c61c0d
Add autocast warpper for forward functions in deepspeed_utils.py to t…
sharlynxy Apr 22, 2025
d33d5ec
#
sharlynxy Apr 22, 2025
7f984f4
#
sharlynxy Apr 22, 2025
c8af252
refactor
Apr 22, 2025
f501209
Merge branch 'dev/xy/align_dtype_using_mixed_precision' of github.com…
Apr 22, 2025
0d9da0e
Merge pull request #1 from saibit-tech/dev/xy/align_dtype_using_mixed…
sharlynxy Apr 22, 2025
b11c053
Merge branch 'dev' into sd3
kohya-ss Apr 22, 2025
899f345
update for init problem
sdbds Apr 23, 2025
4fc9178
fix bugs
sdbds Apr 23, 2025
adb775c
Update: requirement diffusers[torch]==0.25.0
sharlynxy Apr 23, 2025
abf2c44
Dynamically set device in deepspeed wrapper (#2)
sharlynxy Apr 23, 2025
46ad3be
update deepspeed wrapper
sharlynxy Apr 24, 2025
5c50cdb
Merge branch 'sd3' into flux-sample-cfg
kohya-ss Apr 27, 2025
8387e0b
docs: update README to include CFG scale support in FLUX.1 training
kohya-ss Apr 27, 2025
309c44b
Merge pull request #2064 from kohya-ss/flux-sample-cfg
kohya-ss Apr 27, 2025
0e8ac43
Merge branch 'dev' into sd3
kohya-ss Apr 27, 2025
13296ae
Merge branch 'sd3' of https://github.com/kohya-ss/sd-scripts into sd3
kohya-ss Apr 27, 2025
fd3a445
fix: revert default emb guidance scale and CFG scale for FLUX.1 sampling
kohya-ss Apr 27, 2025
29523c9
docs: add note for user feedback on CFG scale in FLUX.1 training
kohya-ss Apr 27, 2025
80320d2
Merge pull request #2066 from kohya-ss/quick-fix-flux-sampling-scales
kohya-ss Apr 27, 2025
64430eb
Merge branch 'dev' into sd3
kohya-ss Apr 29, 2025
1684aba
remove deepspeed from requirements.txt
sharlynxy Apr 30, 2025
a4fae93
Add pythonpath to pytest.ini
rockerBOO May 1, 2025
f62c68d
Make grad_norm and combined_grad_norm None is not recording
rockerBOO May 1, 2025
b4a89c3
Fix None
rockerBOO May 1, 2025
7c075a9
Merge pull request #2060 from saibit-tech/sd3
kohya-ss May 1, 2025
865c8d5
README.md: Update recent updates and add DeepSpeed installation instr…
kohya-ss May 1, 2025
a27ace7
doc: add DeepSpeed installation in header section
kohya-ss May 1, 2025
e858132
Merge pull request #2074 from kohya-ss/deepspeed-readme
kohya-ss May 1, 2025
e2ed265
Merge pull request #2072 from rockerBOO/pytest-pythonpath
kohya-ss May 1, 2025
f344df0
Merge branch 'sd3' into update-docs
kohya-ss May 2, 2025
5b38d07
Merge pull request #2073 from rockerBOO/fix-mean-grad-norms
kohya-ss May 11, 2025
2982197
Merge branch 'sd3' into update-docs
kohya-ss May 17, 2025
19a180f
Add English versions with Japanese in details
kohya-ss May 17, 2025
c5fb5ec
Merge pull request #2086 from kohya-ss/codex/translate-and-structure-…
kohya-ss May 17, 2025
08aed00
doc: update FLUX.1 for newer features from README.md
kohya-ss May 17, 2025
e7e371c
doc: update English translation for advanced SDXL LoRA training
kohya-ss May 17, 2025
2bfda12
Update workflows to read-all instead of write-all
rockerBOO May 20, 2025
5753b8f
Merge pull request #2088 from rockerBOO/checkov-update
kohya-ss May 20, 2025
a376fec
doc: add comprehensive README for image generation script with usage …
kohya-ss May 24, 2025
e4d6923
Add tests for syntax checking training scripts
rockerBOO Jun 3, 2025
61eda76
Merge pull request #2108 from rockerBOO/syntax-test
kohya-ss Jun 4, 2025
bb47f1e
Fix unwrap_model handling for None text_encoders in sample_images fun…
kohya-ss Jun 8, 2025
0145efc
Merge branch 'sd3' into lumina
rockerBOO Jun 9, 2025
d94bed6
Add lumina tests and fix image masks
rockerBOO Jun 10, 2025
77dbabe
Merge pull request #26 from rockerBOO/lumina-test-fix-mask
sdbds Jun 10, 2025
fc40a27
Merge branch 'dev' into sd3
kohya-ss Jun 15, 2025
3e6935a
Merge pull request #2115 from kohya-ss/fix-flux-sampling-accelerate-e…
kohya-ss Jun 15, 2025
1db7855
Merge branch 'sd3' into update-sd3
rockerBOO Jun 16, 2025
0e929f9
Revert system_prompt for dataset config
rockerBOO Jun 16, 2025
8e4dc1f
Merge pull request #28 from rockerBOO/lumina-train_util
sdbds Jun 17, 2025
52d1337
Merge pull request #1927 from sdbds/lumina
kohya-ss Jun 29, 2025
935e003
feat: update lumina system prompt handling
kohya-ss Jun 29, 2025
884c1f3
fix: update to work with cache text encoder outputs (without disk)
kohya-ss Jun 29, 2025
5034c6f
feat: add workaround for 'gated repo' error on github actions
kohya-ss Jun 29, 2025
078ee28
feat: add more workaround for 'gated repo' error on github actions
kohya-ss Jun 29, 2025
6731d8a
fix: update system prompt handling
kohya-ss Jun 29, 2025
05f392f
feat: add minimum inference code for Lumina with image generation cap…
kohya-ss Jul 3, 2025
a87e999
Change to 3
rockerBOO Jul 7, 2025
2fffcb6
Merge pull request #2146 from rockerBOO/lumina-typo
kohya-ss Jul 8, 2025
b4d1152
fix: sample generation with system prompt, without TE output caching
kohya-ss Jul 9, 2025
7fb0d30
feat: add LoRA support for lumina minimal inference
kohya-ss Jul 9, 2025
3f9eab4
fix: update default values in lumina minimal inference as same as sam…
kohya-ss Jul 9, 2025
7bd9a6b
Add prompt guidance files for Claude and Gemini, and update README fo…
kohya-ss Jul 10, 2025
9a50c96
Merge pull request #2147 from kohya-ss/ai-coding-agent-prompt
kohya-ss Jul 10, 2025
0b90555
feat: add .claude and .gemini to .gitignore
kohya-ss Jul 10, 2025
2e0fcc5
Merge pull request #2148 from kohya-ss/update-gitignore-for-claude-an…
kohya-ss Jul 10, 2025
4e7dfc0
Merge branch 'sd3' into feature-lumina-image
kohya-ss Jul 10, 2025
2a53524
Merge branch 'sd3' into update-docs
kohya-ss Jul 10, 2025
d0b335d
feat: add LoRA training guide for Lumina Image 2.0 (WIP)
kohya-ss Jul 10, 2025
8a72f56
fix: clarify Flash Attention usage in lumina training guide
kohya-ss Jul 11, 2025
1a9bf2a
feat: add interactive mode for generating multiple images
kohya-ss Jul 13, 2025
88dc321
fix: support LoRA w/o TE for create_network_from_weights
kohya-ss Jul 13, 2025
88960e6
doc: update lumina LoRA training guide
kohya-ss Jul 13, 2025
999df5e
fix: update default values for timestep_sampling and model_prediction…
kohya-ss Jul 13, 2025
30295c9
fix: update parameter names for CFG truncate and Renorm CFG in docume…
kohya-ss Jul 13, 2025
13ccfc3
fix: update flow matching loss and variable names
kohya-ss Jul 13, 2025
a96d684
feat: add Chroma model implementation
kohya-ss Jul 15, 2025
e0fcb51
feat: support Neta Lumina all-in-one weights
kohya-ss Jul 15, 2025
25771a5
fix: update help text for cfg_trunc_ratio argument
kohya-ss Jul 15, 2025
c0c36a4
fix: remove duplicated latent normalization in decoding
kohya-ss Jul 15, 2025
a7b33f3
Fix alphas cumprod after add_noise for DDIMScheduler
rockerBOO Jul 16, 2025
3adbbb6
Add note about why we are moving it
rockerBOO Jul 16, 2025
d53a532
Merge pull request #2153 from rockerBOO/fix-alphas-cumprod
kohya-ss Jul 17, 2025
24d2ea8
feat: support Chroma model in loading and inference processes
kohya-ss Jul 20, 2025
404ddb0
fix: inference for Chroma model
kohya-ss Jul 20, 2025
8fd0b12
feat: update DoubleStreamBlock and SingleStreamBlock to handle text s…
kohya-ss Jul 20, 2025
c4958b5
feat: change img/txt order for attention and single blocks
kohya-ss Jul 20, 2025
b4e8626
feat: add LoRA training support for Chroma
kohya-ss Jul 20, 2025
0b763ef
feat: fix timestep for input_vec for Chroma
kohya-ss Jul 20, 2025
77a160d
fix: skip LoRA creation for None text encoders (CLIP-L for Chroma)
kohya-ss Jul 20, 2025
aec7e16
feat: add an option to add system prompt for negative in lumina infer…
kohya-ss Jul 21, 2025
d300f19
docs: update Lumina training guide to include inference script and op…
kohya-ss Jul 21, 2025
518545b
docs: add support information for Lumina-Image 2.0 in recent updates
kohya-ss Jul 21, 2025
d98400b
Merge pull request #2138 from kohya-ss/feature-lumina-image
kohya-ss Jul 21, 2025
9eda938
Merge branch 'sd3' into feature-chroma-support
kohya-ss Jul 21, 2025
7de68c1
Merge branch 'sd3' into update-docs
kohya-ss Jul 21, 2025
c84a163
docs: update README for documentation
kohya-ss Jul 21, 2025
4987057
Merge pull request #2042 from kohya-ss/update-docs
kohya-ss Jul 21, 2025
eef0550
Merge branch 'sd3' into feature-chroma-support
kohya-ss Jul 21, 2025
32f0601
doc: update flux train document and add about breaking changes in sam…
kohya-ss Jul 21, 2025
c28e7a4
feat: add regex-based rank and learning rate configuration for FLUX.1…
kohya-ss Jul 26, 2025
af14eab
doc: update section number for regex-based rank and learning rate con…
kohya-ss Jul 26, 2025
6c8973c
doc: add reference link for input vector gradient requirement in Chro…
kohya-ss Jul 28, 2025
10de781
build(deps): pytorch-optimizer to 3.7.0
kozistr Jul 28, 2025
450630c
fix: create network from weights not working
kohya-ss Jul 29, 2025
96feb61
feat: implement modulation vector extraction for Chroma and update re…
kohya-ss Jul 30, 2025
250f0eb
doc: update README and training guide with breaking changes for CFG s…
kohya-ss Jul 30, 2025
5dff02a
Merge pull request #2157 from kohya-ss/feature-chroma-support
kohya-ss Jul 30, 2025
bd6418a
fix: add assertion for apply_t5_attn_mask requirement in Chroma
kohya-ss Aug 1, 2025
aebfea2
Merge pull request #2165 from kohya-ss/force_t5_attn_mask_for_chroma_…
kohya-ss Aug 1, 2025
75dd8c8
Merge pull request #2160 from kozistr/update/pytorch-optimizer
kohya-ss Aug 1, 2025
5249732
chore: update README to include `--apply_t5_attn_mask` requirement fo…
kohya-ss Aug 1, 2025
b9c091e
Fix validation documentation
rockerBOO Aug 2, 2025
24c605e
Update flux_train_network.md
rockerBOO Aug 2, 2025
0ad2cb8
Update flux_train_network.md
rockerBOO Aug 2, 2025
d24d733
Update model spec to 1.0.1. Refactor model spec
rockerBOO Aug 3, 2025
056472c
Add tests
rockerBOO Aug 3, 2025
bf0f86e
Add sai_model_spec to train_network.py
rockerBOO Aug 3, 2025
10bfcb9
Remove text model spec
rockerBOO Aug 3, 2025
9bb50c2
Set sai_model_spec to must
rockerBOO Aug 3, 2025
c149cf2
Add parser args for other trainers.
rockerBOO Aug 3, 2025
a125c10
Merge pull request #2167 from rockerBOO/patch-6
kohya-ss Aug 12, 2025
dcce057
Merge pull request #2168 from rockerBOO/model-spec-1.0.1
kohya-ss Aug 13, 2025
351bed9
fix model type handling in analyze_state_dict_state function for SD3
kohya-ss Aug 13, 2025
f25c265
Merge pull request #2174 from kohya-ss/fix-modelspec-sd3-finetune
kohya-ss Aug 13, 2025
18e6251
Merge branch 'dev' into sd3
kohya-ss Aug 15, 2025
6edbe00
feat: update libraries, remove warnings
kohya-ss Aug 16, 2025
6f24bce
fix: remove unnecessary super call in assert_extra_args method
kohya-ss Aug 16, 2025
f61c442
fix: use strategy for tokenizer and latent caching
kohya-ss Aug 16, 2025
acba279
fix: update PyTorch version in workflow matrix
kohya-ss Aug 24, 2025
4b12746
Merge branch 'dev' into sd3
kohya-ss Aug 24, 2025
f7acd2f
fix: consolidate PyTorch versions in workflow matrix
kohya-ss Aug 24, 2025
ac72cf8
feat: remove bitsandbytes version specification in requirements.txt
kohya-ss Aug 27, 2025
c52c45c
doc: update for PyTorch and libraries versions
kohya-ss Aug 27, 2025
5a5138d
doc: add PR reference for PyTorch and library versions update
kohya-ss Aug 27, 2025
8cadec6
Merge pull request #2178 from kohya-ss/update-libraries
kohya-ss Aug 27, 2025
e836b7f
fix: chroma LoRA training without Text Encode caching
kohya-ss Aug 30, 2025
884e07d
Merge pull request #2191 from kohya-ss/fix-chroma-training-withtout-t…
kohya-ss Aug 30, 2025
989448a
doc: enhance SD3/SDXL LoRA training guide
kohya-ss Aug 31, 2025
fe81d40
doc: refactor structure for improved readability and maintainability
kohya-ss Aug 31, 2025
8071013
doc: add Sage Attention and sample batch size options to Lumina train…
kohya-ss Aug 31, 2025
c38b07d
doc: add validation loss documentation for model training
kohya-ss Aug 31, 2025
142d0be
doc: add comprehensive fine-tuning guide for various model architectures
kohya-ss Sep 1, 2025
9984868
doc: update README to include support for SDXL models and additional …
kohya-ss Sep 1, 2025
6c82327
doc: remove Japanese section on Gradual Latent options from gen_img R…
kohya-ss Sep 1, 2025
ddfb38e
doc: add documentation for Textual Inversion training scripts
kohya-ss Sep 4, 2025
884fc8c
doc: remove SD3/FLUX.1 training guide
kohya-ss Sep 4, 2025
952f9ce
Update docs/train_textual_inversion.md
kohya-ss Sep 4, 2025
0bb0d91
doc: update introduction and clarify command line option priorities i…
kohya-ss Sep 6, 2025
ef43979
Fix validation dataset documentation to not use subsets
rockerBOO Sep 8, 2025
78685b9
Move general settings to top to make more clear the validation bits
rockerBOO Sep 8, 2025
fe4c189
blocks_to_swap is supported for validation loss now
rockerBOO Sep 8, 2025
f833772
Merge pull request #2196 from rockerBOO/validation-dataset-subset
kohya-ss Sep 9, 2025
ee8e670
Merge branch 'sd3' into doc-update-for-latest-features
kohya-ss Sep 9, 2025
5149be5
feat: initial commit for HunyuanImage-2.1 inference
kohya-ss Sep 11, 2025
7f983c5
feat: block swap for inference and initial impl for HunyuanImage LoRA…
kohya-ss Sep 11, 2025
a0f0afb
fix: revert constructor signature update
kohya-ss Sep 11, 2025
cbc9e1a
feat: add byt5 to the list of recognized words in typos configuration
kohya-ss Sep 11, 2025
419a9c4
Merge pull request #2192 from kohya-ss/doc-update-for-latest-features
kohya-ss Sep 12, 2025
209c02d
feat: HunyuanImage LoRA training
kohya-ss Sep 12, 2025
aa0af24
Merge branch 'sd3' into feat-hunyuan-image-2.1-inference
kohya-ss Sep 12, 2025
7a651ef
feat: add 'tak' to recognized words and update block swap method to s…
kohya-ss Sep 12, 2025
9a61d61
feat: avoid unet type casting when fp8_scaled
kohya-ss Sep 12, 2025
8783f8a
feat: faster safetensors load and split safetensor utils
kohya-ss Sep 13, 2025
e1c666e
Update library/safetensors_utils.py
kohya-ss Sep 13, 2025
4568631
docs: update README to reflect improved loading speed of .safetensors…
kohya-ss Sep 13, 2025
f5d44fd
Merge pull request #2200 from kohya-ss/feat-faster-safetensors-load
kohya-ss Sep 13, 2025
bae7fa7
Merge branch 'sd3' into feat-hunyuan-image-2.1-inference
kohya-ss Sep 13, 2025
d831c88
fix: sample generation doesn't work with block swap
kohya-ss Sep 13, 2025
4e2a80a
refactor: update imports to use safetensors_utils for memory-efficien…
kohya-ss Sep 13, 2025
29b0500
fix: restore files section in _typos.toml for exclusion configuration
kohya-ss Sep 13, 2025
e04b9f0
docs: add LoRA training guide for HunyuanImage-2.1 model (by Gemini CLI)
kohya-ss Sep 13, 2025
1a73b5e
feat: add script to convert LoRA format to ComfyUI format
kohya-ss Sep 14, 2025
2732be0
Merge branch 'feat-hunyuan-image-2.1-inference' of https://github.com…
kohya-ss Sep 14, 2025
39458ec
fix: update default values for guidance_scale, image_size, infer_step…
kohya-ss Sep 16, 2025
f318dda
docs: update HunyuanImage-2.1 training guide with model download inst…
kohya-ss Sep 16, 2025
cbe2a9d
feat: add conversion script for LoRA models to ComfyUI format with re…
kohya-ss Sep 16, 2025
f5b0040
fix: correct tensor indexing in HunyuanVAE2D class for blending and e…
kohya-ss Sep 17, 2025
2ce506e
fix: fp8 casting not working
kohya-ss Sep 18, 2025
f6b4bdc
feat: block-wise fp8 quantization
kohya-ss Sep 18, 2025
f834b2e
fix: --fp8_vl to work
kohya-ss Sep 18, 2025
b090d15
feat: add multi backend attention and related update for HI2.1 models…
kohya-ss Sep 20, 2025
8f20c37
feat: add --text_encoder_cpu option to reduce VRAM usage by running t…
kohya-ss Sep 20, 2025
f41e9e2
feat: add vae_chunk_size argument for memory-efficient VAE decoding a…
kohya-ss Sep 21, 2025
e7b8e9a
doc: add --vae_chunk_size option for training and inference
kohya-ss Sep 21, 2025
9621d9d
feat: add Adaptive Projected Guidance parameters and noise rescaling
kohya-ss Sep 21, 2025
040d976
feat: add guidance rescale options for Adaptive Projected Guidance in…
kohya-ss Sep 21, 2025
3876343
fix: remove print statement for guidance rescale in AdaptiveProjected…
kohya-ss Sep 21, 2025
806d535
fix: block-wise scaling is overwritten by per-tensor scaling
kohya-ss Sep 21, 2025
e7b8982
Update library/custom_offloading_utils.py
kohya-ss Sep 21, 2025
753c794
Update hunyuan_image_train_network.py
kohya-ss Sep 21, 2025
31f7df3
doc: add --network_train_unet_only option for HunyuanImage-2.1 training
kohya-ss Sep 23, 2025
58df9df
doc: update README with HunyuanImage-2.1 LoRA training details and re…
kohya-ss Sep 23, 2025
121853c
Merge pull request #2198 from kohya-ss/feat-hunyuan-image-2.1-inference
kohya-ss Sep 23, 2025
4b79d73
fix: update metadata construction to include model_config for flux
kohya-ss Sep 24, 2025
4c197a5
Merge pull request #2207 from kohya-ss/fix-flux-extract-lora-metadata…
kohya-ss Sep 24, 2025
6a826d2
feat: add new parameters for sample image inference configuration
kohya-ss Sep 28, 2025
67d0621
Merge pull request #2212 from kohya-ss/fix-hunyuan-image-sample-gener…
kohya-ss Sep 28, 2025
a0c26a0
docs: enhance text encoder CPU usage instructions for HunyuanImage-2.…
kohya-ss Sep 28, 2025
f0c767e
Merge pull request #2213 from kohya-ss/doc-hunyuan-image-training-tex…
kohya-ss Sep 28, 2025
5462a6b
Merge branch 'dev' into sd3
kohya-ss Sep 29, 2025
5e366ac
Merge pull request #2003 from laolongboy/sd3-dev
kohya-ss Oct 1, 2025
a33cad7
fix: error on batch generation closes #2209
kohya-ss Oct 15, 2025
a5a1620
Merge pull request #2226 from kohya-ss/fix-hunyuan-image-batch-gen-error
kohya-ss Oct 15, 2025
129 changes: 129 additions & 0 deletions README.md
@@ -1,5 +1,134 @@
This repository contains training, generation and utility scripts for Stable Diffusion.

## FLUX.1 LoRA training (WIP)

This feature is experimental. The options and the training script may change in the future. Please let us know if you have any ideas for improving the training.

__Please update PyTorch to 2.4.0. We have tested with `torch==2.4.0` and `torchvision==0.19.0` with CUDA 12.4. We also updated `accelerate` to 0.33.0 just to be safe. `requirements.txt` is also updated, so please update the requirements.__

The command to install PyTorch is as follows:
`pip3 install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124`

Aug 16, 2024:

Training based on the FLUX.1 schnell model is now supported (but not tested). If the model file name contains `schnell`, the model is treated as a schnell model.

Added `--t5xxl_max_token_length` option to specify the maximum token length of T5XXL. The default is 512 in dev and 256 in schnell.

Previously, when `--max_token_length` was specified, that value was used; when it was omitted, 512 was used (the default). Therefore, there is no impact if `--max_token_length` was not specified. If you specified `--max_token_length`, please specify `--t5xxl_max_token_length` instead. `--max_token_length` is ignored during FLUX.1 training.
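
As a rough illustration of the behavior described above (schnell detection by file name and the token length default), here is a minimal sketch; the helper names are assumptions for illustration and are not part of the actual scripts:

```python
import os

# Minimal sketch, assuming the behaviour described above; these helpers are
# illustrative only and do not exist in the actual scripts.
def is_schnell_model(model_path: str) -> bool:
    # the model is treated as schnell if "schnell" appears in the file name
    return "schnell" in os.path.basename(model_path)

def t5xxl_max_token_length(arg_value, schnell: bool) -> int:
    # --t5xxl_max_token_length takes precedence; otherwise 256 for schnell, 512 for dev
    return arg_value if arg_value is not None else (256 if schnell else 512)
```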

Aug 14, 2024: Sample image generation during training is now supported. Specify options such as `--sample_prompts` and `--sample_every_n_epochs`. It will be very slow when `--split_mode` is specified.

Aug 13, 2024:

__Experimental__ A network argument `train_blocks` has been added to `lora_flux`. It selects the target blocks for LoRA from the FLUX double blocks and single blocks. Specify it like `--network_args "train_blocks=single"`. `all` trains both double and single blocks, `double` trains only double blocks, and `single` trains only single blocks. The default (when omitted) is `all`.

This argument is available even if `--split_mode` is not specified.

__Experimental__ The `--split_mode` option has been added to `flux_train_network.py`. It splits FLUX into double blocks and single blocks for training; by enabling gradients only for the single blocks, memory usage is reduced. When this option is specified, you also need to specify `"train_blocks=single"` in the network arguments.

This option enables training with 12GB VRAM GPUs, but the training speed is 2-3 times slower than the default.
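
The following is a minimal sketch of the idea behind `--split_mode` as described above (gradients only for the single blocks); it is not the actual implementation, and the function name and the `single_blocks` prefix are assumed names for illustration:

```python
import torch

def enable_single_block_gradients(flux_model: torch.nn.Module) -> None:
    # Minimal sketch, not the actual implementation: keep gradients only for
    # parameters under "single_blocks" (an assumed prefix) and freeze the rest.
    for name, param in flux_model.named_parameters():
        param.requires_grad = name.startswith("single_blocks")
```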

Aug 11, 2024: Fix `--apply_t5_attn_mask` option to work. Please remove and re-generate the latents cache file if you have used the option before.

Aug 10, 2024: LoRA key prefix is changed to `lora_unet` from `lora_flex` to make it compatible with ComfyUI.

We have added a new training script for LoRA training. The script is `flux_train_network.py`. See `--help` for options. A sample command is below; the settings are based on [AI Toolkit by Ostris](https://github.com/ostris/ai-toolkit). It will work with 24GB VRAM GPUs.

```
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train_network.py --pretrained_model_name_or_path flux1-dev.sft --clip_l sd3/clip_l.safetensors --t5xxl sd3/t5xxl_fp16.safetensors --ae ae.sft --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adamw8bit --learning_rate 1e-4 --network_train_unet_only --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 4 --save_every_n_epochs 1 --dataset_config dataset_1024_bs2.toml --output_dir path/to/output/dir --output_name flux-lora-name --timestep_sampling sigmoid --model_prediction_type raw --guidance_scale 1.0 --loss_type l2
```

The training can be done with 16GB VRAM GPUs with the Adafactor optimizer. Please use settings like the following:

```
--optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False"
```

The training can be done with 12GB VRAM GPUs with the Adafactor optimizer and the `--split_mode` and `train_blocks=single` options. Please use settings like the following:

```
--optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --split_mode --network_args "train_blocks=single"
```

LoRAs for Text Encoders are not tested yet.

We have added some new options (Aug 10, 2024): `--timestep_sampling`, `--sigmoid_scale`, `--model_prediction_type` and `--discrete_flow_shift`. The options are as follows (a short sketch of how they interact follows the list):

- `--timestep_sampling` is the method to sample timesteps (0-1): `sigma` (sigma-based, same as SD3), `uniform` (uniform random), or `sigmoid` (sigmoid of random normal, same as x-flux).
- `--sigmoid_scale` is the scale factor for sigmoid timestep sampling (only used when `--timestep_sampling` is `sigmoid`). The default is 1.0. Larger values will make the sampling more uniform.
- `--model_prediction_type` is how to interpret and process the model prediction: `raw` (use as is, same as x-flux), `additive` (add to noisy input), `sigma_scaled` (apply sigma scaling, same as SD3).
- `--discrete_flow_shift` is the discrete flow shift for the Euler Discrete Scheduler, default is 3.0 (same as SD3).
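
To make these options concrete, here is a minimal sketch based only on the descriptions above; it is not the actual training code, and the shift-weighted form used for `sigma` sampling is an assumption:

```python
import torch

# Minimal sketch based on the option descriptions above; not the actual implementation.
def sample_timesteps(batch_size: int, method: str = "sigma",
                     sigmoid_scale: float = 1.0, discrete_flow_shift: float = 3.0) -> torch.Tensor:
    if method == "uniform":
        return torch.rand(batch_size)                                  # uniform random in [0, 1)
    if method == "sigmoid":
        return torch.sigmoid(sigmoid_scale * torch.randn(batch_size))  # sigmoid of random normal
    # "sigma": shift-weighted timesteps for the Euler Discrete Scheduler (assumed form)
    u = torch.rand(batch_size)
    return discrete_flow_shift * u / (1.0 + (discrete_flow_shift - 1.0) * u)

def apply_model_prediction_type(pred, noisy_input, sigmas, mode: str = "raw"):
    if mode == "raw":        # use the prediction as is
        return pred
    if mode == "additive":   # add the prediction to the noisy input
        return pred + noisy_input
    return pred * (-sigmas) + noisy_input  # "sigma_scaled": apply sigma scaling (assumed form)
```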

`--loss_type` may be useful for FLUX.1 training. The default is `l2`.

In our experiments, `--timestep_sampling sigma --model_prediction_type raw --discrete_flow_shift 1.0` with `--loss_type l2` seems to work better than the default (SD3) settings. The LoRA multiplier should be adjusted.

Additional note (Aug 11): A quick check shows that the settings in [AI Toolkit by Ostris](https://github.com/ostris/ai-toolkit) seem to be equivalent to `--timestep_sampling sigmoid --model_prediction_type raw --guidance_scale 1.0` (with the default `l2` loss_type). This seems to be a good starting point. Thanks to Ostris for the great work!

Other settings may work better, so please try different settings.

We are also not sure how many epochs are needed for convergence, or how the learning rate should be adjusted.

The trained LoRA model can be used with ComfyUI.

The inference script is also available. The script is `flux_minimal_inference.py`. See `--help` for options.

Aug 12: The `--interactive` option is now working.

```
python flux_minimal_inference.py --ckpt flux1-dev.sft --clip_l sd3/clip_l.safetensors --t5xxl sd3/t5xxl_fp16.safetensors --ae ae.sft --dtype bf16 --prompt "a cat holding a sign that says hello world" --out path/to/output/dir --seed 1 --flux_dtype fp8 --offload --lora lora-flux-name.safetensors;1.0
```
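
The `--lora` argument above uses the form `path;multiplier`, where the number after the semicolon is the strength the LoRA is applied with. A minimal parsing sketch, for illustration only (not the script's actual parser):

```python
# Minimal sketch, assuming the "path;multiplier" form shown above.
def parse_lora_arg(value: str):
    path, _, mult = value.partition(";")
    return path, float(mult) if mult else 1.0
```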

## SD3 training

SD3 training is done with `sd3_train.py`.

__Jul 27, 2024__:
- Latents and text encoder outputs caching mechanism is refactored significantly.
- Existing cache files for SD3 need to be recreated. Please delete the previous cache files.
- With this change, dataset initialization is significantly faster, especially for large datasets.

- Architecture-dependent parts are extracted from the dataset (`train_util.py`). This is expected to make it easier to add future architectures.

- Architecture-dependent parts including the cache mechanism for SD1/2/SDXL are also extracted. The basic operation of SD1/2/SDXL training on the sd3 branch has been confirmed, but there may be bugs. Please use the main or dev branch for SD1/2/SDXL training.

---

`fp16` and `bf16` are available for mixed precision training. We are not sure which is better.

`optimizer_type = "adafactor"` is recommended for 24GB VRAM GPUs. `cache_text_encoder_outputs_to_disk` and `cache_latents_to_disk` are necessary currently.

`clip_l`, `clip_g` and `t5xxl` can be specified if the checkpoint does not include them.

t5xxl works with `fp16` now.

The `t5xxl_device` and `t5xxl_dtype` options set the device and dtype for `t5xxl`.

`text_encoder_batch_size` has been added experimentally for faster caching.

```toml
learning_rate = 1e-6 # seems to depend on the batch size
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true
vae_batch_size = 1
text_encoder_batch_size = 4
cache_latents = true
cache_latents_to_disk = true
```

__2024/7/27:__

The caching mechanism for latents and text encoder outputs has been significantly refactored. Existing cache files for SD3 will need to be recreated (please delete the previous cache files). This makes dataset initialization much faster, especially for large datasets.

Architecture-dependent parts have been extracted from the dataset code (`train_util.py`). This is expected to make it easier to add future architectures.

Architecture-dependent parts, including the cache mechanism for SD1/2/SDXL, have also been extracted. Basic operation of SD1/2/SDXL training on the sd3 branch has been confirmed, but there may be bugs. Please use the main or dev branch for SD1/2/SDXL training.

---

[__Change History__](#change-history) has been moved to the bottom of the page.

54 changes: 39 additions & 15 deletions fine_tune.py
@@ -10,7 +10,7 @@
from tqdm import tqdm

import torch
from library import deepspeed_utils
from library import deepspeed_utils, strategy_base
from library.device_utils import init_ipex, clean_memory_on_device

init_ipex()
@@ -39,6 +39,7 @@
scale_v_prediction_loss_like_noise_prediction,
apply_debiased_estimation,
)
import library.strategy_sd as strategy_sd


def train(args):
@@ -52,7 +53,15 @@ def train(args):
if args.seed is not None:
set_seed(args.seed)  # initialize the random seed

tokenizer = train_util.load_tokenizer(args)
tokenize_strategy = strategy_sd.SdTokenizeStrategy(args.v2, args.max_token_length, args.tokenizer_cache_dir)
strategy_base.TokenizeStrategy.set_strategy(tokenize_strategy)

# prepare caching strategy: this must be set before preparing dataset. because dataset may use this strategy for initialization.
if cache_latents:
latents_caching_strategy = strategy_sd.SdSdxlLatentsCachingStrategy(
False, args.cache_latents_to_disk, args.vae_batch_size, False
)
strategy_base.LatentsCachingStrategy.set_strategy(latents_caching_strategy)

# prepare the dataset
if args.dataset_class is None:
@@ -81,10 +90,10 @@ def train(args):
]
}

blueprint = blueprint_generator.generate(user_config, args, tokenizer=tokenizer)
blueprint = blueprint_generator.generate(user_config, args)
train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
else:
train_dataset_group = train_util.load_arbitrary_dataset(args, tokenizer)
train_dataset_group = train_util.load_arbitrary_dataset(args)

current_epoch = Value("i", 0)
current_step = Value("i", 0)
@@ -165,8 +174,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
vae.to(accelerator.device, dtype=vae_dtype)
vae.requires_grad_(False)
vae.eval()
with torch.no_grad():
train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)

train_dataset_group.new_cache_latents(vae, accelerator.is_main_process)

vae.to("cpu")
clean_memory_on_device(accelerator.device)

@@ -192,6 +202,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
else:
text_encoder.eval()

text_encoding_strategy = strategy_sd.SdTextEncodingStrategy(args.clip_skip)
strategy_base.TextEncodingStrategy.set_strategy(text_encoding_strategy)

if not cache_latents:
vae.requires_grad_(False)
vae.eval()
@@ -214,7 +227,11 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
accelerator.print("prepare optimizer, data loader etc.")
_, _, optimizer = train_util.get_optimizer(args, trainable_params=trainable_params)

# dataloaderを準備する
# prepare dataloader
# strategies are set here because they cannot be referenced in another process. Copy them with the dataset
# some strategies can be None
train_dataset_group.set_current_strategies()

# Number of DataLoader worker processes: note that persistent_workers cannot be used when this is 0
n_workers = min(args.max_data_loader_n_workers, os.cpu_count()) # cpu_count or max_data_loader_n_workers
train_dataloader = torch.utils.data.DataLoader(
@@ -317,7 +334,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
)

# For --sample_at_first
train_util.sample_images(accelerator, args, 0, global_step, accelerator.device, vae, tokenizer, text_encoder, unet)
train_util.sample_images(
accelerator, args, 0, global_step, accelerator.device, vae, tokenize_strategy.tokenizer, text_encoder, unet
)

loss_recorder = train_util.LossRecorder()
for epoch in range(num_train_epochs):
@@ -342,19 +361,22 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
with torch.set_grad_enabled(args.train_text_encoder):
# Get the text embedding for conditioning
if args.weighted_captions:
# TODO move to strategy_sd.py
encoder_hidden_states = get_weighted_text_embeddings(
tokenizer,
tokenize_strategy.tokenizer,
text_encoder,
batch["captions"],
accelerator.device,
args.max_token_length // 75 if args.max_token_length else 1,
clip_skip=args.clip_skip,
)
else:
input_ids = batch["input_ids"].to(accelerator.device)
encoder_hidden_states = train_util.get_hidden_states(
args, input_ids, tokenizer, text_encoder, None if not args.full_fp16 else weight_dtype
)
input_ids = batch["input_ids_list"][0].to(accelerator.device)
encoder_hidden_states = text_encoding_strategy.encode_tokens(
tokenize_strategy, [text_encoder], [input_ids]
)[0]
if args.full_fp16:
encoder_hidden_states = encoder_hidden_states.to(weight_dtype)

# Sample noise, sample a random timestep for each image, and add noise to the latents,
# with noise offset and/or multires noise if specified
@@ -409,7 +431,7 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
global_step += 1

train_util.sample_images(
accelerator, args, None, global_step, accelerator.device, vae, tokenizer, text_encoder, unet
accelerator, args, None, global_step, accelerator.device, vae, tokenize_strategy.tokenizer, text_encoder, unet
)

# save the model every specified number of steps
@@ -472,7 +494,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
vae,
)

train_util.sample_images(accelerator, args, epoch + 1, global_step, accelerator.device, vae, tokenizer, text_encoder, unet)
train_util.sample_images(
accelerator, args, epoch + 1, global_step, accelerator.device, vae, tokenize_strategy.tokenizer, text_encoder, unet
)

is_main_process = accelerator.is_main_process
if is_main_process: