768 commits
936d333
Merge pull request #1985 from gesen2egee/pytorch-optimizer
kohya-ss Mar 20, 2025
d151833
docs: update README with recent changes and specify version for pytor…
kohya-ss Mar 20, 2025
16cef81
Refactor sigmas and timesteps
rockerBOO Mar 20, 2025
e8b3254
Add flux_train_utils tests for get_noisy_model_input_and_timesteps
rockerBOO Mar 20, 2025
8aa1265
Scale sigmoid to default 1.0
rockerBOO Mar 20, 2025
d40f5b1
Revert "Scale sigmoid to default 1.0"
rockerBOO Mar 20, 2025
89f0d27
Set sigmoid_scale to default 1.0
rockerBOO Mar 20, 2025
6364379
Merge branch 'dev' into sd3
kohya-ss Mar 21, 2025
2ba1cc7
Fix max norms not applying to noise
rockerBOO Mar 22, 2025
61f7283
Fix non-cache vae encode
rockerBOO Mar 22, 2025
1481217
Merge pull request #25 from rockerBOO/lumina-fix-non-cache-image-vae-…
sdbds Mar 22, 2025
3000816
Merge pull request #24 from rockerBOO/lumina-fix-max-norms
sdbds Mar 22, 2025
8ebe858
Merge branch 'dev' into sd3
kohya-ss Mar 24, 2025
e64dc05
Supplement the input parameters to correctly convert the flux model t…
laolongboy Mar 24, 2025
182544d
Remove perturbation seed
rockerBOO Mar 26, 2025
0181b7a
Remove progress bar avg norms
rockerBOO Mar 27, 2025
93a4efa
Merge branch 'sd3' into resize-interpolation
kohya-ss Mar 30, 2025
9e9a13a
Merge pull request #1936 from rockerBOO/resize-interpolation
kohya-ss Mar 30, 2025
1f432e2
use PIL for lanczos and box
kohya-ss Mar 30, 2025
96a133c
README.md: update recent updates section to include new interpolation…
kohya-ss Mar 30, 2025
3149b27
Merge pull request #2018 from kohya-ss/resize-interpolation-small-fix
kohya-ss Mar 30, 2025
59d98e4
Merge pull request #1974 from rockerBOO/lora-ggpo
kohya-ss Mar 30, 2025
d0b5c0e
chore: formatting, add TODO comment
kohya-ss Mar 30, 2025
aaa26bb
docs: update README to include LoRA-GGPO details for FLUX.1 training
kohya-ss Mar 30, 2025
b3c56b2
Merge branch 'dev' into sd3
kohya-ss Mar 31, 2025
ede3470
Ensure all size parameters are integers to prevent type errors
LexSong Apr 1, 2025
b822b7e
Fix the interpolation logic error in resize_image()
LexSong Apr 1, 2025
f1423a7
fix: add resize_interpolation parameter to FineTuningDataset constructor
kohya-ss Apr 3, 2025
92845e8
Merge pull request #2026 from kohya-ss/fix-finetune-dataset-resize-in…
kohya-ss Apr 3, 2025
fd36fd1
Fix resize PR link
rockerBOO Apr 3, 2025
606e687
Merge pull request #2022 from LexSong/fix-resize-issue
kohya-ss Apr 5, 2025
ee0f754
Merge pull request #2028 from rockerBOO/patch-5
kohya-ss Apr 5, 2025
00e12ee
update for lost change
sdbds Apr 6, 2025
1a4f1ff
Merge branch 'lumina' of https://github.com/sdbds/sd-scripts into lumina
sdbds Apr 6, 2025
9f1892c
Merge branch 'sd3' into lumina
sdbds Apr 6, 2025
7f93e21
fix typo
sdbds Apr 6, 2025
c56dc90
Merge pull request #1992 from rockerBOO/flux-ip-noise-gamma
kohya-ss Apr 6, 2025
4589262
README.md: Update recent updates section to include IP noise gamma fe…
kohya-ss Apr 6, 2025
5a18a03
Merge branch 'dev' into sd3
kohya-ss Apr 7, 2025
8f5a2eb
Add documentation for LoRA training scripts for SD1/2, SDXL, FLUX.1 a…
kohya-ss Apr 10, 2025
ceb19be
update docs. sdxl is translated, flux.1 is corrected
kohya-ss Apr 13, 2025
b1bbd45
doc: update sd3 LoRA, sdxl LoRA advanced
kohya-ss Apr 14, 2025
176baa6
doc: update sd3 and sdxl training guides
kohya-ss Apr 16, 2025
06df037
Merge branch 'sd3' into flux-sample-cfg
kohya-ss Apr 16, 2025
629073c
Add guidance scale for prompt param and flux sampling
kohya-ss Apr 16, 2025
7c61c0d
Add autocast wrapper for forward functions in deepspeed_utils.py to t…
sharlynxy Apr 22, 2025
d33d5ec
#
sharlynxy Apr 22, 2025
7f984f4
#
sharlynxy Apr 22, 2025
c8af252
refactor
Apr 22, 2025
f501209
Merge branch 'dev/xy/align_dtype_using_mixed_precision' of github.com…
Apr 22, 2025
0d9da0e
Merge pull request #1 from saibit-tech/dev/xy/align_dtype_using_mixed…
sharlynxy Apr 22, 2025
b11c053
Merge branch 'dev' into sd3
kohya-ss Apr 22, 2025
899f345
update for init problem
sdbds Apr 23, 2025
4fc9178
fix bugs
sdbds Apr 23, 2025
adb775c
Update: requirement diffusers[torch]==0.25.0
sharlynxy Apr 23, 2025
abf2c44
Dynamically set device in deepspeed wrapper (#2)
sharlynxy Apr 23, 2025
46ad3be
update deepspeed wrapper
sharlynxy Apr 24, 2025
5c50cdb
Merge branch 'sd3' into flux-sample-cfg
kohya-ss Apr 27, 2025
8387e0b
docs: update README to include CFG scale support in FLUX.1 training
kohya-ss Apr 27, 2025
309c44b
Merge pull request #2064 from kohya-ss/flux-sample-cfg
kohya-ss Apr 27, 2025
0e8ac43
Merge branch 'dev' into sd3
kohya-ss Apr 27, 2025
13296ae
Merge branch 'sd3' of https://github.com/kohya-ss/sd-scripts into sd3
kohya-ss Apr 27, 2025
fd3a445
fix: revert default emb guidance scale and CFG scale for FLUX.1 sampling
kohya-ss Apr 27, 2025
29523c9
docs: add note for user feedback on CFG scale in FLUX.1 training
kohya-ss Apr 27, 2025
80320d2
Merge pull request #2066 from kohya-ss/quick-fix-flux-sampling-scales
kohya-ss Apr 27, 2025
64430eb
Merge branch 'dev' into sd3
kohya-ss Apr 29, 2025
1684aba
remove deepspeed from requirements.txt
sharlynxy Apr 30, 2025
a4fae93
Add pythonpath to pytest.ini
rockerBOO May 1, 2025
f62c68d
Make grad_norm and combined_grad_norm None if not recording
rockerBOO May 1, 2025
b4a89c3
Fix None
rockerBOO May 1, 2025
7c075a9
Merge pull request #2060 from saibit-tech/sd3
kohya-ss May 1, 2025
865c8d5
README.md: Update recent updates and add DeepSpeed installation instr…
kohya-ss May 1, 2025
a27ace7
doc: add DeepSpeed installation in header section
kohya-ss May 1, 2025
e858132
Merge pull request #2074 from kohya-ss/deepspeed-readme
kohya-ss May 1, 2025
e2ed265
Merge pull request #2072 from rockerBOO/pytest-pythonpath
kohya-ss May 1, 2025
f344df0
Merge branch 'sd3' into update-docs
kohya-ss May 2, 2025
5b38d07
Merge pull request #2073 from rockerBOO/fix-mean-grad-norms
kohya-ss May 11, 2025
2982197
Merge branch 'sd3' into update-docs
kohya-ss May 17, 2025
19a180f
Add English versions with Japanese in details
kohya-ss May 17, 2025
c5fb5ec
Merge pull request #2086 from kohya-ss/codex/translate-and-structure-…
kohya-ss May 17, 2025
08aed00
doc: update FLUX.1 for newer features from README.md
kohya-ss May 17, 2025
e7e371c
doc: update English translation for advanced SDXL LoRA training
kohya-ss May 17, 2025
2bfda12
Update workflows to read-all instead of write-all
rockerBOO May 20, 2025
5753b8f
Merge pull request #2088 from rockerBOO/checkov-update
kohya-ss May 20, 2025
a376fec
doc: add comprehensive README for image generation script with usage …
kohya-ss May 24, 2025
e4d6923
Add tests for syntax checking training scripts
rockerBOO Jun 3, 2025
61eda76
Merge pull request #2108 from rockerBOO/syntax-test
kohya-ss Jun 4, 2025
bb47f1e
Fix unwrap_model handling for None text_encoders in sample_images fun…
kohya-ss Jun 8, 2025
0145efc
Merge branch 'sd3' into lumina
rockerBOO Jun 9, 2025
d94bed6
Add lumina tests and fix image masks
rockerBOO Jun 10, 2025
77dbabe
Merge pull request #26 from rockerBOO/lumina-test-fix-mask
sdbds Jun 10, 2025
fc40a27
Merge branch 'dev' into sd3
kohya-ss Jun 15, 2025
3e6935a
Merge pull request #2115 from kohya-ss/fix-flux-sampling-accelerate-e…
kohya-ss Jun 15, 2025
1db7855
Merge branch 'sd3' into update-sd3
rockerBOO Jun 16, 2025
0e929f9
Revert system_prompt for dataset config
rockerBOO Jun 16, 2025
8e4dc1f
Merge pull request #28 from rockerBOO/lumina-train_util
sdbds Jun 17, 2025
52d1337
Merge pull request #1927 from sdbds/lumina
kohya-ss Jun 29, 2025
935e003
feat: update lumina system prompt handling
kohya-ss Jun 29, 2025
884c1f3
fix: update to work with cache text encoder outputs (without disk)
kohya-ss Jun 29, 2025
5034c6f
feat: add workaround for 'gated repo' error on github actions
kohya-ss Jun 29, 2025
078ee28
feat: add more workaround for 'gated repo' error on github actions
kohya-ss Jun 29, 2025
6731d8a
fix: update system prompt handling
kohya-ss Jun 29, 2025
05f392f
feat: add minimum inference code for Lumina with image generation cap…
kohya-ss Jul 3, 2025
a87e999
Change to 3
rockerBOO Jul 7, 2025
2fffcb6
Merge pull request #2146 from rockerBOO/lumina-typo
kohya-ss Jul 8, 2025
b4d1152
fix: sample generation with system prompt, without TE output caching
kohya-ss Jul 9, 2025
7fb0d30
feat: add LoRA support for lumina minimal inference
kohya-ss Jul 9, 2025
3f9eab4
fix: update default values in lumina minimal inference as same as sam…
kohya-ss Jul 9, 2025
7bd9a6b
Add prompt guidance files for Claude and Gemini, and update README fo…
kohya-ss Jul 10, 2025
9a50c96
Merge pull request #2147 from kohya-ss/ai-coding-agent-prompt
kohya-ss Jul 10, 2025
0b90555
feat: add .claude and .gemini to .gitignore
kohya-ss Jul 10, 2025
2e0fcc5
Merge pull request #2148 from kohya-ss/update-gitignore-for-claude-an…
kohya-ss Jul 10, 2025
4e7dfc0
Merge branch 'sd3' into feature-lumina-image
kohya-ss Jul 10, 2025
2a53524
Merge branch 'sd3' into update-docs
kohya-ss Jul 10, 2025
d0b335d
feat: add LoRA training guide for Lumina Image 2.0 (WIP)
kohya-ss Jul 10, 2025
8a72f56
fix: clarify Flash Attention usage in lumina training guide
kohya-ss Jul 11, 2025
1a9bf2a
feat: add interactive mode for generating multiple images
kohya-ss Jul 13, 2025
88dc321
fix: support LoRA w/o TE for create_network_from_weights
kohya-ss Jul 13, 2025
88960e6
doc: update lumina LoRA training guide
kohya-ss Jul 13, 2025
999df5e
fix: update default values for timestep_sampling and model_prediction…
kohya-ss Jul 13, 2025
30295c9
fix: update parameter names for CFG truncate and Renorm CFG in docume…
kohya-ss Jul 13, 2025
13ccfc3
fix: update flow matching loss and variable names
kohya-ss Jul 13, 2025
a96d684
feat: add Chroma model implementation
kohya-ss Jul 15, 2025
e0fcb51
feat: support Neta Lumina all-in-one weights
kohya-ss Jul 15, 2025
25771a5
fix: update help text for cfg_trunc_ratio argument
kohya-ss Jul 15, 2025
c0c36a4
fix: remove duplicated latent normalization in decoding
kohya-ss Jul 15, 2025
a7b33f3
Fix alphas cumprod after add_noise for DDIMScheduler
rockerBOO Jul 16, 2025
3adbbb6
Add note about why we are moving it
rockerBOO Jul 16, 2025
d53a532
Merge pull request #2153 from rockerBOO/fix-alphas-cumprod
kohya-ss Jul 17, 2025
24d2ea8
feat: support Chroma model in loading and inference processes
kohya-ss Jul 20, 2025
404ddb0
fix: inference for Chroma model
kohya-ss Jul 20, 2025
8fd0b12
feat: update DoubleStreamBlock and SingleStreamBlock to handle text s…
kohya-ss Jul 20, 2025
c4958b5
feat: change img/txt order for attention and single blocks
kohya-ss Jul 20, 2025
b4e8626
feat: add LoRA training support for Chroma
kohya-ss Jul 20, 2025
0b763ef
feat: fix timestep for input_vec for Chroma
kohya-ss Jul 20, 2025
77a160d
fix: skip LoRA creation for None text encoders (CLIP-L for Chroma)
kohya-ss Jul 20, 2025
aec7e16
feat: add an option to add system prompt for negative in lumina infer…
kohya-ss Jul 21, 2025
d300f19
docs: update Lumina training guide to include inference script and op…
kohya-ss Jul 21, 2025
518545b
docs: add support information for Lumina-Image 2.0 in recent updates
kohya-ss Jul 21, 2025
d98400b
Merge pull request #2138 from kohya-ss/feature-lumina-image
kohya-ss Jul 21, 2025
9eda938
Merge branch 'sd3' into feature-chroma-support
kohya-ss Jul 21, 2025
7de68c1
Merge branch 'sd3' into update-docs
kohya-ss Jul 21, 2025
c84a163
docs: update README for documentation
kohya-ss Jul 21, 2025
4987057
Merge pull request #2042 from kohya-ss/update-docs
kohya-ss Jul 21, 2025
eef0550
Merge branch 'sd3' into feature-chroma-support
kohya-ss Jul 21, 2025
32f0601
doc: update flux train document and add about breaking changes in sam…
kohya-ss Jul 21, 2025
c28e7a4
feat: add regex-based rank and learning rate configuration for FLUX.1…
kohya-ss Jul 26, 2025
af14eab
doc: update section number for regex-based rank and learning rate con…
kohya-ss Jul 26, 2025
6c8973c
doc: add reference link for input vector gradient requirement in Chro…
kohya-ss Jul 28, 2025
10de781
build(deps): pytorch-optimizer to 3.7.0
kozistr Jul 28, 2025
450630c
fix: create network from weights not working
kohya-ss Jul 29, 2025
96feb61
feat: implement modulation vector extraction for Chroma and update re…
kohya-ss Jul 30, 2025
250f0eb
doc: update README and training guide with breaking changes for CFG s…
kohya-ss Jul 30, 2025
5dff02a
Merge pull request #2157 from kohya-ss/feature-chroma-support
kohya-ss Jul 30, 2025
bd6418a
fix: add assertion for apply_t5_attn_mask requirement in Chroma
kohya-ss Aug 1, 2025
aebfea2
Merge pull request #2165 from kohya-ss/force_t5_attn_mask_for_chroma_…
kohya-ss Aug 1, 2025
75dd8c8
Merge pull request #2160 from kozistr/update/pytorch-optimizer
kohya-ss Aug 1, 2025
5249732
chore: update README to include `--apply_t5_attn_mask` requirement fo…
kohya-ss Aug 1, 2025
b9c091e
Fix validation documentation
rockerBOO Aug 2, 2025
24c605e
Update flux_train_network.md
rockerBOO Aug 2, 2025
0ad2cb8
Update flux_train_network.md
rockerBOO Aug 2, 2025
d24d733
Update model spec to 1.0.1. Refactor model spec
rockerBOO Aug 3, 2025
056472c
Add tests
rockerBOO Aug 3, 2025
bf0f86e
Add sai_model_spec to train_network.py
rockerBOO Aug 3, 2025
10bfcb9
Remove text model spec
rockerBOO Aug 3, 2025
9bb50c2
Set sai_model_spec to must
rockerBOO Aug 3, 2025
c149cf2
Add parser args for other trainers.
rockerBOO Aug 3, 2025
a125c10
Merge pull request #2167 from rockerBOO/patch-6
kohya-ss Aug 12, 2025
dcce057
Merge pull request #2168 from rockerBOO/model-spec-1.0.1
kohya-ss Aug 13, 2025
351bed9
fix model type handling in analyze_state_dict_state function for SD3
kohya-ss Aug 13, 2025
f25c265
Merge pull request #2174 from kohya-ss/fix-modelspec-sd3-finetune
kohya-ss Aug 13, 2025
18e6251
Merge branch 'dev' into sd3
kohya-ss Aug 15, 2025
6edbe00
feat: update libraries, remove warnings
kohya-ss Aug 16, 2025
6f24bce
fix: remove unnecessary super call in assert_extra_args method
kohya-ss Aug 16, 2025
f61c442
fix: use strategy for tokenizer and latent caching
kohya-ss Aug 16, 2025
acba279
fix: update PyTorch version in workflow matrix
kohya-ss Aug 24, 2025
4b12746
Merge branch 'dev' into sd3
kohya-ss Aug 24, 2025
f7acd2f
fix: consolidate PyTorch versions in workflow matrix
kohya-ss Aug 24, 2025
ac72cf8
feat: remove bitsandbytes version specification in requirements.txt
kohya-ss Aug 27, 2025
c52c45c
doc: update for PyTorch and libraries versions
kohya-ss Aug 27, 2025
5a5138d
doc: add PR reference for PyTorch and library versions update
kohya-ss Aug 27, 2025
8cadec6
Merge pull request #2178 from kohya-ss/update-libraries
kohya-ss Aug 27, 2025
e836b7f
fix: chroma LoRA training without Text Encode caching
kohya-ss Aug 30, 2025
884e07d
Merge pull request #2191 from kohya-ss/fix-chroma-training-withtout-t…
kohya-ss Aug 30, 2025
989448a
doc: enhance SD3/SDXL LoRA training guide
kohya-ss Aug 31, 2025
fe81d40
doc: refactor structure for improved readability and maintainability
kohya-ss Aug 31, 2025
8071013
doc: add Sage Attention and sample batch size options to Lumina train…
kohya-ss Aug 31, 2025
c38b07d
doc: add validation loss documentation for model training
kohya-ss Aug 31, 2025
142d0be
doc: add comprehensive fine-tuning guide for various model architectures
kohya-ss Sep 1, 2025
9984868
doc: update README to include support for SDXL models and additional …
kohya-ss Sep 1, 2025
6c82327
doc: remove Japanese section on Gradual Latent options from gen_img R…
kohya-ss Sep 1, 2025
ddfb38e
doc: add documentation for Textual Inversion training scripts
kohya-ss Sep 4, 2025
884fc8c
doc: remove SD3/FLUX.1 training guide
kohya-ss Sep 4, 2025
952f9ce
Update docs/train_textual_inversion.md
kohya-ss Sep 4, 2025
0bb0d91
doc: update introduction and clarify command line option priorities i…
kohya-ss Sep 6, 2025
ef43979
Fix validation dataset documentation to not use subsets
rockerBOO Sep 8, 2025
78685b9
Move general settings to top to make more clear the validation bits
rockerBOO Sep 8, 2025
fe4c189
blocks_to_swap is supported for validation loss now
rockerBOO Sep 8, 2025
f833772
Merge pull request #2196 from rockerBOO/validation-dataset-subset
kohya-ss Sep 9, 2025
ee8e670
Merge branch 'sd3' into doc-update-for-latest-features
kohya-ss Sep 9, 2025
5149be5
feat: initial commit for HunyuanImage-2.1 inference
kohya-ss Sep 11, 2025
7f983c5
feat: block swap for inference and initial impl for HunyuanImage LoRA…
kohya-ss Sep 11, 2025
a0f0afb
fix: revert constructor signature update
kohya-ss Sep 11, 2025
cbc9e1a
feat: add byt5 to the list of recognized words in typos configuration
kohya-ss Sep 11, 2025
419a9c4
Merge pull request #2192 from kohya-ss/doc-update-for-latest-features
kohya-ss Sep 12, 2025
209c02d
feat: HunyuanImage LoRA training
kohya-ss Sep 12, 2025
aa0af24
Merge branch 'sd3' into feat-hunyuan-image-2.1-inference
kohya-ss Sep 12, 2025
7a651ef
feat: add 'tak' to recognized words and update block swap method to s…
kohya-ss Sep 12, 2025
9a61d61
feat: avoid unet type casting when fp8_scaled
kohya-ss Sep 12, 2025
8783f8a
feat: faster safetensors load and split safetensor utils
kohya-ss Sep 13, 2025
e1c666e
Update library/safetensors_utils.py
kohya-ss Sep 13, 2025
4568631
docs: update README to reflect improved loading speed of .safetensors…
kohya-ss Sep 13, 2025
f5d44fd
Merge pull request #2200 from kohya-ss/feat-faster-safetensors-load
kohya-ss Sep 13, 2025
bae7fa7
Merge branch 'sd3' into feat-hunyuan-image-2.1-inference
kohya-ss Sep 13, 2025
d831c88
fix: sample generation doesn't work with block swap
kohya-ss Sep 13, 2025
4e2a80a
refactor: update imports to use safetensors_utils for memory-efficien…
kohya-ss Sep 13, 2025
29b0500
fix: restore files section in _typos.toml for exclusion configuration
kohya-ss Sep 13, 2025
e04b9f0
docs: add LoRA training guide for HunyuanImage-2.1 model (by Gemini CLI)
kohya-ss Sep 13, 2025
1a73b5e
feat: add script to convert LoRA format to ComfyUI format
kohya-ss Sep 14, 2025
2732be0
Merge branch 'feat-hunyuan-image-2.1-inference' of https://github.com…
kohya-ss Sep 14, 2025
39458ec
fix: update default values for guidance_scale, image_size, infer_step…
kohya-ss Sep 16, 2025
f318dda
docs: update HunyuanImage-2.1 training guide with model download inst…
kohya-ss Sep 16, 2025
cbe2a9d
feat: add conversion script for LoRA models to ComfyUI format with re…
kohya-ss Sep 16, 2025
f5b0040
fix: correct tensor indexing in HunyuanVAE2D class for blending and e…
kohya-ss Sep 17, 2025
2ce506e
fix: fp8 casting not working
kohya-ss Sep 18, 2025
f6b4bdc
feat: block-wise fp8 quantization
kohya-ss Sep 18, 2025
f834b2e
fix: --fp8_vl to work
kohya-ss Sep 18, 2025
b090d15
feat: add multi backend attention and related update for HI2.1 models…
kohya-ss Sep 20, 2025
8f20c37
feat: add --text_encoder_cpu option to reduce VRAM usage by running t…
kohya-ss Sep 20, 2025
f41e9e2
feat: add vae_chunk_size argument for memory-efficient VAE decoding a…
kohya-ss Sep 21, 2025
e7b8e9a
doc: add --vae_chunk_size option for training and inference
kohya-ss Sep 21, 2025
9621d9d
feat: add Adaptive Projected Guidance parameters and noise rescaling
kohya-ss Sep 21, 2025
040d976
feat: add guidance rescale options for Adaptive Projected Guidance in…
kohya-ss Sep 21, 2025
3876343
fix: remove print statement for guidance rescale in AdaptiveProjected…
kohya-ss Sep 21, 2025
806d535
fix: block-wise scaling is overwritten by per-tensor scaling
kohya-ss Sep 21, 2025
e7b8982
Update library/custom_offloading_utils.py
kohya-ss Sep 21, 2025
753c794
Update hunyuan_image_train_network.py
kohya-ss Sep 21, 2025
31f7df3
doc: add --network_train_unet_only option for HunyuanImage-2.1 training
kohya-ss Sep 23, 2025
58df9df
doc: update README with HunyuanImage-2.1 LoRA training details and re…
kohya-ss Sep 23, 2025
121853c
Merge pull request #2198 from kohya-ss/feat-hunyuan-image-2.1-inference
kohya-ss Sep 23, 2025
4b79d73
fix: update metadata construction to include model_config for flux
kohya-ss Sep 24, 2025
4c197a5
Merge pull request #2207 from kohya-ss/fix-flux-extract-lora-metadata…
kohya-ss Sep 24, 2025
6a826d2
feat: add new parameters for sample image inference configuration
kohya-ss Sep 28, 2025
67d0621
Merge pull request #2212 from kohya-ss/fix-hunyuan-image-sample-gener…
kohya-ss Sep 28, 2025
a0c26a0
docs: enhance text encoder CPU usage instructions for HunyuanImage-2.…
kohya-ss Sep 28, 2025
f0c767e
Merge pull request #2213 from kohya-ss/doc-hunyuan-image-training-tex…
kohya-ss Sep 28, 2025
5462a6b
Merge branch 'dev' into sd3
kohya-ss Sep 29, 2025
5e366ac
Merge pull request #2003 from laolongboy/sd3-dev
kohya-ss Oct 1, 2025
a33cad7
fix: error on batch generation closes #2209
kohya-ss Oct 15, 2025
a5a1620
Merge pull request #2226 from kohya-ss/fix-hunyuan-image-batch-gen-error
kohya-ss Oct 15, 2025
9 changes: 9 additions & 0 deletions .ai/claude.prompt.md
@@ -0,0 +1,9 @@
## About This File

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## 1. Project Context
Here is the essential context for our project. Please read and understand it thoroughly.

### Project Overview
@./context/01-overview.md
101 changes: 101 additions & 0 deletions .ai/context/01-overview.md
@@ -0,0 +1,101 @@
This file provides an overview and guidance for developers working with the codebase, including setup instructions, architecture details, and common commands.

## Project Architecture

### Core Training Framework
The codebase is built around a **strategy pattern architecture** that supports multiple diffusion model families:

- **`library/strategy_base.py`**: Base classes for tokenization, text encoding, latent caching, and training strategies
- **`library/strategy_*.py`**: Model-specific implementations for SD, SDXL, SD3, FLUX, etc.
- **`library/train_util.py`**: Core training utilities shared across all model types
- **`library/config_util.py`**: Configuration management with TOML support

### Model Support Structure
Each supported model family has a consistent structure:
- **Training script**: `{model}_train.py` (full fine-tuning), `{model}_train_network.py` (LoRA/network training)
- **Model utilities**: `library/{model}_models.py`, `library/{model}_train_utils.py`, `library/{model}_utils.py`
- **Networks**: `networks/lora_{model}.py`, `networks/oft_{model}.py` for adapter training

### Supported Models
- **Stable Diffusion 1.x**: `train*.py`, `library/train_util.py`, `train_db.py` (for DreamBooth)
- **SDXL**: `sdxl_train*.py`, `library/sdxl_*`
- **SD3**: `sd3_train*.py`, `library/sd3_*`
- **FLUX.1**: `flux_train*.py`, `library/flux_*`

### Key Components

#### Memory Management
- **Block swapping**: CPU-GPU memory optimization via `--blocks_to_swap` parameter, works with custom offloading. Only available for models with transformer architectures like SD3 and FLUX.1.
- **Custom offloading**: `library/custom_offloading_utils.py` for advanced memory management
- **Gradient checkpointing**: Memory reduction during training

#### Training Features
- **LoRA training**: Low-rank adaptation networks in `networks/lora*.py`
- **ControlNet training**: Conditional generation control
- **Textual Inversion**: Custom embedding training
- **Multi-resolution training**: Bucket-based aspect ratio handling
- **Validation loss**: Real-time training monitoring, only for LoRA training

#### Configuration System
Dataset configuration uses TOML files with structured validation:
```toml
[datasets.sample_dataset]
resolution = 1024
batch_size = 2

[[datasets.sample_dataset.subsets]]
image_dir = "path/to/images"
caption_extension = ".txt"
```

## Common Development Commands

### Training Commands Pattern
All training scripts follow this general pattern:
```bash
accelerate launch --mixed_precision bf16 {script_name}.py \
--pretrained_model_name_or_path model.safetensors \
--dataset_config config.toml \
--output_dir output \
--output_name model_name \
[model-specific options]
```

### Memory Optimization
For low VRAM environments, use block swapping:
```bash
# Add to any training command for memory reduction
--blocks_to_swap 10 # Swap 10 blocks to CPU (adjust number as needed)
```
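The idea behind block swapping can be sketched as follows. This is a toy simulation of the scheduling logic only — the real implementation lives in `library/custom_offloading_utils.py` and moves actual module weights between devices, which this sketch does not.

```python
# Toy sketch: keep the first `blocks_to_swap` blocks on CPU, moving each
# one to the GPU just before its forward pass and back out afterwards.
class BlockSwapper:
    def __init__(self, blocks, blocks_to_swap):
        self.blocks = blocks                # transformer blocks, in order
        self.blocks_to_swap = blocks_to_swap
        self.devices = {i: "cpu" if i < blocks_to_swap else "gpu"
                        for i in range(len(blocks))}

    def forward_block(self, i, x):
        if self.devices[i] == "cpu":
            self.devices[i] = "gpu"         # swap weights in before use
        y = self.blocks[i](x)
        if i < self.blocks_to_swap:
            self.devices[i] = "cpu"         # swap weights back out after use
        return y

# Four stand-in "blocks", two of which are swapped to CPU between uses.
swapper = BlockSwapper([lambda x: x + 1 for _ in range(4)], blocks_to_swap=2)
out = 0
for i in range(4):
    out = swapper.forward_block(i, out)
print(out)  # 4
```

At any moment only the active swapped block occupies GPU memory, which is why larger `--blocks_to_swap` values trade speed for VRAM.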

### Utility Scripts
Located in `tools/` directory:
- `tools/merge_lora.py`: Merge LoRA weights into base models
- `tools/cache_latents.py`: Pre-cache VAE latents for faster training
- `tools/cache_text_encoder_outputs.py`: Pre-cache text encoder outputs

## Development Notes

### Strategy Pattern Implementation
When adding support for new models, implement the four core strategies:
1. `TokenizeStrategy`: Text tokenization handling
2. `TextEncodingStrategy`: Text encoder forward pass
3. `LatentsCachingStrategy`: VAE encoding/caching
4. `TextEncoderOutputsCachingStrategy`: Text encoder output caching

### Testing Approach
- Unit tests focus on utility functions and model loading
- Integration tests validate training script syntax and basic execution
- Most tests use mocks to avoid requiring actual model files
- Add tests for new model support in `tests/test_{model}_*.py`

### Configuration System
- Use `config_util.py` dataclasses for type-safe configuration
- Support both command-line arguments and TOML file configuration
- Validate configuration early in training scripts to prevent runtime errors
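A minimal sketch of the early-validation idea, assuming hypothetical dataclass names — `config_util.py` defines its own, richer set:

```python
from dataclasses import dataclass, field

# Hypothetical config dataclasses for illustration only.
@dataclass
class SubsetConfig:
    image_dir: str
    caption_extension: str = ".txt"

@dataclass
class DatasetConfig:
    resolution: int
    batch_size: int
    subsets: list[SubsetConfig] = field(default_factory=list)

    def __post_init__(self):
        # fail fast at config-load time, not mid-training
        if self.resolution <= 0:
            raise ValueError("resolution must be positive")
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")

cfg = DatasetConfig(1024, 2, [SubsetConfig("path/to/images")])
print(cfg.subsets[0].caption_extension)  # .txt

try:
    DatasetConfig(-1, 2)
except ValueError as e:
    print(e)  # resolution must be positive
```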

### Memory Management
- Always consider VRAM limitations when implementing features
- Use gradient checkpointing for large models
- Implement block swapping for models with transformer architectures
- Cache intermediate results (latents, text embeddings) when possible
9 changes: 9 additions & 0 deletions .ai/gemini.prompt.md
@@ -0,0 +1,9 @@
## About This File

This file provides guidance to Gemini CLI (https://github.com/google-gemini/gemini-cli) when working with code in this repository.

## 1. Project Context
Here is the essential context for our project. Please read and understand it thoroughly.

### Project Overview
@./context/01-overview.md
51 changes: 51 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,51 @@
name: Test with pytest

on:
push:
branches:
- main
- dev
- sd3
pull_request:
branches:
- main
- dev
- sd3

# CKV2_GHA_1: "Ensure top-level permissions are not set to write-all"
permissions: read-all

jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.10"] # Python versions to test
pytorch-version: ["2.4.0", "2.6.0"] # PyTorch versions to test

steps:
- uses: actions/checkout@v4
with:
# https://woodruffw.github.io/zizmor/audits/#artipacked
persist-credentials: false

- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'

- name: Install and update pip, setuptools, wheel
run: |
# Setuptools, wheel for compiling some packages
python -m pip install --upgrade pip setuptools wheel

- name: Install dependencies
run: |
# Pre-install torch to pin version (requirements.txt has dependencies like transformers which requires pytorch)
pip install dadaptation==3.2 torch==${{ matrix.pytorch-version }} torchvision pytest==8.3.4
pip install -r requirements.txt

- name: Test with pytest
run: pytest # See pytest.ini for configuration

14 changes: 11 additions & 3 deletions .github/workflows/typos.yml
@@ -1,21 +1,29 @@
---
# yamllint disable rule:line-length
name: Typos

on: # yamllint disable-line rule:truthy
on:
push:
branches:
- main
- dev
pull_request:
types:
- opened
- synchronize
- reopened

# CKV2_GHA_1: "Ensure top-level permissions are not set to write-all"
permissions: read-all

jobs:
build:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
with:
# https://woodruffw.github.io/zizmor/audits/#artipacked
persist-credentials: false

- name: typos-action
uses: crate-ci/typos@v1.24.3
uses: crate-ci/typos@v1.28.1
2 changes: 1 addition & 1 deletion .gitignore
@@ -10,4 +10,4 @@ CLAUDE.md
GEMINI.md
.claude
.gemini
MagicMock
MagicMock
13 changes: 7 additions & 6 deletions README-ja.md
@@ -167,11 +167,12 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b

`#` で始まる行はコメントになります。`--n` のように「ハイフン二個+英小文字」の形でオプションを指定できます。以下が使用可能できます。

* `--n` Negative prompt up to the next option.
* `--w` Specifies the width of the generated image.
* `--h` Specifies the height of the generated image.
* `--d` Specifies the seed of the generated image.
* `--l` Specifies the CFG scale of the generated image.
* `--s` Specifies the number of steps in the generation.
* `--n` ネガティブプロンプト(次のオプションまで)
* `--w` 生成画像の幅を指定
* `--h` 生成画像の高さを指定
* `--d` 生成画像のシード値を指定
* `--l` 生成画像のCFGスケールを指定。FLUX.1モデルでは、デフォルトは `1.0` でCFGなしを意味します。Chromaモデルでは、CFGを有効にするために `4.0` 程度に設定してください
* `--g` 埋め込みガイダンス付きモデル(FLUX.1)の埋め込みガイダンススケールを指定、デフォルトは `3.5`。Chromaモデルでは `0.0` に設定してください
* `--s` 生成時のステップ数を指定

`( )` や `[ ]` などの重みづけも動作します。
95 changes: 90 additions & 5 deletions README.md
@@ -1,5 +1,81 @@
This repository contains training, generation and utility scripts for Stable Diffusion.

## FLUX.1 and SD3 training (WIP)

This feature is experimental. The options and the training script may change in the future. Please let us know if you have any ideas for improving the training.

__Please update PyTorch to 2.6.0 or later. We have tested with `torch==2.6.0` and `torchvision==0.21.0` with CUDA 12.4. `requirements.txt` is also updated, so please update the requirements.__

The command to install PyTorch is as follows:
`pip3 install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124`

For RTX 50 series GPUs, use PyTorch 2.8.0 with CUDA 12.8 or 12.9. `requirements.txt` works with this version as well.

If you are using DeepSpeed, install it with `pip install deepspeed` (an appropriate version has not yet been confirmed).

### Recent Updates

Sep 23, 2025:
- HunyuanImage-2.1 LoRA training is supported. See [PR #2198](https://github.com/kohya-ss/sd-scripts/pull/2198) for details.
- Please see [HunyuanImage-2.1 Training](./docs/hunyuan_image_train_network.md) for details.
- __HunyuanImage-2.1 training does not support LoRA modules for Text Encoders, so `--network_train_unet_only` is required.__
- The training script is `hunyuan_image_train_network.py`.
- This includes changes to `train_network.py`, the base of the training script. Please let us know if you encounter any issues.

Sep 13, 2025:
- The loading speed of `.safetensors` files has been improved for SD3, FLUX.1 and Lumina. See [PR #2200](https://github.com/kohya-ss/sd-scripts/pull/2200) for more details.
- Model loading can be up to 1.5 times faster.
- This is a wide-ranging update, so there may be bugs. Please let us know if you encounter any issues.

Sep 4, 2025:
- The information about FLUX.1 and SD3/SD3.5 training that was described in the README has been organized and divided into the following documents:
- [LoRA Training Overview](./docs/train_network.md)
- [SDXL Training](./docs/sdxl_train_network.md)
- [Advanced Training](./docs/train_network_advanced.md)
- [FLUX.1 Training](./docs/flux_train_network.md)
- [SD3 Training](./docs/sd3_train_network.md)
- [LUMINA Training](./docs/lumina_train_network.md)
- [Validation](./docs/validation.md)
- [Fine-tuning](./docs/fine_tune.md)
- [Textual Inversion Training](./docs/train_textual_inversion.md)

Aug 28, 2025:
- In order to support the latest GPUs and features, we have updated the **PyTorch and library versions** (PR [#2178](https://github.com/kohya-ss/sd-scripts/pull/2178)). There are many changes, so please let us know if you encounter any issues.
- The PyTorch version used for testing has been updated to 2.6.0. We have confirmed that it works with PyTorch 2.6.0 and later.
- The `requirements.txt` has been updated, so please update your dependencies.
- You can update the dependencies with `pip install -r requirements.txt`.
- The version specification for `bitsandbytes` has been removed. If you encounter errors on RTX 50 series GPUs, please update it with `pip install -U bitsandbytes`.
- We have modified each script to minimize warnings as much as possible.
- The modified scripts will still work with the old library versions, but please update your environment when convenient.


## For Developers Using AI Coding Agents

This repository provides recommended instructions to help AI agents like Claude and Gemini understand our project context and coding standards.

To use them, opt in by creating your own configuration file in the project root.

**Quick Setup:**

1. Create a `CLAUDE.md` and/or `GEMINI.md` file in the project root.
2. Add the following line to your `CLAUDE.md` to import the repository's recommended prompt:

```markdown
@./.ai/claude.prompt.md
```

or for Gemini:

```markdown
@./.ai/gemini.prompt.md
```

3. You can now add your own personal instructions below the import line (e.g., `Always respond in Japanese.`).
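
The steps above can be scripted in one go. This is a sketch assuming a POSIX shell run from the repository root; the "respond in Japanese" line is just an example of a personal instruction:

```shell
# Create the opt-in config files that import the repository's recommended prompts.
printf '@./.ai/claude.prompt.md\n\nAlways respond in Japanese.\n' > CLAUDE.md
printf '@./.ai/gemini.prompt.md\n' > GEMINI.md
```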

This approach ensures that you have full control over the instructions given to your agent while benefiting from the shared project context. Your `CLAUDE.md` and `GEMINI.md` are already listed in `.gitignore`, so they won't be committed to the repository.

---

[__Change History__](#change-history) is moved to the bottom of the page.

Note: Some users report that ``ValueError: fp16 mixed precision requires a GPU`` occurs during training. If this happens, answer `0` for the GPU id question when running `accelerate config`.

(Single GPU with id `0` will be used.)

## DeepSpeed installation (experimental, Linux or WSL2 only)

To install DeepSpeed, run the following command in your activated virtual environment:

```bash
pip install deepspeed==0.16.7
```

## Upgrade

When a new release comes out you can upgrade your repo with the following command:
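
A typical sequence looks like this (a sketch assuming the repository was installed from a git clone into an activated virtual environment; exact flags may vary with your setup):

```shell
# Run from the repository root with the virtual environment activated.
git pull
pip install --use-pep517 --upgrade -r requirements.txt
```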
The majority of scripts are licensed under ASL 2.0 (including code from Diffusers), but portions of the project are available under separate license terms.

- Fused optimizer is available for SDXL training. PR [#1259](https://github.com/kohya-ss/sd-scripts/pull/1259) Thanks to 2kpr!
- Memory usage during training is significantly reduced by integrating the optimizer step into the backward pass. The training results are the same as before, but if you have plenty of memory, training will be slower.
- Specify the `--fused_backward_pass` option in `sdxl_train.py`. At this time, only Adafactor is supported. Gradient accumulation is not available.
- Setting mixed precision to `no` seems to use less memory than `fp16` or `bf16`.
- Training is possible with a memory usage of about 17GB with a batch size of 1 and fp32. If you specify the `--full_bf16` option, you can further reduce the memory usage (but the accuracy will be lower). With the same memory usage as before, you can increase the batch size.
- PyTorch 2.1 or later is required because it uses the new API `Tensor.register_post_accumulate_grad_hook(hook)`.
- Memory usage is reduced by the same principle as Fused optimizer. The training results and speed are the same as Fused optimizer.
- Specify the number of groups like `--fused_optimizer_groups 10` in `sdxl_train.py`. Increasing the number of groups reduces memory usage but slows down training. Since the effect is limited to a certain number, it is recommended to specify 4-10.
- Any optimizer can be used, but optimizers that automatically calculate the learning rate (such as D-Adaptation and Prodigy) cannot be used. Gradient accumulation is not available.
- `--fused_optimizer_groups` cannot be used with `--fused_backward_pass`. When using Adafactor, the memory usage is slightly larger than with Fused optimizer. PyTorch 2.1 or later is required.
- Mechanism: While Fused optimizer performs backward/step for individual parameters within the optimizer, optimizer groups reduce memory usage by grouping parameters and creating multiple optimizers to perform backward/step for each group. Fused optimizer requires implementation on the optimizer side, while optimizer groups are implemented only on the training script side.

- LoRA+ is supported. PR [#1233](https://github.com/kohya-ss/sd-scripts/pull/1233) Thanks to rockerBOO!

masterpiece, best quality, 1boy, in business suit, standing at street, looking back

Lines beginning with `#` are comments. You can specify options for the generated image with options like `--n` after the prompt. The following can be used.

* `--n` Negative prompt up to the next option. Ignored when CFG scale is `1.0`.
* `--w` Specifies the width of the generated image.
* `--h` Specifies the height of the generated image.
* `--d` Specifies the seed of the generated image.
* `--l` Specifies the CFG scale of the generated image. For FLUX.1 models, the default is `1.0`, which means no CFG. For Chroma models, set to around `4.0` to enable CFG.
* `--g` Specifies the embedded guidance scale for the models with embedded guidance (FLUX.1), the default is `3.5`. Set to `0.0` for Chroma models.
* `--s` Specifies the number of steps in the generation.

Prompt weighting such as `( )` and `[ ]` works.
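
As a quick illustration of the option format above, a minimal parser might look like this (an illustrative sketch; `parse_prompt_line` is a hypothetical helper, not part of the scripts, which have their own parser):

```python
import re

def parse_prompt_line(line: str):
    """Split a prompt line into (prompt, options).

    Each `--x` option consumes the text up to the next option,
    matching the behavior described above (e.g. `--n` takes the
    whole negative prompt).
    """
    tokens = re.split(r"\s+--([a-z])\s+", line.strip())
    prompt = tokens[0].strip()
    options = dict(zip(tokens[1::2], (v.strip() for v in tokens[2::2])))
    return prompt, options

prompt, opts = parse_prompt_line(
    "1boy, in business suit, standing at street --n low quality, worst quality --w 576 --h 832 --d 2 --l 5.5 --s 40"
)
# prompt == "1boy, in business suit, standing at street"
# opts["n"] == "low quality, worst quality", opts["w"] == "576"
```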