diff --git a/.ai/claude.prompt.md b/.ai/claude.prompt.md
new file mode 100644
index 000000000..7f38f5752
--- /dev/null
+++ b/.ai/claude.prompt.md
@@ -0,0 +1,9 @@
+## About This File
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## 1. Project Context
+Here is the essential context for our project. Please read and understand it thoroughly.
+
+### Project Overview
+@./context/01-overview.md
diff --git a/.ai/context/01-overview.md b/.ai/context/01-overview.md
new file mode 100644
index 000000000..41133e983
--- /dev/null
+++ b/.ai/context/01-overview.md
@@ -0,0 +1,101 @@
+This file provides an overview and guidance for developers working with the codebase, including setup instructions, architecture details, and common commands.
+
+## Project Architecture
+
+### Core Training Framework
+The codebase is built around a **strategy pattern architecture** that supports multiple diffusion model families:
+
+- **`library/strategy_base.py`**: Base classes for tokenization, text encoding, latent caching, and training strategies
+- **`library/strategy_*.py`**: Model-specific implementations for SD, SDXL, SD3, FLUX, etc.
+- **`library/train_util.py`**: Core training utilities shared across all model types
+- **`library/config_util.py`**: Configuration management with TOML support
+
+### Model Support Structure
+Each supported model family has a consistent structure:
+- **Training script**: `{model}_train.py` (full fine-tuning), `{model}_train_network.py` (LoRA/network training)
+- **Model utilities**: `library/{model}_models.py`, `library/{model}_train_utils.py`, `library/{model}_utils.py`
+- **Networks**: `networks/lora_{model}.py`, `networks/oft_{model}.py` for adapter training
+
+### Supported Models
+- **Stable Diffusion 1.x**: `fine_tune.py`, `train_network.py`, `train_db.py` (for DreamBooth)
+- **SDXL**: `sdxl_train*.py`, `library/sdxl_*`
+- **SD3**: `sd3_train*.py`, `library/sd3_*`
+- **FLUX.1**: `flux_train*.py`, `library/flux_*`
+
+### Key Components
+
+#### Memory Management
+- **Block swapping**: CPU-GPU memory optimization via the `--blocks_to_swap` parameter, built on the custom offloading utilities. Only available for models with transformer architectures such as SD3 and FLUX.1.
+- **Custom offloading**: `library/custom_offloading_utils.py` for advanced memory management
+- **Gradient checkpointing**: Memory reduction during training
+
+#### Training Features
+- **LoRA training**: Low-rank adaptation networks in `networks/lora*.py`
+- **ControlNet training**: Conditional generation control
+- **Textual Inversion**: Custom embedding training
+- **Multi-resolution training**: Bucket-based aspect ratio handling
+- **Validation loss**: Real-time training monitoring, only for LoRA training
+
+#### Configuration System
+Dataset configuration uses TOML files with structured validation:
+```toml
+[[datasets]]
+resolution = 1024
+batch_size = 2
+
+  [[datasets.subsets]]
+  image_dir = "path/to/images"
+  caption_extension = ".txt"
+```
+
+## Common Development Commands
+
+### Training Commands Pattern
+All training scripts follow this general pattern:
+```bash
+accelerate launch --mixed_precision bf16 {script_name}.py \
+ --pretrained_model_name_or_path model.safetensors \
+ --dataset_config config.toml \
+ --output_dir output \
+ --output_name model_name \
+ [model-specific options]
+```
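+
+For example, a FLUX.1 LoRA run fills in this pattern as follows (a sketch: file names follow the examples in `docs/flux_train_network.md`, and `--network_dim 16` is an illustrative value):
+
+```bash
+accelerate launch --mixed_precision bf16 flux_train_network.py \
+  --pretrained_model_name_or_path flux1-dev.safetensors \
+  --clip_l clip_l.safetensors --t5xxl t5xxl.safetensors --ae ae.safetensors \
+  --dataset_config config.toml \
+  --output_dir output --output_name my_flux_lora \
+  --network_module networks.lora_flux --network_dim 16
+```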
+
+### Memory Optimization
+For low VRAM environments, use block swapping:
+```bash
+# Add to any training command for memory reduction
+--blocks_to_swap 10 # Swap 10 blocks to CPU (adjust number as needed)
+```
+
+### Utility Scripts
+Located in `tools/` directory:
+- `tools/merge_lora.py`: Merge LoRA weights into base models
+- `tools/cache_latents.py`: Pre-cache VAE latents for faster training
+- `tools/cache_text_encoder_outputs.py`: Pre-cache text encoder outputs
+
+## Development Notes
+
+### Strategy Pattern Implementation
+When adding support for new models, implement the four core strategies:
+1. `TokenizeStrategy`: Text tokenization handling
+2. `TextEncodingStrategy`: Text encoder forward pass
+3. `LatentsCachingStrategy`: VAE encoding/caching
+4. `TextEncoderOutputsCachingStrategy`: Text encoder output caching
+
+### Testing Approach
+- Unit tests focus on utility functions and model loading
+- Integration tests validate training script syntax and basic execution
+- Most tests use mocks to avoid requiring actual model files
+- Add tests for new model support in `tests/test_{model}_*.py`
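+
+For example, to run only the tests matching one model family (the `-k` expression is illustrative):
+
+```bash
+pytest tests/ -k "flux" -v
+```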
+
+### Configuration System
+- Use `config_util.py` dataclasses for type-safe configuration
+- Support both command-line arguments and TOML file configuration
+- Validate configuration early in training scripts to prevent runtime errors
+
+### Memory Management
+- Always consider VRAM limitations when implementing features
+- Use gradient checkpointing for large models
+- Implement block swapping for models with transformer architectures
+- Cache intermediate results (latents, text embeddings) when possible
\ No newline at end of file
diff --git a/.ai/gemini.prompt.md b/.ai/gemini.prompt.md
new file mode 100644
index 000000000..6047390bc
--- /dev/null
+++ b/.ai/gemini.prompt.md
@@ -0,0 +1,9 @@
+## About This File
+
+This file provides guidance to Gemini CLI (https://github.com/google-gemini/gemini-cli) when working with code in this repository.
+
+## 1. Project Context
+Here is the essential context for our project. Please read and understand it thoroughly.
+
+### Project Overview
+@./context/01-overview.md
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
new file mode 100644
index 000000000..d35fe3925
--- /dev/null
+++ b/.github/workflows/tests.yml
@@ -0,0 +1,51 @@
+name: Test with pytest
+
+on:
+ push:
+ branches:
+ - main
+ - dev
+ - sd3
+ pull_request:
+ branches:
+ - main
+ - dev
+ - sd3
+
+# CKV2_GHA_1: "Ensure top-level permissions are not set to write-all"
+permissions: read-all
+
+jobs:
+ build:
+ runs-on: ${{ matrix.os }}
+ strategy:
+ matrix:
+ os: [ubuntu-latest]
+ python-version: ["3.10"] # Python versions to test
+ pytorch-version: ["2.4.0", "2.6.0"] # PyTorch versions to test
+
+ steps:
+ - uses: actions/checkout@v4
+ with:
+ # https://woodruffw.github.io/zizmor/audits/#artipacked
+ persist-credentials: false
+
+ - uses: actions/setup-python@v5
+ with:
+ python-version: ${{ matrix.python-version }}
+ cache: 'pip'
+
+ - name: Install and update pip, setuptools, wheel
+ run: |
+ # Setuptools, wheel for compiling some packages
+ python -m pip install --upgrade pip setuptools wheel
+
+ - name: Install dependencies
+ run: |
+ # Pre-install torch to pin version (requirements.txt has dependencies like transformers which requires pytorch)
+ pip install dadaptation==3.2 torch==${{ matrix.pytorch-version }} torchvision pytest==8.3.4
+ pip install -r requirements.txt
+
+ - name: Test with pytest
+ run: pytest # See pytest.ini for configuration
+
diff --git a/.github/workflows/typos.yml b/.github/workflows/typos.yml
index 0149dcdd3..b9d6acc98 100644
--- a/.github/workflows/typos.yml
+++ b/.github/workflows/typos.yml
@@ -1,21 +1,29 @@
---
-# yamllint disable rule:line-length
name: Typos
-on: # yamllint disable-line rule:truthy
+on:
push:
+ branches:
+ - main
+ - dev
pull_request:
types:
- opened
- synchronize
- reopened
+# CKV2_GHA_1: "Ensure top-level permissions are not set to write-all"
+permissions: read-all
+
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
+ with:
+ # https://woodruffw.github.io/zizmor/audits/#artipacked
+ persist-credentials: false
- name: typos-action
- uses: crate-ci/typos@v1.24.3
+ uses: crate-ci/typos@v1.28.1
diff --git a/.gitignore b/.gitignore
index d48110130..cfdc02685 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,4 +10,4 @@ CLAUDE.md
GEMINI.md
.claude
.gemini
-MagicMock
\ No newline at end of file
+MagicMock
diff --git a/README-ja.md b/README-ja.md
index 71c3b0d54..27e15aa94 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -167,11 +167,12 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b
`#` で始まる行はコメントになります。`--n` のように「ハイフン二個+英小文字」の形でオプションを指定できます。以下が使用可能できます。
- * `--n` Negative prompt up to the next option.
- * `--w` Specifies the width of the generated image.
- * `--h` Specifies the height of the generated image.
- * `--d` Specifies the seed of the generated image.
- * `--l` Specifies the CFG scale of the generated image.
- * `--s` Specifies the number of steps in the generation.
+ * `--n` ネガティブプロンプト(次のオプションまで)
+ * `--w` 生成画像の幅を指定
+ * `--h` 生成画像の高さを指定
+ * `--d` 生成画像のシード値を指定
+ * `--l` 生成画像のCFGスケールを指定。FLUX.1モデルでは、デフォルトは `1.0` でCFGなしを意味します。Chromaモデルでは、CFGを有効にするために `4.0` 程度に設定してください
+ * `--g` 埋め込みガイダンス付きモデル(FLUX.1)の埋め込みガイダンススケールを指定、デフォルトは `3.5`。Chromaモデルでは `0.0` に設定してください
+ * `--s` 生成時のステップ数を指定
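+
+例えば、Chromaモデルでは次のように指定します(各値はあくまで一例です。CFGが有効になるため `--n` も機能します)。
+
+```
+1girl, masterpiece --n lowres, bad anatomy --w 1024 --h 1024 --d 1 --l 4.0 --g 0.0 --s 28
+```
+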
`( )` や `[ ]` などの重みづけも動作します。
diff --git a/README.md b/README.md
index 629f1d415..c70dc257d 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,81 @@
This repository contains training, generation and utility scripts for Stable Diffusion.
+## FLUX.1 and SD3 training (WIP)
+
+This feature is experimental. The options and the training script may change in the future. Please let us know if you have any ideas to improve the training.
+
+__Please update PyTorch to 2.6.0 or later. We have tested with `torch==2.6.0` and `torchvision==0.21.0` with CUDA 12.4. `requirements.txt` is also updated, so please update the requirements.__
+
+The command to install PyTorch is as follows:
+`pip3 install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124`
+
+For RTX 50 series GPUs, use PyTorch 2.8.0 with CUDA 12.8 or 12.9. `requirements.txt` will work with this version.
+
+If you are using DeepSpeed, please install it with `pip install deepspeed` (the appropriate version has not been confirmed yet; see the DeepSpeed installation section below).
+
+### Recent Updates
+
+Sep 23, 2025:
+- HunyuanImage-2.1 LoRA training is supported. See [PR #2198](https://github.com/kohya-ss/sd-scripts/pull/2198).
+ - Please see [HunyuanImage-2.1 Training](./docs/hunyuan_image_train_network.md) for details.
+ - __HunyuanImage-2.1 training does not support LoRA modules for Text Encoders, so `--network_train_unet_only` is required.__
+ - The training script is `hunyuan_image_train_network.py`.
+ - This includes changes to `train_network.py`, the base of the training script. Please let us know if you encounter any issues.
+
+Sep 13, 2025:
+- The loading speed of `.safetensors` files has been improved for SD3, FLUX.1 and Lumina. See [PR #2200](https://github.com/kohya-ss/sd-scripts/pull/2200) for more details.
+ - Model loading can be up to 1.5 times faster.
+ - This is a wide-ranging update, so there may be bugs. Please let us know if you encounter any issues.
+
+Sep 4, 2025:
+- The information about FLUX.1 and SD3/SD3.5 training that was described in the README has been organized and divided into the following documents:
+ - [LoRA Training Overview](./docs/train_network.md)
+ - [SDXL Training](./docs/sdxl_train_network.md)
+ - [Advanced Training](./docs/train_network_advanced.md)
+ - [FLUX.1 Training](./docs/flux_train_network.md)
+ - [SD3 Training](./docs/sd3_train_network.md)
+ - [LUMINA Training](./docs/lumina_train_network.md)
+ - [Validation](./docs/validation.md)
+ - [Fine-tuning](./docs/fine_tune.md)
+ - [Textual Inversion Training](./docs/train_textual_inversion.md)
+
+Aug 28, 2025:
+- In order to support the latest GPUs and features, we have updated the **PyTorch and library versions** (PR [#2178](https://github.com/kohya-ss/sd-scripts/pull/2178)). There are many changes, so please let us know if you encounter any issues.
+- The PyTorch version used for testing has been updated to 2.6.0. We have confirmed that it works with PyTorch 2.6.0 and later.
+- The `requirements.txt` has been updated, so please update your dependencies.
+ - You can update the dependencies with `pip install -r requirements.txt`.
+ - The version specification for `bitsandbytes` has been removed. If you encounter errors on RTX 50 series GPUs, please update it with `pip install -U bitsandbytes`.
+- We have modified each script to minimize warnings as much as possible.
+ - The modified scripts still work with the old library versions, but please update your environment when convenient.
+
+
+## For Developers Using AI Coding Agents
+
+This repository provides recommended instructions to help AI agents like Claude and Gemini understand our project context and coding standards.
+
+To use them, opt in by creating your own configuration file in the project root.
+
+**Quick Setup:**
+
+1. Create a `CLAUDE.md` and/or `GEMINI.md` file in the project root.
+2. Add the following line to your `CLAUDE.md` to import the repository's recommended prompt:
+
+ ```markdown
+ @./.ai/claude.prompt.md
+ ```
+
+ or for Gemini:
+
+ ```markdown
+ @./.ai/gemini.prompt.md
+ ```
+
+3. You can now add your own personal instructions below the import line (e.g., `Always respond in Japanese.`).
+
+This approach ensures that you have full control over the instructions given to your agent while benefiting from the shared project context. Your `CLAUDE.md` and `GEMINI.md` are already listed in `.gitignore`, so they won't be committed to the repository.
+
+---
+
[__Change History__](#change-history) is moved to the bottom of the page.
更新履歴は[ページ末尾](#change-history)に移しました。
@@ -125,6 +201,14 @@ Note: Some user reports ``ValueError: fp16 mixed precision requires a GPU`` is o
(Single GPU with id `0` will be used.)
+## DeepSpeed installation (experimental, Linux or WSL2 only)
+
+To install DeepSpeed, run the following command in your activated virtual environment:
+
+```bash
+pip install deepspeed==0.16.7
+```
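+
+To verify the installation, you can print the version and run DeepSpeed's environment report (`ds_report` is installed together with the package):
+
+```bash
+python -c "import deepspeed; print(deepspeed.__version__)"
+ds_report  # summarizes CUDA/compiler compatibility for DeepSpeed ops
+```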
+
## Upgrade
When a new release comes out you can upgrade your repo with the following command:
@@ -226,7 +310,7 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
- Fused optimizer is available for SDXL training. PR [#1259](https://github.com/kohya-ss/sd-scripts/pull/1259) Thanks to 2kpr!
- The memory usage during training is significantly reduced by integrating the optimizer's backward pass with step. The training results are the same as before, but if you have plenty of memory, the speed will be slower.
- - Specify the `--fused_backward_pass` option in `sdxl_train.py`. At this time, only AdaFactor is supported. Gradient accumulation is not available.
+ - Specify the `--fused_backward_pass` option in `sdxl_train.py`. At this time, only Adafactor is supported. Gradient accumulation is not available.
- Setting mixed precision to `no` seems to use less memory than `fp16` or `bf16`.
- Training is possible with a memory usage of about 17GB with a batch size of 1 and fp32. If you specify the `--full_bf16` option, you can further reduce the memory usage (but the accuracy will be lower). With the same memory usage as before, you can increase the batch size.
- PyTorch 2.1 or later is required because it uses the new API `Tensor.register_post_accumulate_grad_hook(hook)`.
@@ -236,7 +320,7 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
- Memory usage is reduced by the same principle as Fused optimizer. The training results and speed are the same as Fused optimizer.
- Specify the number of groups like `--fused_optimizer_groups 10` in `sdxl_train.py`. Increasing the number of groups reduces memory usage but slows down training. Since the effect is limited to a certain number, it is recommended to specify 4-10.
- Any optimizer can be used, but optimizers that automatically calculate the learning rate (such as D-Adaptation and Prodigy) cannot be used. Gradient accumulation is not available.
- - `--fused_optimizer_groups` cannot be used with `--fused_backward_pass`. When using AdaFactor, the memory usage is slightly larger than with Fused optimizer. PyTorch 2.1 or later is required.
+ - `--fused_optimizer_groups` cannot be used with `--fused_backward_pass`. When using Adafactor, the memory usage is slightly larger than with Fused optimizer. PyTorch 2.1 or later is required.
- Mechanism: While Fused optimizer performs backward/step for individual parameters within the optimizer, optimizer groups reduce memory usage by grouping parameters and creating multiple optimizers to perform backward/step for each group. Fused optimizer requires implementation on the optimizer side, while optimizer groups are implemented only on the training script side.
- LoRA+ is supported. PR [#1233](https://github.com/kohya-ss/sd-scripts/pull/1233) Thanks to rockerBOO!
@@ -295,7 +379,7 @@ https://github.com/kohya-ss/sd-scripts/pull/1290) Thanks to frodo821!
- SDXL の学習時に Fused optimizer が使えるようになりました。PR [#1259](https://github.com/kohya-ss/sd-scripts/pull/1259) 2kpr 氏に感謝します。
- optimizer の backward pass に step を統合することで学習時のメモリ使用量を大きく削減します。学習結果は未適用時と同一ですが、メモリが潤沢にある場合は速度は遅くなります。
- - `sdxl_train.py` に `--fused_backward_pass` オプションを指定してください。現時点では optimizer は AdaFactor のみ対応しています。また gradient accumulation は使えません。
+ - `sdxl_train.py` に `--fused_backward_pass` オプションを指定してください。現時点では optimizer は Adafactor のみ対応しています。また gradient accumulation は使えません。
- mixed precision は `no` のほうが `fp16` や `bf16` よりも使用メモリ量が少ないようです。
- バッチサイズ 1、fp32 で 17GB 程度で学習可能なようです。`--full_bf16` オプションを指定するとさらに削減できます(精度は劣ります)。以前と同じメモリ使用量ではバッチサイズを増やせます。
- PyTorch 2.1 以降の新 API `Tensor.register_post_accumulate_grad_hook(hook)` を使用しているため、PyTorch 2.1 以降が必要です。
@@ -599,11 +683,12 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b
Lines beginning with `#` are comments. You can specify options for the generated image with options like `--n` after the prompt. The following can be used.
- * `--n` Negative prompt up to the next option.
+ * `--n` Negative prompt up to the next option. Ignored when CFG scale is `1.0`.
* `--w` Specifies the width of the generated image.
* `--h` Specifies the height of the generated image.
* `--d` Specifies the seed of the generated image.
- * `--l` Specifies the CFG scale of the generated image.
+ * `--l` Specifies the CFG scale of the generated image. For FLUX.1 models, the default is `1.0`, which means no CFG. For Chroma models, set to around `4.0` to enable CFG.
+ * `--g` Specifies the embedded guidance scale for the models with embedded guidance (FLUX.1), the default is `3.5`. Set to `0.0` for Chroma models.
* `--s` Specifies the number of steps in the generation.
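+
+For example, a prompt line for a Chroma model might look like this (the values are illustrative; `--n` takes effect because CFG is enabled):
+
+```
+1girl, masterpiece --n lowres, bad anatomy --w 1024 --h 1024 --d 1 --l 4.0 --g 0.0 --s 28
+```
+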
The prompt weighting such as `( )` and `[ ]` are working.
diff --git a/_typos.toml b/_typos.toml
index bbf7728f4..686da4af2 100644
--- a/_typos.toml
+++ b/_typos.toml
@@ -29,7 +29,9 @@ koo="koo"
yos="yos"
wn="wn"
hime="hime"
-
+OT="OT"
+byt="byt"
+tak="tak"
[files]
extend-exclude = ["_typos.toml", "venv"]
diff --git a/docs/config_README-en.md b/docs/config_README-en.md
index 66a50dc09..78687ee6c 100644
--- a/docs/config_README-en.md
+++ b/docs/config_README-en.md
@@ -1,9 +1,6 @@
-Original Source by kohya-ss
+First version: AI translation by NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO, edited by Darkstorm2150
-First version:
-A.I Translation by Model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO, editing by Darkstorm2150
-
-Some parts are manually added.
+The document is now updated and maintained manually.
# Config Readme
@@ -152,6 +149,7 @@ These options are related to subset configuration.
| `keep_tokens_separator` | `“|||”` | o | o | o |
| `secondary_separator` | `“;;;”` | o | o | o |
| `enable_wildcard` | `true` | o | o | o |
+| `resize_interpolation` | (not specified) | o | o | o |
* `num_repeats`
* Specifies the number of repeats for images in a subset. This is equivalent to `--dataset_repeats` in fine-tuning but can be specified for any training method.
@@ -165,6 +163,8 @@ These options are related to subset configuration.
* Specifies an additional separator. The part separated by this separator is treated as one tag and is shuffled and dropped. It is then replaced by `caption_separator`. For example, if you specify `aaa;;;bbb;;;ccc`, it will be replaced by `aaa,bbb,ccc` or dropped together.
* `enable_wildcard`
* Enables wildcard notation. This will be explained later.
+* `resize_interpolation`
+ * Specifies the interpolation method used when resizing images. Normally, there is no need to specify this. The following options can be specified: `lanczos`, `nearest`, `bilinear`, `linear`, `bicubic`, `cubic`, `area`, `box`. By default (when not specified), `area` is used for downscaling, and `lanczos` is used for upscaling. If this option is specified, the same interpolation method will be used for both upscaling and downscaling. When `lanczos` or `box` is specified, PIL is used; for other options, OpenCV is used.
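+
+For example, to force a specific interpolation method for a subset (the values are illustrative):
+
+```toml
+[[datasets]]
+resolution = 512
+
+  [[datasets.subsets]]
+  image_dir = "path/to/images"
+  resize_interpolation = "bicubic"
+```
+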
### DreamBooth-specific options
@@ -264,10 +264,10 @@ The following command line argument options are ignored if a configuration file
* `--reg_data_dir`
* `--in_json`
-The following command line argument options are given priority over the configuration file options if both are specified simultaneously. In most cases, they have the same names as the corresponding options in the configuration file.
+For the command line options listed below, if an option is specified both on the command line and in the configuration file, the value from the configuration file takes priority. Unless otherwise noted, the option names are the same.
-| Command Line Argument Option | Prioritized Configuration File Option |
-| ------------------------------- | ------------------------------------- |
+| Command Line Argument Option | Corresponding Configuration File Option |
+| ------------------------------- | --------------------------------------- |
| `--bucket_no_upscale` | |
| `--bucket_reso_steps` | |
| `--caption_dropout_every_n_epochs` | |
diff --git a/docs/config_README-ja.md b/docs/config_README-ja.md
index 0ed95e0eb..aec0eca5d 100644
--- a/docs/config_README-ja.md
+++ b/docs/config_README-ja.md
@@ -144,6 +144,7 @@ DreamBooth の手法と fine tuning の手法の両方とも利用可能な学
| `keep_tokens_separator` | `“|||”` | o | o | o |
| `secondary_separator` | `“;;;”` | o | o | o |
| `enable_wildcard` | `true` | o | o | o |
+| `resize_interpolation` |(通常は設定しません) | o | o | o |
* `num_repeats`
* サブセットの画像の繰り返し回数を指定します。fine tuning における `--dataset_repeats` に相当しますが、`num_repeats` はどの学習方法でも指定可能です。
@@ -162,6 +163,9 @@ DreamBooth の手法と fine tuning の手法の両方とも利用可能な学
* `enable_wildcard`
* ワイルドカード記法および複数行キャプションを有効にします。ワイルドカード記法、複数行キャプションについては後述します。
+* `resize_interpolation`
+ * 画像のリサイズ時に使用する補間方法を指定します。通常は指定しなくて構いません。`lanczos`, `nearest`, `bilinear`, `linear`, `bicubic`, `cubic`, `area`, `box` が指定可能です。デフォルト(未指定時)は、縮小時は `area`、拡大時は `lanczos` になります。このオプションを指定すると、拡大時・縮小時とも同じ補間方法が使用されます。`lanczos`、`box`を指定するとPILが、それ以外を指定するとOpenCVが使用されます。
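+
+例えば、あるサブセットで補間方法を固定する場合は次のように記述します(値は一例です)。
+
+```toml
+[[datasets]]
+resolution = 512
+
+  [[datasets.subsets]]
+  image_dir = "path/to/images"
+  resize_interpolation = "bicubic"
+```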
+
### DreamBooth 方式専用のオプション
DreamBooth 方式のオプションは、サブセット向けオプションのみ存在します。
diff --git a/docs/fine_tune.md b/docs/fine_tune.md
new file mode 100644
index 000000000..1560fb28a
--- /dev/null
+++ b/docs/fine_tune.md
@@ -0,0 +1,347 @@
+# Fine-tuning Guide
+
+This document explains how to perform fine-tuning on various model architectures using the `*_train.py` scripts.
+
+<details>
+<summary>日本語</summary>
+
+# Fine-tuning ガイド
+
+このドキュメントでは、`*_train.py` スクリプトを用いた、各種モデルアーキテクチャのFine-tuningの方法について解説します。
+
+</details>
+
+### Difference between Fine-tuning and LoRA tuning
+
+This repository supports two methods for additional model training: **Fine-tuning** and **LoRA (Low-Rank Adaptation)**. Each method has distinct features and advantages.
+
+**Fine-tuning** is a method that retrains all (or most) of the weights of a pre-trained model.
+- **Pros**: It can improve the overall expressive power of the model and is suitable for learning styles or concepts that differ significantly from the original model.
+- **Cons**:
+ - It requires a large amount of VRAM and computational cost.
+ - The saved file size is large (same as the original model).
+ - It is prone to "overfitting," where the model loses the diversity of the original model if over-trained.
+- **Corresponding scripts**: Scripts named `*_train.py`, such as `sdxl_train.py`, `sd3_train.py`, `flux_train.py`, and `lumina_train.py`.
+
+**LoRA tuning** is a method that freezes the model's weights and only trains a small additional network called an "adapter."
+- **Pros**:
+ - It allows for fast training with low VRAM and computational cost.
+ - It is considered resistant to overfitting because it trains fewer weights.
+ - The saved file (LoRA network) is very small, ranging from tens to hundreds of MB, making it easy to manage.
+ - Multiple LoRAs can be used in combination.
+- **Cons**: Since it does not train the entire model, it may not achieve changes as significant as fine-tuning.
+- **Corresponding scripts**: Scripts named `*_train_network.py`, such as `sdxl_train_network.py`, `sd3_train_network.py`, and `flux_train_network.py`.
+
+| Feature | Fine-tuning | LoRA tuning |
+|:---|:---|:---|
+| **Training Target** | All model weights | Additional network (adapter) only |
+| **VRAM/Compute Cost**| High | Low |
+| **Training Time** | Long | Short |
+| **File Size** | Large (several GB) | Small (a few MB to hundreds of MB) |
+| **Overfitting Risk** | High | Low |
+| **Suitable Use Case** | Major style changes, concept learning | Adding specific characters or styles |
+
+Generally, it is recommended to start with **LoRA tuning** if you want to add a specific character or style. **Fine-tuning** is a valid option for more fundamental style changes or aiming for a high-quality model.
+
+<details>
+<summary>日本語</summary>
+
+### Fine-tuningとLoRA学習の違い
+
+このリポジトリでは、モデルの追加学習手法として**Fine-tuning**と**LoRA (Low-Rank Adaptation)**学習の2種類をサポートしています。それぞれの手法には異なる特徴と利点があります。
+
+**Fine-tuning**は、事前学習済みモデルの重み全体(または大部分)を再学習する手法です。
+- **利点**: モデル全体の表現力を向上させることができ、元のモデルから大きく変化した画風やコンセプトの学習に適しています。
+- **欠点**:
+ - 学習には多くのVRAMと計算コストが必要です。
+ - 保存されるファイルサイズが大きくなります(元のモデルと同じサイズ)。
+ - 学習させすぎると、元のモデルが持っていた多様性が失われる「過学習(overfitting)」に陥りやすい傾向があります。
+- **対応スクリプト**: `sdxl_train.py`, `sd3_train.py`, `flux_train.py`, `lumina_train.py` など、`*_train.py` という命名規則のスクリプトが対応します。
+
+**LoRA学習**は、モデルの重みは凍結(固定)したまま、「アダプター」と呼ばれる小さな追加ネットワークのみを学習する手法です。
+- **利点**:
+ - 少ないVRAMと計算コストで高速に学習できます。
+ - 学習する重みが少ないため、過学習に強いとされています。
+ - 保存されるファイル(LoRAネットワーク)は数十〜数百MBと非常に小さく、管理が容易です。
+ - 複数のLoRAを組み合わせて使用することも可能です。
+- **欠点**: モデル全体を学習するわけではないため、Fine-tuningほどの大きな変化は期待できない場合があります。
+- **対応スクリプト**: `sdxl_train_network.py`, `sd3_train_network.py`, `flux_train_network.py` など、`*_train_network.py` という命名規則のスクリプトが対応します。
+
+| 特徴 | Fine-tuning | LoRA学習 |
+|:---|:---|:---|
+| **学習対象** | モデルの全重み | 追加ネットワーク(アダプター)のみ |
+| **VRAM/計算コスト**| 大 | 小 |
+| **学習時間** | 長 | 短 |
+| **ファイルサイズ** | 大(数GB) | 小(数MB〜数百MB) |
+| **過学習リスク** | 高 | 低 |
+| **適した用途** | 大規模な画風変更、コンセプト学習 | 特定のキャラ、画風の追加学習 |
+
+一般的に、特定のキャラクターや画風を追加したい場合は**LoRA学習**から試すことが推奨されます。より根本的な画風の変更や、高品質なモデルを目指す場合は**Fine-tuning**が有効な選択肢となります。
+
+</details>
+
+---
+
+### Fine-tuning for each architecture
+
+Fine-tuning updates all of the model's weights, so it has different options and considerations from LoRA tuning. This section describes the fine-tuning scripts for the major architectures.
+
+The basic command structure is common to all architectures.
+
+```bash
+accelerate launch --mixed_precision bf16 {script_name}.py \
+ --pretrained_model_name_or_path <path to model> \
+ --dataset_config <path to dataset config .toml> \
+ --output_dir <output directory> \
+ --output_name <output file name> \
+ --save_model_as safetensors \
+ --max_train_steps 10000 \
+ --learning_rate 1e-5 \
+ --optimizer_type AdamW8bit
+```
+
+<details>
+<summary>日本語</summary>
+
+### 各アーキテクチャのFine-tuning
+
+Fine-tuningはモデルの重み全体を更新するため、LoRA学習とは異なるオプションや考慮事項があります。ここでは主要なアーキテクチャごとのFine-tuningスクリプトについて説明します。
+
+基本的なコマンドの構造は、どのアーキテクチャでも共通です。
+
+```bash
+accelerate launch --mixed_precision bf16 {script_name}.py \
+ --pretrained_model_name_or_path <path to model> \
+ --dataset_config <path to dataset config .toml> \
+ --output_dir <output directory> \
+ --output_name <output file name> \
+ --save_model_as safetensors \
+ --max_train_steps 10000 \
+ --learning_rate 1e-5 \
+ --optimizer_type AdamW8bit
+```
+
+</details>
+
+#### SDXL (`sdxl_train.py`)
+
+Performs fine-tuning for SDXL models. It is possible to train both the U-Net and the Text Encoders.
+
+**Key Options:**
+
+- `--train_text_encoder`: Includes the weights of the Text Encoders (CLIP ViT-L and OpenCLIP ViT-bigG) in the training. Effective for significant style changes or strongly learning specific concepts.
+- `--learning_rate_te1`, `--learning_rate_te2`: Set individual learning rates for each Text Encoder.
+- `--block_lr`: Divides the U-Net into 23 blocks and sets a different learning rate for each block. This allows for advanced adjustments, such as strengthening or weakening the learning of specific layers. (Not available in LoRA tuning).
+
+**Command Example:**
+
+```bash
+accelerate launch --mixed_precision bf16 sdxl_train.py \
+ --pretrained_model_name_or_path "sd_xl_base_1.0.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "sdxl_finetuned" \
+ --train_text_encoder \
+ --learning_rate 1e-5 \
+ --learning_rate_te1 5e-6 \
+ --learning_rate_te2 2e-6
+```
+
+<details>
+<summary>日本語</summary>
+
+#### SDXL (`sdxl_train.py`)
+
+SDXLモデルのFine-tuningを行います。U-NetとText Encoderの両方を学習させることが可能です。
+
+**主要なオプション:**
+
+- `--train_text_encoder`: Text Encoder(CLIP ViT-LとOpenCLIP ViT-bigG)の重みを学習対象に含めます。画風を大きく変えたい場合や、特定の概念を強く学習させたい場合に有効です。
+- `--learning_rate_te1`, `--learning_rate_te2`: それぞれのText Encoderに個別の学習率を設定します。
+- `--block_lr`: U-Netを23個のブロックに分割し、ブロックごとに異なる学習率を設定できます。特定の層の学習を強めたり弱めたりする高度な調整が可能です。(LoRA学習では利用できません)
+
+**コマンド例:**
+
+```bash
+accelerate launch --mixed_precision bf16 sdxl_train.py \
+ --pretrained_model_name_or_path "sd_xl_base_1.0.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "sdxl_finetuned" \
+ --train_text_encoder \
+ --learning_rate 1e-5 \
+ --learning_rate_te1 5e-6 \
+ --learning_rate_te2 2e-6
+```
+
+</details>
+
+#### SD3 (`sd3_train.py`)
+
+Performs fine-tuning for Stable Diffusion 3 Medium models. SD3 consists of three Text Encoders (CLIP-L, CLIP-G, T5-XXL) and a MMDiT (equivalent to U-Net), which can be targeted for training.
+
+**Key Options:**
+
+- `--train_text_encoder`: Enables training for CLIP-L and CLIP-G.
+- `--train_t5xxl`: Enables training for T5-XXL. T5-XXL is a very large model and requires a lot of VRAM for training.
+- `--blocks_to_swap`: A memory optimization feature to reduce VRAM usage. It swaps some blocks of the MMDiT to CPU memory during training. Useful for using larger batch sizes in low VRAM environments. (Also available in LoRA tuning).
+- `--num_last_block_to_freeze`: Freezes the weights of the last N blocks of the MMDiT, excluding them from training. Useful for maintaining model stability while focusing on learning in the lower layers.
+
+**Command Example:**
+
+```bash
+accelerate launch --mixed_precision bf16 sd3_train.py \
+ --pretrained_model_name_or_path "sd3_medium.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "sd3_finetuned" \
+ --train_text_encoder \
+ --learning_rate 4e-6 \
+ --blocks_to_swap 10
+```
+
+<details>
+<summary>日本語</summary>
+
+#### SD3 (`sd3_train.py`)
+
+Stable Diffusion 3 MediumモデルのFine-tuningを行います。SD3は3つのText Encoder(CLIP-L, CLIP-G, T5-XXL)とMMDiT(U-Netに相当)で構成されており、これらを学習対象にできます。
+
+**主要なオプション:**
+
+- `--train_text_encoder`: CLIP-LとCLIP-Gの学習を有効にします。
+- `--train_t5xxl`: T5-XXLの学習を有効にします。T5-XXLは非常に大きなモデルのため、学習には多くのVRAMが必要です。
+- `--blocks_to_swap`: VRAM使用量を削減するためのメモリ最適化機能です。MMDiTの一部のブロックを学習中にCPUメモリに退避(スワップ)させます。VRAMが少ない環境で大きなバッチサイズを使いたい場合に有効です。(LoRA学習でも利用可能)
+- `--num_last_block_to_freeze`: MMDiTの最後のNブロックの重みを凍結し、学習対象から除外します。モデルの安定性を保ちつつ、下位層を中心に学習させたい場合に有効です。
+
+**コマンド例:**
+
+```bash
+accelerate launch --mixed_precision bf16 sd3_train.py \
+ --pretrained_model_name_or_path "sd3_medium.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "sd3_finetuned" \
+ --train_text_encoder \
+ --learning_rate 4e-6 \
+ --blocks_to_swap 10
+```
+
+</details>
+
+#### FLUX.1 (`flux_train.py`)
+
+Performs fine-tuning for FLUX.1 models. FLUX.1 is internally composed of two types of Transformer blocks (Double Blocks and Single Blocks).
+
+**Key Options:**
+
+- `--blocks_to_swap`: Similar to SD3, this feature swaps Transformer blocks to the CPU for memory optimization.
+- `--blockwise_fused_optimizers`: An experimental feature that aims to streamline training by applying an individual optimizer to each block.
+
+**Command Example:**
+
+```bash
+accelerate launch --mixed_precision bf16 flux_train.py \
+ --pretrained_model_name_or_path "FLUX.1-dev.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "flux1_finetuned" \
+ --learning_rate 1e-5 \
+ --blocks_to_swap 18
+```
+
+<details>
+<summary>日本語</summary>
+
+#### FLUX.1 (`flux_train.py`)
+
+FLUX.1モデルのFine-tuningを行います。FLUX.1は内部的に2種類のTransformerブロック(Double Blocks, Single Blocks)で構成されています。
+
+**主要なオプション:**
+
+- `--blocks_to_swap`: SD3と同様に、メモリ最適化のためにTransformerブロックをCPUにスワップする機能です。
+- `--blockwise_fused_optimizers`: 実験的な機能で、各ブロックに個別のオプティマイザを適用し、学習を効率化することを目指します。
+
+**コマンド例:**
+
+```bash
+accelerate launch --mixed_precision bf16 flux_train.py \
+ --pretrained_model_name_or_path "FLUX.1-dev.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "flux1_finetuned" \
+ --learning_rate 1e-5 \
+ --blocks_to_swap 18
+```
+
+</details>
+
+#### Lumina (`lumina_train.py`)
+
+Performs fine-tuning for Lumina-Next DiT models.
+
+**Key Options:**
+
+- `--use_flash_attn`: Enables Flash Attention to speed up computation.
+- `lumina_train.py` is relatively new, and many of its options are shared with other scripts. Training can be performed following the basic command pattern.
+
+**Command Example:**
+
+```bash
+accelerate launch --mixed_precision bf16 lumina_train.py \
+ --pretrained_model_name_or_path "Lumina-Next-DiT-B.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "lumina_finetuned" \
+ --learning_rate 1e-5
+```
+
+<details>
+<summary>日本語</summary>
+
+#### Lumina (`lumina_train.py`)
+
+Lumina-Next DiTモデルのFine-tuningを行います。
+
+**主要なオプション:**
+
+- `--use_flash_attn`: Flash Attentionを有効にし、計算を高速化します。
+- `lumina_train.py`は比較的新しく、オプションは他のスクリプトと共通化されている部分が多いです。基本的なコマンドパターンに従って学習を行えます。
+
+**コマンド例:**
+
+```bash
+accelerate launch --mixed_precision bf16 lumina_train.py \
+ --pretrained_model_name_or_path "Lumina-Next-DiT-B.safetensors" \
+ --dataset_config "dataset_config.toml" \
+ --output_dir "output" \
+ --output_name "lumina_finetuned" \
+ --learning_rate 1e-5
+```
+
+</details>
+
+---
+
+### Differences between Fine-tuning and LoRA tuning per architecture
+
+| Architecture | Key Features/Options Specific to Fine-tuning | Main Differences from LoRA tuning |
+|:---|:---|:---|
+| **SDXL** | `--block_lr` | Only fine-tuning allows for granular control over the learning rate for each U-Net block. |
+| **SD3** | `--train_text_encoder`, `--train_t5xxl`, `--num_last_block_to_freeze` | Only fine-tuning can train the entire Text Encoders. LoRA only trains the adapter parts. |
+| **FLUX.1** | `--blockwise_fused_optimizers` | Since fine-tuning updates the entire model's weights, more experimental optimizer options are available. |
+| **Lumina** | (Few specific options) | Basic training options are common, but fine-tuning differs in that it updates the entire base model. |
+
+<details>
+<summary>日本語</summary>
+
+### アーキテクチャごとのFine-tuningとLoRA学習の違い
+
+| アーキテクチャ | Fine-tuning特有の主要機能・オプション | LoRA学習との主な違い |
+|:---|:---|:---|
+| **SDXL** | `--block_lr` | U-Netのブロックごとに学習率を細かく制御できるのはFine-tuningのみです。 |
+| **SD3** | `--train_text_encoder`, `--train_t5xxl`, `--num_last_block_to_freeze` | Text Encoder全体を学習対象にできるのはFine-tuningです。LoRAではアダプター部分のみ学習します。 |
+| **FLUX.1** | `--blockwise_fused_optimizers` | Fine-tuningではモデル全体の重みを更新するため、より実験的なオプティマイザの選択肢が用意されています。 |
+| **Lumina** | (特有のオプションは少ない) | 基本的な学習オプションは共通ですが、Fine-tuningはモデルの基盤全体を更新する点で異なります。 |
+
+</details>
diff --git a/docs/flux_train_network.md b/docs/flux_train_network.md
new file mode 100644
index 000000000..b8207cb00
--- /dev/null
+++ b/docs/flux_train_network.md
@@ -0,0 +1,709 @@
+Status: reviewed
+
+# LoRA Training Guide for FLUX.1 using `flux_train_network.py` / `flux_train_network.py` を用いたFLUX.1モデルのLoRA学習ガイド
+
+This document explains how to train LoRA models for the FLUX.1 model using `flux_train_network.py` included in the `sd-scripts` repository.
+
+<details>
+<summary>日本語</summary>
+
+このドキュメントでは、`sd-scripts`リポジトリに含まれる`flux_train_network.py`を使用して、FLUX.1モデルに対するLoRA (Low-Rank Adaptation) モデルを学習する基本的な手順について解説します。
+
+</details>
+
+## 1. Introduction / はじめに
+
+`flux_train_network.py` trains additional networks such as LoRA on the FLUX.1 model, which uses a transformer-based architecture different from Stable Diffusion. Two text encoders, CLIP-L and T5-XXL, and a dedicated AutoEncoder are used.
+
+This guide assumes you know the basics of LoRA training. For common options, see the [train_network.py guide](train_network.md) and the [sdxl_train_network.py guide](sdxl_train_network.md).
+
+**Prerequisites:**
+
+* The repository is cloned and the Python environment is ready.
+* A training dataset is prepared. See the dataset configuration guide.
+
+<details>
+<summary>日本語</summary>
+
+`flux_train_network.py`は、FLUX.1モデルに対してLoRAなどの追加ネットワークを学習させるためのスクリプトです。FLUX.1はStable Diffusionとは異なるアーキテクチャを持つ画像生成モデルであり、このスクリプトを使用することで、特定のキャラクターや画風を再現するLoRAモデルを作成できます。
+
+このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象としています。基本的な使い方や共通のオプションについては、[`train_network.py`のガイド](train_network.md)を参照してください。また一部のパラメータは [`sdxl_train_network.py`](sdxl_train_network.md) と同様のものがあるため、そちらも参考にしてください。
+
+**前提条件:**
+
+* `sd-scripts`リポジトリのクローンとPython環境のセットアップが完了していること。
+* 学習用データセットの準備が完了していること。(データセットの準備については[データセット設定ガイド](link/to/dataset/config/doc)を参照してください)
+
+</details>
+
+## 2. Differences from `train_network.py` / `train_network.py` との違い
+
+`flux_train_network.py` is based on `train_network.py` but adapted for FLUX.1. Main differences include:
+
+* **Target model:** FLUX.1 model (dev or schnell version).
+* **Model structure:** Unlike Stable Diffusion, FLUX.1 uses a Transformer-based architecture with two text encoders (CLIP-L and T5-XXL) and a dedicated AutoEncoder (AE) instead of a VAE.
+* **Required arguments:** Additional arguments for FLUX.1 model, CLIP-L, T5-XXL, and AE model files.
+* **Incompatible options:** Some Stable Diffusion-specific arguments (e.g., `--v2`, `--clip_skip`, `--max_token_length`) are not used in FLUX.1 training.
+* **FLUX.1-specific arguments:** Additional arguments for FLUX.1-specific training parameters like timestep sampling and guidance scale.
+
+<details>
+<summary>日本語</summary>
+
+`flux_train_network.py`は`train_network.py`をベースに、FLUX.1モデルに対応するための変更が加えられています。主な違いは以下の通りです。
+
+* **対象モデル:** FLUX.1モデル(dev版またはschnell版)を対象とします。
+* **モデル構造:** Stable Diffusionとは異なり、FLUX.1はTransformerベースのアーキテクチャを持ちます。Text EncoderとしてCLIP-LとT5-XXLの二つを使用し、VAEの代わりに専用のAutoEncoder (AE) を使用します。
+* **必須の引数:** FLUX.1モデル、CLIP-L、T5-XXL、AEの各モデルファイルを指定する引数が追加されています。
+* **一部引数の非互換性:** Stable Diffusion向けの引数の一部(例: `--v2`, `--clip_skip`, `--max_token_length`)はFLUX.1の学習では使用されません。
+* **FLUX.1特有の引数:** タイムステップのサンプリング方法やガイダンススケールなど、FLUX.1特有の学習パラメータを指定する引数が追加されています。
+
+</details>
+
+## 3. Preparation / 準備
+
+Before starting training you need:
+
+1. **Training script:** `flux_train_network.py`
+2. **FLUX.1 model file:** Base FLUX.1 model `.safetensors` file (e.g., `flux1-dev.safetensors`).
+3. **Text Encoder model files:**
+ - CLIP-L model `.safetensors` file (e.g., `clip_l.safetensors`)
+ - T5-XXL model `.safetensors` file (e.g., `t5xxl.safetensors`)
+4. **AutoEncoder model file:** FLUX.1-compatible AE model `.safetensors` file (e.g., `ae.safetensors`).
+5. **Dataset definition file (.toml):** TOML format file describing training dataset configuration (e.g., `my_flux_dataset_config.toml`).
+
+### Downloading Required Models
+
+To train FLUX.1 models, you need to download the following model files:
+
+- **DiT, AE**: Download from the [black-forest-labs/FLUX.1 dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) repository. Use `flux1-dev.safetensors` and `ae.safetensors`. The weights in the subfolder are in Diffusers format and cannot be used.
+- **Text Encoder 1 (T5-XXL), Text Encoder 2 (CLIP-L)**: Download from the [ComfyUI FLUX Text Encoders](https://huggingface.co/comfyanonymous/flux_text_encoders) repository. Please use `t5xxl_fp16.safetensors` for T5-XXL. Thanks to ComfyUI for providing these models.
+
+To train Chroma models, you need to download the Chroma model file from the following repository:
+
+- **Chroma Base**: Download from the [lodestones/Chroma1-Base](https://huggingface.co/lodestones/Chroma1-Base) repository. Use `Chroma.safetensors`.
+
+We have tested Chroma training with the weights from the [lodestones/Chroma](https://huggingface.co/lodestones/Chroma) repository.
+
+The AE and T5-XXL models are the same as for FLUX.1, so you can use the same files. The CLIP-L model is not used for Chroma training, so you can omit the `--clip_l` argument.
+
+<details>
+<summary>日本語</summary>
+
+学習を開始する前に、以下のファイルが必要です。
+
+1. **学習スクリプト:** `flux_train_network.py`
+2. **FLUX.1モデルファイル:** 学習のベースとなるFLUX.1モデルの`.safetensors`ファイル(例: `flux1-dev.safetensors`)。
+3. **Text Encoderモデルファイル:**
+ - CLIP-Lモデルの`.safetensors`ファイル。例として`clip_l.safetensors`を使用します。
+ - T5-XXLモデルの`.safetensors`ファイル。例として`t5xxl.safetensors`を使用します。
+4. **AutoEncoderモデルファイル:** FLUX.1に対応するAEモデルの`.safetensors`ファイル。例として`ae.safetensors`を使用します。
+5. **データセット定義ファイル (.toml):** 学習データセットの設定を記述したTOML形式のファイル。(詳細は[データセット設定ガイド](link/to/dataset/config/doc)を参照してください)。例として`my_flux_dataset_config.toml`を使用します。
+
+**必要なモデルのダウンロード**
+
+FLUX.1モデルを学習するためには、以下のモデルファイルをダウンロードする必要があります。
+
+- **DiT, AE**: [black-forest-labs/FLUX.1 dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) リポジトリからダウンロードします。`flux1-dev.safetensors`と`ae.safetensors`を使用してください。サブフォルダ内の重みはDiffusers形式であり、使用できません。
+- **Text Encoder 1 (T5-XXL), Text Encoder 2 (CLIP-L)**: [ComfyUI FLUX Text Encoders](https://huggingface.co/comfyanonymous/flux_text_encoders) リポジトリからダウンロードします。T5-XXLには`t5xxl_fp16.safetensors`を使用してください。これらのモデルを提供いただいたComfyUIに感謝します。
+
+Chromaモデルを学習する場合は、以下のリポジトリからChromaモデルファイルをダウンロードする必要があります。
+
+- **Chroma Base**: [lodestones/Chroma1-Base](https://huggingface.co/lodestones/Chroma1-Base) リポジトリからダウンロードします。`Chroma.safetensors`を使用してください。
+
+Chromaの学習のテストは [lodestones/Chroma](https://huggingface.co/lodestones/Chroma) リポジトリの重みを使用して行いました。
+
+AEとT5-XXLモデルはFLUX.1と同じものを使用できるため、同じファイルを使用します。CLIP-LモデルはChroma学習では使用されないため、`--clip_l`引数は省略できます。
+
+</details>
+
+## 4. Running the Training / 学習の実行
+
+Run `flux_train_network.py` from the terminal with the FLUX.1-specific arguments. Here is a basic command example:
+
+```bash
+accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py \
+ --pretrained_model_name_or_path="<path to flux1-dev.safetensors>" \
+ --clip_l="<path to clip_l.safetensors>" \
+ --t5xxl="<path to t5xxl.safetensors>" \
+ --ae="<path to ae.safetensors>" \
+ --dataset_config="my_flux_dataset_config.toml" \
+ --output_dir="