Replies: 2 comments
-
You need to update to the latest git main, I believe; that problem has been solved. But yeah, a full fine-tune (FFT) across 48G GPUs likely needs DeepSpeed... it's an 8B-parameter model! You're confusing it with the 2.5B-parameter one.
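For context, DeepSpeed is normally wired up through accelerate rather than through config.json. As a rough sketch only, a ZeRO stage 3 config with CPU offload is the kind of setup that lets an 8B full fine-tune fit on 48G cards; the snippet below uses standard DeepSpeed config keys, but the file name, stage and offload choices are illustrative assumptions rather than SimpleTuner defaults, so defer to the project's DeepSpeed documentation for the exact integration.
ds_config.json (illustrative)
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "bf16": { "enabled": true },
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": 1
}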
-
SD3 uses its own 16-channel VAE. Please be sure to follow the SD3 quickstart.
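One quick sanity check (an illustrative diffusers snippet, not part of SimpleTuner): load the VAE straight from the SD3.5 Large repo and confirm it reports 16 latent channels.
from diffusers import AutoencoderKL

# Pull the VAE that ships inside the SD3.5 Large repo (gated; an HF token may be needed).
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    subfolder="vae",
)
# SD3 / SD3.5 use a 16-channel latent space, unlike the 4-channel SD1.x / SDXL VAE.
print(vae.config.latent_channels)  # expected: 16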
-
Hi, I ran into some issues while training SD3.
1. SD3 can't cache VAE latents because self.transform_sample is None.
I tried adding a function named get_transforms to helpers/models/common.py, copied from the VideoModelFoundation class into the ImageModelFoundation class. I don't know whether this is the right solution; it seems to fix the VAE caching problem, but what happens next doesn't look right.
func:
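Since the snippet itself was not included above, here is a hypothetical reconstruction of the idea being described: giving ImageModelFoundation the same get_transforms() hook that VideoModelFoundation exposes, so self.transform_sample is populated before VAE caching. The class stub, method name and transform contents below are assumptions for illustration, not the actual SimpleTuner code.
from torchvision import transforms

class ImageModelFoundation:  # illustrative stub, not the real SimpleTuner class
    def get_transforms(self, dataset_type: str = "image"):
        # Hypothetical: the same kind of pipeline the video class would return,
        # converting PIL images to tensors and scaling pixels to [-1, 1]
        # before they reach the VAE cache.
        return transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),
            ]
        )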
Many thanks for your work!
config.json
{
"--resume_from_checkpoint": "latest",
"--data_backend_config": "config/multidatabackend.json",
"--aspect_bucket_rounding": 2,
"--seed": 42,
"--minimum_image_size": 0,
"--disable_benchmark": false,
"--output_dir": "output/models",
"--max_train_steps": 10000,
"--num_train_epochs": 0,
"--checkpointing_steps": 500,
"--checkpoints_total_limit": 5,
"--attention_mechanism": "diffusers",
"--tracker_project_name": "full-training",
"--tracker_run_name": "simpletuner-full",
"--report_to": "tensorboard",
"--model_type": "full",
"--pretrained_model_name_or_path": "stabilityai/stable-diffusion-3.5-large",
"--model_family": "sd3",
"--train_batch_size": 1,
"--gradient_checkpointing": "true",
"--caption_dropout_probability": 0.1,
"--resolution_type": "pixel_area",
"--resolution": 256,
"--validation_seed": 42,
"--validation_steps": 500,
"--validation_resolution": "256x256",
"--validation_guidance": 5.0,
"--validation_guidance_rescale": "0.0",
"--validation_num_inference_steps": "20",
"--validation_prompt": "A walking dog.",
"--mixed_precision": "bf16",
"--optimizer": "adamw_bf16",
"--learning_rate": "1e-6",
"--lr_scheduler": "polynomial",
"--lr_warmup_steps": 100,
"--base_model_precision": "no_change",
"--validation_torch_compile": "false"
}
multidatabackend.json
[
{
"id": "text-embed-cache",
"dataset_type": "text_embeds",
"default": true,
"type": "local",
"cache_dir": "/vol/SimpleTuner/cache/sd3/text",
"write_batch_size": 128
},
{
"id": "fac-1024",
"type": "local",
"instance_data_dir": "/home/vol/dataset/fac/image_set",
"crop": false,
"resolution_type": "pixel_area",
"metadata_backend": "discovery",
"caption_strategy": "textfile",
"cache_dir_vae": "/vol/SimpleTuner/cache/sd3/vae/1024",
"resolution": 256,
"minimum_image_size": 224,
"repeats": 1
},
{
"id": "fac-crop-1024",
"type": "local",
"instance_data_dir": "/home/vol/dataset/fac/image_set",
"crop": true,
"crop_aspect": "square",
"crop_style": "center",
"vae_cache_clear_each_epoch": false,
"resolution_type": "pixel_area",
"metadata_backend": "discovery",
"caption_strategy": "textfile",
"cache_dir_vae": "/vol/SimpleTuner/cache/sd3/vae-crop/1024",
"resolution": 256,
"minimum_image_size": 224,
"repeats": 1
}
]