Replies: 2 comments
-
You need to update to the latest git main, I believe; that problem has been solved. But yeah, a full fine-tune (FFT) across 48G GPUs likely needs DeepSpeed... it's an 8B-parameter model! You're confusing it with the 2.5B-parameter one.
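For context, DeepSpeed is normally wired up through accelerate rather than through config.json. As a rough sketch only, a ZeRO stage 3 config with CPU offload is the kind of setup that lets an 8B full fine-tune fit on 48G cards; the snippet below uses standard DeepSpeed config keys, but the file name, stage and offload choices are illustrative assumptions rather than SimpleTuner defaults, so defer to the project's DeepSpeed documentation for the exact integration.
ds_config.json (illustrative)
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "bf16": { "enabled": true },
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": 1
}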
-
SD3 uses its own 16-channel VAE. Please be sure to follow the SD3 quickstart.
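One quick sanity check (an illustrative diffusers snippet, not part of SimpleTuner): load the VAE straight from the SD3.5 Large repo and confirm it reports 16 latent channels.
from diffusers import AutoencoderKL

# Pull the VAE that ships inside the SD3.5 Large repo (gated; an HF token may be needed).
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    subfolder="vae",
)
# SD3 / SD3.5 use a 16-channel latent space, unlike the 4-channel SD1.x / SDXL VAE.
print(vae.config.latent_channels)  # expected: 16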
-
Hi, I ran into some issues while training SD3.
1. SD3 can't cache VAE latents because self.transform_sample is None.
I tried adding a function named get_transforms to helpers/models/common.py, copied from the VideoModelFoundation class into the ImageModelFoundation class. I don't know whether this is the right solution; it seems to fix the VAE caching problem, but what happens next doesn't look right.
func:
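Since the snippet itself was not included above, here is a hypothetical reconstruction of the idea being described: giving ImageModelFoundation the same get_transforms() hook that VideoModelFoundation exposes, so self.transform_sample is populated before VAE caching. The class stub, method name and transform contents below are assumptions for illustration, not the actual SimpleTuner code.
from torchvision import transforms

class ImageModelFoundation:  # illustrative stub, not the real SimpleTuner class
    def get_transforms(self, dataset_type: str = "image"):
        # Hypothetical: the same kind of pipeline the video class would return,
        # converting PIL images to tensors and scaling pixels to [-1, 1]
        # before they reach the VAE cache.
        return transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),
            ]
        )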
Many thanks for your work!
config.json
{
"--resume_from_checkpoint": "latest",
"--data_backend_config": "config/multidatabackend.json",
"--aspect_bucket_rounding": 2,
"--seed": 42,
"--minimum_image_size": 0,
"--disable_benchmark": false,
"--output_dir": "output/models",
"--max_train_steps": 10000,
"--num_train_epochs": 0,
"--checkpointing_steps": 500,
"--checkpoints_total_limit": 5,
"--attention_mechanism": "diffusers",
"--tracker_project_name": "full-training",
"--tracker_run_name": "simpletuner-full",
"--report_to": "tensorboard",
"--model_type": "full",
"--pretrained_model_name_or_path": "stabilityai/stable-diffusion-3.5-large",
"--model_family": "sd3",
"--train_batch_size": 1,
"--gradient_checkpointing": "true",
"--caption_dropout_probability": 0.1,
"--resolution_type": "pixel_area",
"--resolution": 256,
"--validation_seed": 42,
"--validation_steps": 500,
"--validation_resolution": "256x256",
"--validation_guidance": 5.0,
"--validation_guidance_rescale": "0.0",
"--validation_num_inference_steps": "20",
"--validation_prompt": "A walking dog.",
"--mixed_precision": "bf16",
"--optimizer": "adamw_bf16",
"--learning_rate": "1e-6",
"--lr_scheduler": "polynomial",
"--lr_warmup_steps": 100,
"--base_model_precision": "no_change",
"--validation_torch_compile": "false"
}
multidatabackend.json
[
{
"id": "text-embed-cache",
"dataset_type": "text_embeds",
"default": true,
"type": "local",
"cache_dir": "/vol/SimpleTuner/cache/sd3/text",
"write_batch_size": 128
},
{
"id": "fac-1024",
"type": "local",
"instance_data_dir": "/home/vol/dataset/fac/image_set",
"crop": false,
"resolution_type": "pixel_area",
"metadata_backend": "discovery",
"caption_strategy": "textfile",
"cache_dir_vae": "/vol/SimpleTuner/cache/sd3/vae/1024",
"resolution": 256,
"minimum_image_size": 224,
"repeats": 1
},
{
"id": "fac-crop-1024",
"type": "local",
"instance_data_dir": "/home/vol/dataset/fac/image_set",
"crop": true,
"crop_aspect": "square",
"crop_style": "center",
"vae_cache_clear_each_epoch": false,
"resolution_type": "pixel_area",
"metadata_backend": "discovery",
"caption_strategy": "textfile",
"cache_dir_vae": "/vol/SimpleTuner/cache/sd3/vae-crop/1024",
"resolution": 256,
"minimum_image_size": 224,
"repeats": 1
}
]