Qwen Image LoRA not capturing enough details #523
It depends heavily on the dataset, so it's hard to say for sure. How many training images are there, and are they consistent? What are the captions like, and are the prompts you generate with similar to them? Also, dim=128 seems too high, and personally I always set alpha to 1 to simplify learning rate estimation. You might also want to increase the number of steps a bit. I think you could start with simple settings, such as lr_scheduler=constant, dim=16, alpha=1, learning_rate=1e-4~1e-3. Also try setting timestep_sampling to shift and discrete_flow_shift to 2.2, or timestep_sampling to uniform.
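To make that baseline concrete, here is a minimal sketch of those settings written out as a trainer argument list. The flag names (--network_dim, --network_alpha, etc.) follow the usual sd-scripts / musubi-tuner convention and are an assumption on my part; double-check them against the script version you are running.

```python
# Hypothetical "start simple" baseline assembled as CLI arguments.
# Flag names are assumed from sd-scripts / musubi-tuner conventions;
# verify against --help of the trainer you actually use.
baseline_args = [
    "--network_dim", "16",            # dim=16 instead of 128
    "--network_alpha", "1",           # alpha=1 keeps LR estimation simple
    "--learning_rate", "1e-4",        # try values between 1e-4 and 1e-3
    "--lr_scheduler", "constant",
    "--timestep_sampling", "shift",   # or "uniform" as the alternative
    "--discrete_flow_shift", "2.2",
]
print(" ".join(baseline_args))
```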
I found that cosine with 3 restarts worked well for my use case before, but let me try simplifying my settings and see how that works. Also, my understanding of alpha was that it controls the "strength" of the training. So if I set alpha to 1, would that mean it requires many more steps to train? Please forgive me if I'm wrong, but since I want to capture the "details" of the image, wouldn't decreasing the rank be counterproductive?

To further clarify: I have 3 images of the furniture taken from different angles in a studio setting.

The captions are like:

The generation is like:

I found that Qwen LoRA training captures the shape well, but it misses small details that Flux training wouldn't miss (similar to what Furkan says).
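As background for the alpha question above, here is a toy sketch of the standard LoRA scaling convention (not this repository's exact code): the learned update is scaled by alpha/rank, so with alpha fixed at 1 the effective update is smaller and the learning rate or step count does the compensating, which is what makes LR tuning simpler.

```python
import torch

# Toy illustration of the common LoRA convention: the frozen weight W receives
# an update scaled by alpha / rank. With alpha = 1 and rank = 16 the update is
# scaled by 1/16, so a larger learning rate (or more steps) does the work a
# larger alpha would otherwise do. Shapes and values here are arbitrary.
out_features, in_features, rank, alpha = 1024, 1024, 16, 1.0

W = torch.randn(out_features, in_features)   # frozen base weight
A = torch.randn(rank, in_features) * 0.01    # trainable down-projection
B = torch.zeros(out_features, rank)          # trainable up-projection (zero init)

W_effective = W + (alpha / rank) * (B @ A)
print(W_effective.shape)  # torch.Size([1024, 1024])
```

Note that rank controls how many directions the LoRA can learn, while alpha only rescales them, so lowering alpha does not by itself reduce the capacity to capture detail.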
Here are a few of my top-quality results so far; still doing R&D.
I gave up; whatever I tried, the skin just comes out smooth, and I cannot get the detail into it.
Be sure to test your resulting LoRAs at something like 50 steps - that helps a lot with details. I've found timestep_sampling set to sigmoid is good for most things - you can go uniform for the last bit of training to really bake in details, but I wouldn't recommend it the whole time. Train at 640x640 and 1328x1328 buckets.
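To make the sigmoid-vs-uniform difference concrete, here is an illustrative sketch (simplified, not the trainer's exact code): sigmoid sampling draws the sigmoid of a standard normal, which clusters timesteps around the middle of the schedule, while uniform sampling also covers the low-noise end where fine detail tends to be resolved.

```python
import torch

# Simplified comparison of the two timestep samplers mentioned above.
torch.manual_seed(0)
n = 100_000

t_sigmoid = torch.sigmoid(torch.randn(n))  # clusters around t = 0.5
t_uniform = torch.rand(n)                  # even coverage of [0, 1)

# Fraction of samples landing in the low-noise region t < 0.1, where fine
# detail tends to be refined during denoising.
print((t_sigmoid < 0.1).float().mean().item())  # roughly 0.014
print((t_uniform < 0.1).float().mean().item())  # roughly 0.10
```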
Hello, I'm trying to create LoRAs of furniture, for example a cabinet.
I am generally able to get the LoRA to capture the cabinet's structure well, but the details of the cabinet are sometimes wrong. For example, the grain of the wood can be off.
My hypothesis is that I need to train more on the later timesteps (which focus more on details), but I can't seem to get it working. I even tried the qinglong_qwen timestep_sampling option that is supposed to train more on details, but that didn't work.
Would appreciate any tips.
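For what it's worth, the discrete_flow_shift option mentioned in the replies is one concrete way to reweight which timesteps get sampled. Below is a hypothetical sketch of the shift mapping commonly used in flow-matching trainers; it is an assumption about how such a knob behaves, not code quoted from this repository.

```python
import torch

# Sketch of the shift mapping t -> s*t / (1 + (s - 1)*t) that flow-matching
# trainers commonly apply to sampled timesteps (assumed here, not copied from
# this repo). In the convention where t = 1 is pure noise, s > 1 pushes samples
# toward the high-noise end, while s < 1 concentrates them on the low-noise
# steps where fine detail is refined.
def shift_timesteps(t: torch.Tensor, s: float) -> torch.Tensor:
    return (s * t) / (1.0 + (s - 1.0) * t)

t = torch.rand(100_000)
for s in (0.5, 1.0, 2.2):
    print(s, shift_timesteps(t, s).mean().item())
```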