The code that creates the buckets is here: musubi-tuner/src/musubi_tuner/dataset/image_video_dataset.py, lines 469 to 480 at commit fec404c. With a resolution of 1024,1024 and a reso_steps of 16 (common to all models), this code generates a set of buckets. Each image is then resized and cropped to the resolution of the bucket whose aspect ratio is closest to its own.
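To make the bucket-generation step concrete, here is a minimal sketch of how such a bucket list can be derived from a target resolution and step size. This is an illustrative reimplementation, not the actual musubi-tuner code; the function name, `min_size` parameter, and exact rounding are assumptions:

```python
# Hedged sketch: enumerate aspect-ratio buckets whose area does not exceed
# max_width * max_height, with every dimension a multiple of reso_steps.
# Not the actual musubi-tuner implementation; min_size is an assumed bound.
def make_buckets(max_width=1024, max_height=1024, reso_steps=16, min_size=256):
    max_area = max_width * max_height
    buckets = set()
    w = min_size
    while w * min_size <= max_area:
        # Largest height (rounded down to a reso_steps multiple) that keeps
        # the bucket's area within the max area.
        h = (max_area // w) // reso_steps * reso_steps
        if h >= min_size:
            buckets.add((w, h))
            buckets.add((h, w))  # also register the portrait/landscape twin
        w += reso_steps
    return sorted(buckets)
```

With the defaults, this yields buckets such as (1024, 1024), (896, 1152), (1152, 896), and so on, each with area at most 1024×1024 and dimensions divisible by 16.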
@kohya-ss, when a user enables bucketing, is this what gets used automatically?
I am trying to do a full fine-tune of Qwen-Image and am currently working on dataset preparation. I have a captioned dataset of ~5 million images, mostly around 1MP each, but spanning a wide variety of dimensions (e.g. 768x1152, 1024x688, 856x1280, etc.).
I want to pick a set of common aspect-ratio buckets, group these pictures into them, and then crop each image to fit the nearest-sized AR bucket (e.g. one of (1, 1), (4, 3), (3, 4), (16, 9), (9, 16), (5, 4), (4, 5), (3, 2), (2, 3), (7, 5), (5, 7), etc.) to minimize the amount of cropping that occurs.
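The nearest-bucket assignment and cropping described above can be sketched as follows. This is a hypothetical helper, not musubi-tuner's code; the bucket list and function names are illustrative:

```python
# Hypothetical sketch: pick the bucket whose aspect ratio is closest to the
# image's, then compute a center-crop box matching that ratio.
def closest_bucket(width, height, buckets):
    ar = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ar))

def center_crop_to_ar(width, height, bucket_w, bucket_h):
    """Return a (left, top, right, bottom) crop box with the bucket's aspect
    ratio, keeping as many of the original pixels as possible."""
    target_ar = bucket_w / bucket_h
    if width / height > target_ar:        # image too wide: trim left/right
        new_w = round(height * target_ar)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    else:                                 # image too tall: trim top/bottom
        new_h = round(width / target_ar)
        top = (height - new_h) // 2
        return (0, top, width, top + new_h)
```

After cropping to the bucket's aspect ratio, the image would still be resized to the bucket's pixel dimensions; this sketch only covers the crop-box computation.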
I am fine to write my own code to sort the images into AR buckets, and do the cropping myself, but I am just trying to figure out how to make musubi-tuner utilize these different buckets when training? What do I set under the "resolution" variable in the TOML file? Does musubi-tuner automatically handle AR bucketing somehow for diverse datasets like mine, or will I need to write custom bucketing/cropping code?
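For reference, here is the shape of dataset TOML I have been assuming; the key names are my best reading of the repo's dataset documentation and should be verified against it, since I may have them wrong:

```toml
# Hypothetical sketch of a musubi-tuner dataset config — key names are
# assumptions; check the repo's dataset config docs before using.
[general]
resolution = [1024, 1024]   # target max area; buckets are derived from this
caption_extension = ".txt"
batch_size = 8
enable_bucket = true        # let the trainer sort images into AR buckets

[[datasets]]
image_directory = "/path/to/images"
cache_directory = "/path/to/cache"
```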
Can someone give me a detailed breakdown of how aspect ratio bucketing works in terms of training and dataset configuration? Thanks!