@FurkanGozukara
Contributor

I ran a few tests and all cases worked.

This fixes bug #678.

Replace glob-based pattern matching with explicit pattern matching in
ImageDirectoryDatasource to avoid incorrect matches when image filenames
share common prefixes (e.g., image1.jpg incorrectly matching image10.jpg).

This fixes issue kohya-ss#678 where glob patterns like 'image1*.*' would match
any file starting with 'image1', causing incorrect control image associations.
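The over-matching described above can be reproduced without touching the filesystem. This is a minimal sketch, assuming that `glob` applies the same wildcard rules as `fnmatch`; the file names in `control_files` are made up for illustration and do not come from the repository.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of a control directory (illustrative names only).
control_files = [
    "image1.png",
    "image1_0.png",
    "image10.png",
    "image10_0.png",
    "image2.png",
]

image_basename_no_ext = "image1"

# The old pattern: anything that starts with "image1" and contains a dot.
old_pattern = image_basename_no_ext + "*.*"
old_matches = [name for name in control_files if fnmatchcase(name, old_pattern)]

# image10.png and image10_0.png are picked up as well, which is the bug.
print(old_matches)
```

Because `*` also matches digits, every control file for `image10` is associated with `image1` too.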

The fix uses explicit checks for:
1. Exact basename matches (image1.jpg -> image1.png)
2. Basename followed by underscore and numeric digits (image1_0.jpg, image1_1.jpg)

This ensures image1.jpg only matches image1_0.jpg and image1_1.jpg,
not image10_0.jpg, image11_0.jpg, etc.
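The two explicit checks listed above could look roughly like the following. This is a sketch under stated assumptions, not the code from this pull request: the helper name `is_control_for` is hypothetical, and the rule is taken only from the description (same basename, or basename followed by an underscore and digits).

```python
import os
import re


def is_control_for(image_name: str, candidate_name: str) -> bool:
    """Return True if candidate_name looks like a control image for image_name.

    Hypothetical helper: compares file names without their extensions.
    """
    image_stem = os.path.splitext(image_name)[0]
    candidate_stem = os.path.splitext(candidate_name)[0]

    # Check 1: exact basename match, extension may differ.
    if candidate_stem == image_stem:
        return True

    # Check 2: basename + "_" + one or more digits, and nothing else.
    suffix_pattern = re.escape(image_stem) + r"_\d+"
    return re.fullmatch(suffix_pattern, candidate_stem) is not None


candidates = ["image1.png", "image1_0.jpg", "image1_1.jpg", "image10_0.jpg", "image11_0.jpg"]
matched = [name for name in candidates if is_control_for("image1.jpg", name)]
print(matched)
```

Using `re.escape` keeps characters such as `.` or `(` in a file name from being read as regex syntax.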
… process files, not directories, when matching control images.
@kohya-ss
Owner

kohya-ss commented Oct 25, 2025

Thank you for this! However, I think it would be sufficient to simply change the following line

potential_paths = glob.glob(os.path.join(self.control_directory, os.path.splitext(image_basename)[0] + "*.*"))

to the following (sure, we can use image_basename_no_ext here):

potential_paths = glob.glob(os.path.join(self.control_directory, image_basename_no_ext + ".*"))  # exact match except extension
potential_paths += glob.glob(os.path.join(self.control_directory, image_basename_no_ext + "_*.*"))  # with suffix

This two-line change still does not handle the case where the images are image_1.png and image_1_12.png, but the same appears to be true of the fix in this pull request.

Edit: We may need to consider a more robust matching mechanism.
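The remaining edge case can be illustrated with the two proposed patterns. This is a sketch only: it uses `fnmatchcase` in place of `glob.glob` so that it runs without real files, and the helper `two_pattern_matches` and the file list are hypothetical.

```python
from fnmatch import fnmatchcase


def two_pattern_matches(stem: str, names: list[str]) -> list[str]:
    """Apply the two proposed patterns (stem + ".*" and stem + "_*.*") to names."""
    patterns = [stem + ".*", stem + "_*.*"]
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Hypothetical control directory where image_1 and image_1_12 are separate images.
control_files = ["image_1.png", "image_1_12.png", "image_10.png"]

for_image_1 = two_pattern_matches("image_1", control_files)
for_image_1_12 = two_pattern_matches("image_1_12", control_files)

# image_10.png is correctly excluded, but image_1_12.png is claimed by both images.
print(for_image_1)
print(for_image_1_12)
```

The ambiguity comes from the naming scheme itself: `image_1_12` is both a valid image name and a valid "image_1 plus suffix" name, so a pattern on a single file name cannot tell the two apart.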

@FurkanGozukara
Contributor Author

I am fine with either approach, but this really needs an urgent fix.

People are trying to train, and this currently causes huge VRAM and performance issues.

@kohya-ss
Owner

I addressed this issue in #684 and merged it. It should work fine in edge cases.

I will close this PR but thank you for raising this issue.

@kohya-ss kohya-ss closed this Oct 25, 2025
@FurkanGozukara
Contributor Author

Awesome, thank you so much!
