
@rockerBOO
Contributor

Subsets could not be given their own validation split, even though the setting is supported. We might also want to add validation_seed per subset, but I have not tested that.
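A per-subset setting usually means "use the subset's value if set, otherwise inherit the dataset's value." The sketch below illustrates that fallback. `DatasetSettings`, `SubsetSettings` and `resolve_validation_settings` are hypothetical names for illustration only, not sd-scripts classes, and the `[0, 1)` range check is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatasetSettings:
    # Hypothetical dataset-level defaults (not sd-scripts' own class)
    validation_split: float = 0.0
    validation_seed: Optional[int] = None


@dataclass
class SubsetSettings:
    # Hypothetical per-subset overrides; None means "inherit from the dataset"
    validation_split: Optional[float] = None
    validation_seed: Optional[int] = None


def resolve_validation_settings(
    dataset: DatasetSettings, subset: SubsetSettings
) -> Tuple[float, Optional[int]]:
    """Return the (validation_split, validation_seed) pair a subset should use."""
    split = subset.validation_split if subset.validation_split is not None else dataset.validation_split
    seed = subset.validation_seed if subset.validation_seed is not None else dataset.validation_seed
    # Assumed constraint: a split of 1.0 would leave no training images
    if not 0.0 <= split < 1.0:
        raise ValueError(f"validation_split must be in [0, 1), got {split}")
    return split, seed
```

For example, `resolve_validation_settings(DatasetSettings(0.1, 42), SubsetSettings(validation_split=0.2))` returns `(0.2, 42)`: the subset overrides the split but inherits the dataset's seed.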

@kohya-ss
Owner

kohya-ss commented Sep 8, 2025

```python
# We split the dataset for the subset based on if we are doing a validation split
# The self.is_training_dataset defines the type of dataset, training or validation
# if self.is_training_dataset is True -> training dataset
# if self.is_training_dataset is False -> validation dataset
if self.validation_split > 0.0:
    # For regularization images we do not want to split this dataset.
    if subset.is_reg is True:
        # Skip any validation dataset for regularization images
        if self.is_training_dataset is False:
            img_paths = []
            sizes = []
        # Otherwise the img_paths remain as original img_paths and no split
        # required for training images dataset of regularization images
    else:
        img_paths, sizes = split_train_val(
            img_paths, sizes, self.is_training_dataset, self.validation_split, self.validation_seed
        )
```

This part reads `validation_split` and `is_training_dataset` from the dataset (`self`), not from the subset, so it looks like additional changes are needed for a per-subset split to work.
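The quoted branch could be driven by per-subset values instead of the dataset's. The sketch below mirrors its logic under that assumption: regularization subsets are never split and feed only the training dataset, while other subsets are shuffled with a seeded RNG so the separately built training and validation datasets agree on the partition. `split_subset_paths` is a hypothetical helper, not sd-scripts' `split_train_val`. Defaulting a missing seed to 0 and sending nothing to validation when a subset's split is 0 are both assumptions.

```python
import random
from typing import List, Optional, Tuple

Size = Optional[Tuple[int, int]]


def split_subset_paths(
    img_paths: List[str],
    sizes: List[Size],
    is_training_dataset: bool,
    validation_split: float,
    validation_seed: Optional[int],
    is_reg: bool,
) -> Tuple[List[str], List[Size]]:
    """Return the part of one subset that belongs to this dataset (train or val)."""
    if is_reg or validation_split <= 0.0:
        # Reg images (or subsets with no split) go entirely to training
        return (list(img_paths), list(sizes)) if is_training_dataset else ([], [])

    # Assumption: fall back to a fixed seed so both datasets shuffle identically;
    # random.Random(None) would seed from the OS and break that agreement
    seed = 0 if validation_seed is None else validation_seed
    indices = list(range(len(img_paths)))
    random.Random(seed).shuffle(indices)

    num_val = int(len(indices) * validation_split)
    chosen = indices[num_val:] if is_training_dataset else indices[:num_val]
    chosen.sort()  # keep the original image order within each side
    return [img_paths[i] for i in chosen], [sizes[i] for i in chosen]
```

With 10 images, a split of 0.2 and seed 42, calling it once with `is_training_dataset=True` and once with `False` yields 8 and 2 disjoint images that together cover the subset.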

@rockerBOO
Copy link
Contributor Author

Sorry, I had assumed this was working, but as you point out, it is not. Making it work will take more effort. In the meantime, I updated the documentation in #2196 to explain how to do this with the current system.

rockerBOO closed this Sep 8, 2025