Replies: 2 comments
The text encoder's finetuning effect-curve is steeper than the UNet's: once the text_encoder passes a certain training threshold, it enters the overfitting zone (which can have its advantages). So if you want to experiment and find the best step count for the text encoder, it's better to compare full training sessions: 3000 steps with 10% text_enc, 3000 with 20%, 3000 with 30%, etc. Intermediate checkpoints are best saved after the text_encoder has finished its finetuning, which assumes you have already experimented and know the best value for its training.
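As a rough illustration of that sweep (a minimal sketch, assuming the percentage simply means the fraction of total steps spent training the text encoder), each line below corresponds to a separate full training session:

```python
# Purely illustrative sweep: every entry is a separate *full* run, only the
# text-encoder share changes. The mapping "percentage -> text-encoder steps"
# is an assumption here, not the notebook's actual API.
TOTAL_STEPS = 3000

for text_enc_percent in (10, 20, 30):
    text_enc_steps = TOTAL_STEPS * text_enc_percent // 100
    print(f"full run: {TOTAL_STEPS} total steps, "
          f"{text_enc_percent}% text_enc = {text_enc_steps} text-encoder steps")
```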
So is there a difference between an intermediate checkpoint trained for 3000 steps with 10% (300 steps) of text encoder training and a final checkpoint with exactly the same number of steps? I always thought of this as a cumulative process.
To preface this, I'm unsure whether this is just something I'm interested in or whether others would also find it useful.
When saving intermediate checkpoints, the text encoder percentage gets thrown off because it is calculated from the total number of steps. Earlier checkpoints therefore end up with a higher effective percentage, which drops with every intermediate save until the target percentage is only reached at the final checkpoint.
This makes it harder to compare checkpoints at different numbers of samples, because there are effectively two factors at play. If only the number of samples increased while the percentage stayed the same, it would be a lot easier to find a sweet spot for a given set of training data in one go.
To achieve this, each intermediate save would also have to run its share of the specified text encoder steps. As an example, if we're training a model for 3000 steps, saving every 500 steps from step 2000 on, with 40% text encoder training: model 1 would have 2000 steps with 800 text encoder steps, model 2 would do another 500 steps with 200 more text encoder steps (for a total of 2500/1000), and finally model 3 would go to 3000 steps with 1200 text encoder steps.
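Purely for illustration, here is a minimal sketch of that bookkeeping (the variable names are mine, not the notebook's); it reproduces the numbers in the example above:

```python
# Illustrative only: how many text-encoder steps each intermediate save would
# add so the 40% share holds at every checkpoint (2000, 2500, 3000).
total_steps      = 3000
save_from        = 2000
save_every       = 500
text_enc_percent = 40

prev_txt = 0
for i, steps in enumerate(range(save_from, total_steps + 1, save_every), start=1):
    cumulative_txt = steps * text_enc_percent // 100   # target text-encoder steps at this checkpoint
    added_txt = cumulative_txt - prev_txt              # extra text-encoder steps this segment must run
    print(f"model {i}: {steps} steps total, +{added_txt} text-encoder steps "
          f"({cumulative_txt} cumulative)")
    prev_txt = cumulative_txt
```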
I know the saving of intermediate checkpoints is handled outside the notebook in "train_dreambooth.py", but I think this could be implemented as a resume loop around the "txtenc_train" and "unet_train" functions. Of course it could also be done by hand using the resume function, but automating it would be more convenient and (in my opinion at least) the desirable default behaviour. I'm also fairly sure that the time difference compared to training all text encoder steps in one go is negligible.
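Here is a rough sketch of what I mean, with hypothetical stand-ins for "txtenc_train", "unet_train" and a "save_checkpoint" helper (none of these match the notebook's actual signatures; the placeholders just print what would happen):

```python
# Hypothetical resume loop, not the notebook's real API: at each save point we
# first top up the text-encoder steps so the share stays constant, then
# continue the UNet up to the checkpoint and save.
def txtenc_train(steps, resume):   # placeholder for the notebook's text-encoder pass
    print(f"  text encoder: +{steps} steps (resume={resume})")

def unet_train(steps, resume):     # placeholder for the notebook's UNet pass
    print(f"  unet:         +{steps} steps (resume={resume})")

def save_checkpoint(step):         # placeholder for the notebook's checkpoint export
    print(f"  saved checkpoint at step {step}")

def train_with_intermediate_saves(total_steps, save_from, save_every, text_enc_percent):
    done_txt, done_unet = 0, 0
    for ckpt in range(save_from, total_steps + 1, save_every):
        target_txt = ckpt * text_enc_percent // 100   # text-encoder steps this checkpoint should have
        txtenc_train(steps=target_txt - done_txt, resume=done_txt > 0)
        unet_train(steps=ckpt - done_unet, resume=done_unet > 0)
        save_checkpoint(step=ckpt)
        done_txt, done_unet = target_txt, ckpt

train_with_intermediate_saves(3000, 2000, 500, 40)
```

Running it with the example values (3000 steps, saves from 2000 every 500, 40% text encoder) reproduces the 800/200/200 split described above.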
As a side note, I don't know how this relates to the contains_faces method, because I'm not currently using it (I had bad results with it).
Anyways, just putting that out there. I can understand if people think that things are fine just the way they are.