Replies: 2 comments
The text encoder's finetuning effect-curve is steeper than the UNet's: once the text_encoder passes a certain training threshold, it enters the overfitting zone (which can have its advantages). So if you want to experiment and find the best step count for the text encoder, it's better to compare full training sessions: 3000 steps with 10% text_enc, 3000 with 20%, 3000 with 30%, etc. Intermediate checkpoints are best saved after the text_encoder has finished its finetuning, which assumes you have already experimented and know the best value for its training.
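As a rough illustration of that sweep (a minimal sketch, assuming the percentage simply means the fraction of total steps spent training the text encoder), each line below corresponds to a separate full training session:

```python
# Purely illustrative sweep: every entry is a separate *full* run, only the
# text-encoder share changes. The mapping "percentage -> text-encoder steps"
# is an assumption here, not the notebook's actual API.
TOTAL_STEPS = 3000

for text_enc_percent in (10, 20, 30):
    text_enc_steps = TOTAL_STEPS * text_enc_percent // 100
    print(f"full run: {TOTAL_STEPS} total steps, "
          f"{text_enc_percent}% text_enc = {text_enc_steps} text-encoder steps")
```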
So is there a difference between an intermediate checkpoint trained for 3000 steps with 10% (300 steps) of text encoder training and a final checkpoint with exactly the same number of steps? I always thought of this as a cumulative process.
To preface this, I'm unsure whether this is just something I'm interested in or whether others would also find it useful.
When saving intermediate checkpoints, the text encoder percentage gets thrown off because it is calculated from the total number of steps. Earlier checkpoints therefore end up with a higher effective percentage, which drops with every intermediate save until the target percentage is only reached at the final checkpoint.
This makes it harder to compare checkpoints at different numbers of samples, because there are effectively two factors at play. If only the number of samples increased while the percentage stayed the same, it would be a lot easier to find a sweet spot for a given set of training data in one go.
To achieve this, each intermediate save would also have to run its share of the specified text encoder steps. As an example, if we're training a model for 3000 steps, saving every 500 steps from step 2000 on, with 40% text encoder training: model 1 would have 2000 steps with 800 text encoder steps, model 2 would do another 500 steps with 200 more text encoder steps (for a total of 2500/1000), and finally model 3 would go to 3000 steps with 1200 text encoder steps.
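Purely for illustration, here is a minimal sketch of that bookkeeping (the variable names are mine, not the notebook's); it reproduces the numbers in the example above:

```python
# Illustrative only: how many text-encoder steps each intermediate save would
# add so the 40% share holds at every checkpoint (2000, 2500, 3000).
total_steps      = 3000
save_from        = 2000
save_every       = 500
text_enc_percent = 40

prev_txt = 0
for i, steps in enumerate(range(save_from, total_steps + 1, save_every), start=1):
    cumulative_txt = steps * text_enc_percent // 100   # target text-encoder steps at this checkpoint
    added_txt = cumulative_txt - prev_txt              # extra text-encoder steps this segment must run
    print(f"model {i}: {steps} steps total, +{added_txt} text-encoder steps "
          f"({cumulative_txt} cumulative)")
    prev_txt = cumulative_txt
```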
I know the saving of intermediate checkpoints is handled outside the notebook in "train_dreambooth.py", but I think this could be implemented as a resume loop around the "txtenc_train" and "unet_train" functions. Of course it could also be done by hand using the resume function, but automating it would be more convenient and (in my opinion at least) the desirable default behaviour. I'm also fairly sure that the time difference compared to training all text encoder steps in one go is negligible.
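Here is a rough sketch of what I mean, with hypothetical stand-ins for "txtenc_train", "unet_train" and a "save_checkpoint" helper (none of these match the notebook's actual signatures; the placeholders just print what would happen):

```python
# Hypothetical resume loop, not the notebook's real API: at each save point we
# first top up the text-encoder steps so the share stays constant, then
# continue the UNet up to the checkpoint and save.
def txtenc_train(steps, resume):   # placeholder for the notebook's text-encoder pass
    print(f"  text encoder: +{steps} steps (resume={resume})")

def unet_train(steps, resume):     # placeholder for the notebook's UNet pass
    print(f"  unet:         +{steps} steps (resume={resume})")

def save_checkpoint(step):         # placeholder for the notebook's checkpoint export
    print(f"  saved checkpoint at step {step}")

def train_with_intermediate_saves(total_steps, save_from, save_every, text_enc_percent):
    done_txt, done_unet = 0, 0
    for ckpt in range(save_from, total_steps + 1, save_every):
        target_txt = ckpt * text_enc_percent // 100   # text-encoder steps this checkpoint should have
        txtenc_train(steps=target_txt - done_txt, resume=done_txt > 0)
        unet_train(steps=ckpt - done_unet, resume=done_unet > 0)
        save_checkpoint(step=ckpt)
        done_txt, done_unet = target_txt, ckpt

train_with_intermediate_saves(3000, 2000, 500, 40)
```

Running it with the example values (3000 steps, saves from 2000 every 500, 40% text encoder) reproduces the 800/200/200 split described above.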
As a side note, I don't know how this relates to the contains_faces method, because I'm not currently using it (I had bad results with it).
Anyways, just putting that out there. I can understand if people think that things are fine just the way they are.