Replies: 4 comments 5 replies
-
Heya, I had a look at this earlier, but I don't know /anything/ about validation loss. What exactly is it? An AI told me "Validation loss measures how well a machine learning model performs on a separate dataset that it hasn't seen during training," but I'm not quite sure what that means in practice. Is it just calculating the loss for held-out samples, or something else? Anyway, nothing in particular stands out to me here, but I'm a lot less familiar with the training parts of the code. I also asked ChatGPT o4-mini-high, but that didn't produce anything helpful either. Sorry!
-
It basically does a forward pass with a validation image (one that is not included in training) and measures how well the current model does on it: with a fixed noise at a given timestep, it compares what the model gives for those parameters against the target. The thing is, I'm new to this and to Python too, so it's a miracle it works to any extent :) I'm testing on the 1.3B Wan2.1, so I did not enable gradient checkpointing (on a 3090), but yeah, I figured that would be needed later.
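A rough sketch of that idea, in case it helps make it concrete. This is not the code from the PR: it is a toy, standard-library-only version that assumes a flow-matching style objective (noisy input `(1 - t) * x0 + t * noise`, target `noise - x0`), and every name in it (`make_val_noise`, `validation_loss`, `toy_model`) is hypothetical. The point it tries to show is that when the noise and the timesteps are fixed up front, two validation passes over the same model give exactly the same number, so any change in val loss comes from the weights and not from sampling.

```python
import random
from statistics import fmean


def make_val_noise(n_samples, latent_dim, seed=0):
    """Draw the noise once from a dedicated RNG so every validation pass reuses it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]


def validation_loss(model, latents, noises, timesteps):
    """Mean squared error averaged over every (sample, timestep) pair."""
    losses = []
    for x0, eps in zip(latents, noises):
        for t in timesteps:
            # Assumed flow-matching interpolation and velocity target.
            x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]
            target = [e - a for a, e in zip(x0, eps)]
            pred = model(x_t, t)
            losses.append(fmean((p - y) ** 2 for p, y in zip(pred, target)))
    return fmean(losses)


# Toy stand-ins (hypothetical): two tiny "latents" and a model that predicts zeros.
val_latents = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
val_noises = make_val_noise(len(val_latents), latent_dim=3, seed=1234)
val_timesteps = [0.1, 0.3, 0.5, 0.7, 0.9]  # fixed grid instead of random draws


def toy_model(x_t, t):
    return [0.0 for _ in x_t]


loss_a = validation_loss(toy_model, val_latents, val_noises, val_timesteps)
loss_b = validation_loss(toy_model, val_latents, val_noises, val_timesteps)
```

In a real trainer the same pattern would apply to the actual latents and the actual model call; averaging over several fixed timesteps rather than a single one is one common way to make the curve less jumpy.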
-
This is the last run I did: a 24-image dataset with a repeat of 2, so 48 samples per epoch, plus 3 validation images. Wan2.1 1.3B t2v, 5e-5 LR, 64/1 rank/alpha. According to this, it bottomed out at around epoch 20-25. (The left graph is the mean val loss; the 3 on the right are for the individual validation images.)
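For reading a noisy curve like this, one option is to smooth the per-epoch values before deciding where it bottomed out. The snippet below is only a sketch: the loss values are made up for illustration (they are not from this run), and `ema` and `best_epoch` are hypothetical helper names.

```python
def ema(values, alpha=0.3):
    """Exponential moving average; a smaller alpha smooths more heavily."""
    smoothed = []
    running = None
    for v in values:
        running = v if running is None else alpha * v + (1 - alpha) * running
        smoothed.append(running)
    return smoothed


def best_epoch(values):
    """Return the 1-based epoch with the lowest value."""
    return min(range(len(values)), key=values.__getitem__) + 1


samples_per_epoch = 24 * 2  # 24 images x 2 repeats, as described above

# Made-up numbers, for illustration only.
val_losses = [0.30, 0.26, 0.28, 0.22, 0.24, 0.19, 0.21, 0.23, 0.22, 0.25]
smoothed = ema(val_losses)

print("lowest raw val loss at epoch", best_epoch(val_losses))
print("lowest smoothed val loss at epoch", best_epoch(smoothed))
```

With only 3 validation images, a single image can move the mean a lot, so comparing the raw and smoothed minimum is a cheap sanity check on the "bottomed out at" reading.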
-
As the title says, I'm trying to fix up this old PR, mostly for Wan2.1 / Skyreels. My first goal is to make it fully functional and working properly, then fix the issues the original had and maybe add a progress bar, config options, etc.
In its current state it seems to work, or at least that is my general assumption. Val loss is calculated, reported, and logged. The problem is that it does not really follow the expected curve, and I don't know whether that is because of Wan, my dataset/captions, or flaws in the code. It does follow a somewhat downward trend, but it is a lot more erratic.
So, here is the main def validate part of the code. I have also attached a zip with all the changed files if someone wants to try it. Be warned: it only works with Wan2.1 / Skyreels (t2v/df) and you cannot turn it off :D You have to have a [val_datasets] in your toml.
Any feedback is appreciated, and if you can point out any flaws in the code / logic, please do so. (I'm a noob at Python.)
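On the "cannot turn it off" part, one possible way to make validation optional is to check the parsed config and skip the pass when there is no `val_datasets` entry. This is a sketch under assumptions, not the trainer's actual API: it works on a plain dict such as a TOML parser would return, and `validation_enabled`, `maybe_validate`, and the `path` key are hypothetical.

```python
def validation_enabled(config):
    """True only if the parsed config has a non-empty `val_datasets` entry."""
    return bool(config.get("val_datasets"))


def maybe_validate(config, run_validation):
    """Call run_validation() only when validation is configured; otherwise return None."""
    if not validation_enabled(config):
        return None
    return run_validation()


# Hypothetical parsed configs: one with a validation section, one without.
with_val = {"val_datasets": {"path": "val_images"}}
without_val = {"datasets": {"path": "train_images"}}

result_on = maybe_validate(with_val, lambda: 0.123)      # placeholder loss value
result_off = maybe_validate(without_val, lambda: 0.123)  # validation is skipped
```

Returning `None` when validation is disabled lets the logging code simply skip the val loss entry for that step.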
Vall_loss.zip