Huge and sustained loss after resume wan 2.2 i2v training lora #667

@bazanga912

Hey everyone.
I'm training for the first time, and I had to cancel the job. When I started it again, I saw a spike in the loss. From what I read, it should normalize within about 200-300 steps, but that's not what's happening.

Image

As you can see in blue, I ran it until step 6645 and then had to cancel it, with the loss at about 0.0091. I resumed from a state saved at the start of an epoch, at around step 6000. As you can see, after 1299 more steps the loss has only dropped to 0.0095, and that came after a huge spike that reached about 0.0104. Something is wrong, right? I don't know what. I have `--save_state` in the command.
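To put those numbers in perspective, here is a rough Python sketch of the arithmetic. The loss values are read off the plot by eye, `pct_above` is just a helper name made up for this example, and it assumes the resumed state corresponds to exactly step 6000, which may be slightly off.

```python
# Approximate values read off the loss plot in this post.
loss_before_cancel = 0.0091  # loss at step 6645, when the run was cancelled
loss_spike = 0.0104          # peak reached after resuming
loss_now = 0.0095            # loss 1299 steps after resuming

resume_step = 6000           # ASSUMPTION: the saved state is from exactly step 6000
cancel_step = 6645
steps_since_resume = 1299


def pct_above(value, baseline):
    """Return how far `value` sits above `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


spike_pct = pct_above(loss_spike, loss_before_cancel)
now_pct = pct_above(loss_now, loss_before_cancel)

# Where the resumed run is now, and how far past the original cancel point.
current_step = resume_step + steps_since_resume
steps_past_cancel = current_step - cancel_step

print(f"spike: {spike_pct:.1f}% above the pre-cancel loss")
print(f"now:   {now_pct:.1f}% above the pre-cancel loss")
print(f"current step: {current_step} ({steps_past_cancel} steps past the cancel point)")
```

Under those assumptions, the resumed run is already several hundred steps past the point where the first run was cancelled, yet its loss is still a few percent higher than it was there.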

Can someone help a newbie?
Thanks!
