Description
Hello,
First of all, I'm grateful for the training code you provided! Recently, I tried to replicate OOTD-DC and OOTD-HD on data collected from shopping websites. But with either the DC or the HD model, the loss curve just fluctuates around 0.02, and the results keep getting worse.

After 4 epochs:

We trained the model from the OOTD-DC checkpoint with:
1. mixed_precision: no (float32)
2. resolution: 1024×768
3. batch_size: 8 per GPU on multiple GPUs
4. epochs: 4 for the experiment
5. an additional 15,000 paired samples
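As a back-of-envelope check of how much training 4 epochs actually is under this setup (the GPU count below is hypothetical; only the per-GPU batch size of 8 and the 15,000 pairs come from the settings above):

```python
# Rough steps-per-epoch estimate; num_gpus is an assumption for illustration.
pairs = 15_000
per_gpu_batch = 8
num_gpus = 4  # hypothetical
steps_per_epoch = pairs // (per_gpu_batch * num_gpus)
print(steps_per_epoch)  # 468
```

With only a few hundred optimizer steps per epoch, 4 epochs is a fairly short fine-tune.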
We tried fp16 for faster training and a larger batch_size, but got "ValueError: Attempting to unscale FP16 gradients." at optimizer.step().
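For what it's worth, that error usually means the trainable weights themselves were cast to fp16 (e.g. via `model.half()` or loading the checkpoint in fp16) before being handed to the optimizer; `GradScaler.unscale_` refuses to unscale fp16 parameter gradients. A minimal sketch of the usual fix, assuming a generic PyTorch training step rather than the repo's actual code (CPU + bfloat16 autocast here just for a runnable illustration):

```python
import torch

# Stand-in for the trainable UNet: keep its parameters in float32.
# Casting them to fp16 before optimizer.step() is what triggers
# "Attempting to unscale FP16 gradients." when a GradScaler is used.
model = torch.nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

x = torch.randn(2, 4)
# Mixed precision lives in the forward pass via autocast, not in the weights.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()
loss.backward()
opt.step()  # gradients/params are fp32, so the optimizer step is safe
opt.zero_grad()

# The weights stay fp32 throughout:
print(all(p.dtype == torch.float32 for p in model.parameters()))
```

In other words, pass `mixed_precision fp16` to the launcher/autocast but leave the parameters you optimize in fp32.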
Besides that, I wonder what a normal loss curve looks like here, and how many epochs we should train for replication if fluctuation around 0.02 is normal.
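(Diffusion noise-prediction losses are typically very noisy per step, so a raw curve hovering around a small value can hide a slow trend. A small hypothetical helper, not from this repo, for smoothing the logged losses before judging whether the curve is really flat:)

```python
def ema(values, beta=0.98):
    """Exponential moving average of a per-step loss log.

    A flat-but-fluctuating curve (e.g. around 0.02) is much easier to
    judge after smoothing: a genuinely flat EMA suggests the model has
    plateaued, while a slowly decreasing EMA means training is still
    making progress despite the noise.
    """
    out, avg = [], values[0]
    for v in values:
        avg = beta * avg + (1 - beta) * v
        out.append(avg)
    return out

print(ema([1.0, 1.0, 1.0]))  # constant input stays constant: [1.0, 1.0, 1.0]
```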