Description
Hello!
I trained the network with both model.fit() and eager_tf, without changing any code.
With model.fit(), the average validation loss is already around 50 after the first epoch (51.9 in the log below), and the training loss also drops below 50 at the start of the second epoch.
With eager_tf, the validation loss is still ~200 after 10 epochs, and the training loss decreases much more slowly, reaching ~50 only in the 10th epoch, which looks like overfitting.
This is the training result for model.fit():
Epoch 1:
1/358
- loss: 9787.6289 - yolo_output_0_loss: 508.0005 - yolo_output_1_loss: 1342.9556 - yolo_output_2_loss: 7925.9561
...
357/358
- loss: 378.2877 - yolo_output_0_loss: 22.6362 - yolo_output_1_loss: 49.9713 - yolo_output_2_loss: 294.6154
358/358
- loss: 378.0025 - yolo_output_0_loss: 22.6236 - yolo_output_1_loss: 49.9357 - yolo_output_2_loss: 294.3785
val_loss: 51.9096 - val_yolo_output_0_loss: 8.8620 - val_yolo_output_1_loss: 7.8781 - val_yolo_output_2_loss: 24.0912
Epoch 2:
1/358
- loss: 43.6244 - yolo_output_0_loss: 6.2404 - yolo_output_1_loss: 8.0534 - yolo_output_2_loss: 18.2523
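Notice the sudden drop in training loss from 378 to 43 across the epoch boundary. This is because model.fit() reports the running average over all iterations so far in the current epoch, not the loss of the latest batch, and that average resets when a new epoch starts.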
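To illustrate the averaging effect, here is a minimal sketch in plain Python. The helper `running_means` and the loss values are hypothetical, made up only to show how a few very large early batches keep a running average high even after the per-batch loss has dropped.

```python
def running_means(batch_losses):
    """Return the running mean after each batch, the way a progress bar might display it."""
    means = []
    total = 0.0
    for count, loss in enumerate(batch_losses, start=1):
        total += loss
        means.append(total / count)
    return means


# Hypothetical per-batch losses: one huge first batch, then small ones.
epoch_1 = [9000.0, 100.0, 50.0, 50.0]
displayed = running_means(epoch_1)

print("displayed at end of epoch:", displayed[-1])  # dominated by the first batch
print("actual last-batch loss:   ", epoch_1[-1])
```

The displayed value at the end of the epoch is far above the last batch's own loss, which is why the number appears to jump down when the average restarts in the next epoch.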
This is the training result for eager_tf:
1_train_0, 155262.8125, [5675.242, 34116.484, 115460.375]
...
1_train_356, 523.5953369140625, [124.26721, 100.35405, 287.8407]
1_train_357, 125.0768814086914, [25.127472, 11.3394575, 77.47637]
1_val_0, 565.5044555664062, [86.86941, 158.40671, 309.0946]
...
1_val_363, 694.1661987304688, [114.45209, 213.89682, 354.6836]
(Average) 1, train: 5050.33447265625, val: 590.8134155273438
2_train_0, 788.0953369140625, [132.88559, 241.86014, 402.21585]
2_train_1, 493.3677978515625, [86.920746, 157.22601, 238.08711]
Notice that here the losses are per-iteration losses and are not averaged.
From the very first iteration, the loss values are much larger than with model.fit(), and at the end of epoch 1 the training loss is still > 100, which is much worse than the < 50 reached with model.fit().
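Since one log shows averaged values and the other shows per-iteration values, it may help to average the eager_tf lines per epoch before comparing. The sketch below assumes the line format shown in the eager_tf log above (`<epoch>_<phase>_<step>, <total loss>, [<per-output losses>]`). The helper name `epoch_means` is hypothetical and not part of the repository.

```python
from collections import defaultdict


def epoch_means(log_lines):
    """Average the total loss per (epoch, phase) from eager_tf-style log lines."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in log_lines:
        fields = line.split(", ", 2)
        if len(fields) < 2:
            continue  # skip elided lines such as "..."
        tag_parts = fields[0].split("_")
        if len(tag_parts) != 3:
            continue  # skip summary lines such as "(Average) 1, ..."
        epoch, phase, _step = tag_parts
        key = (int(epoch), phase)
        totals[key] += float(fields[1])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# A few lines copied from the log above.
sample = [
    "1_train_356, 523.5953369140625, [124.26721, 100.35405, 287.8407]",
    "1_train_357, 125.0768814086914, [25.127472, 11.3394575, 77.47637]",
    "...",
    "1_val_0, 565.5044555664062, [86.86941, 158.40671, 309.0946]",
    "(Average) 1, train: 5050.33447265625, val: 590.8134155273438",
]
means = epoch_means(sample)
for (epoch, phase), value in sorted(means.items()):
    print(epoch, phase, value)
```

Run over a full epoch of lines, this should give numbers that can be compared directly with the epoch-level values printed by model.fit().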
I strictly followed the training tutorial and used the datasets and darknet model downloaded directly from the links provided.
My guess is that this is related to how the loss is computed or handled differently in the two training modes.
Do you by any chance know why?