Fix training/validation metrics scaling and validation loss aggregation #1126
This PR fixes two issues in training and validation metrics logging that affect comparability and correctness.
### Problems fixed

**1. Training loss / metric scaled with batch size**
In `SamTrainer._compute_loss()`, the per-sample `mask_loss` and `iou_regression_loss` were accumulated across the batch without normalization. As a result, increasing `batch_size` inflated the reported loss and metric values even when model quality was unchanged, making runs hard to compare.
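A minimal sketch of the accumulation pattern and the fix; the function and argument names below are illustrative stand-ins, not the actual `SamTrainer._compute_loss()` implementation:

```python
import torch.nn.functional as F

def compute_losses(predictions, targets, mask_loss_fn):
    """Illustrative stand-in for the accumulation in SamTrainer._compute_loss().

    predictions: list of (predicted_mask, predicted_iou) tuples, one per sample.
    targets:     list of (target_mask, target_iou) tuples, one per sample.
    """
    mask_loss, iou_regression_loss = 0.0, 0.0
    for (pred_mask, pred_iou), (true_mask, true_iou) in zip(predictions, targets):
        # Per-sample losses are summed over the batch ...
        mask_loss = mask_loss + mask_loss_fn(pred_mask, true_mask)
        iou_regression_loss = iou_regression_loss + F.mse_loss(pred_iou, true_iou)

    # ... so dividing by the batch size turns the sums into per-sample
    # averages; doubling batch_size no longer doubles the reported values.
    batch_size = len(predictions)
    return mask_loss / batch_size, iou_regression_loss / batch_size
```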
**2. Validation `mask_loss`/`iou_loss` were not epoch averages**

During validation, `validation/metric` and `validation/loss` were averaged over `len(val_loader)`, but `validation/mask_loss` and `validation/iou_loss` effectively reflected only the last validation batch rather than epoch-level averages.
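The intended epoch-level aggregation, sketched with illustrative names (`evaluate_batch` is a hypothetical helper, not the trainer's actual API):

```python
def run_validation(val_loader, evaluate_batch):
    """Average all validation quantities over the full epoch."""
    loss_sum = metric_sum = mask_loss_sum = iou_loss_sum = 0.0

    for batch in val_loader:
        loss, metric, mask_loss, iou_loss = evaluate_batch(batch)
        loss_sum += loss
        metric_sum += metric
        # Previously only loss and metric were accumulated, so mask_loss
        # and iou_loss kept the value of the last batch. Accumulating them
        # here makes all four quantities epoch-level averages.
        mask_loss_sum += mask_loss
        iou_loss_sum += iou_loss

    n = len(val_loader)
    return {
        "validation/loss": loss_sum / n,
        "validation/metric": metric_sum / n,
        "validation/mask_loss": mask_loss_sum / n,
        "validation/iou_loss": iou_loss_sum / n,
    }
```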
### Changes made

- Divide `mask_loss` and `iou_regression_loss` by `batch_size` inside `_compute_loss()` so training metrics are batch-size invariant.
- Accumulate `validation/mask_loss` and `validation/iou_loss` across all validation batches and average them over `len(val_loader)`, matching how `validation/metric` and `validation/loss` are reported.
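A quick sanity check (hypothetical, reusing the `compute_losses` sketch above) that the normalization makes the reported values batch-size invariant:

```python
import torch
import torch.nn.functional as F

# One fake sample: a predicted mask in [0, 1], a predicted IoU scalar,
# and matching targets (shapes are arbitrary for the check).
pred = (torch.rand(1, 8, 8), torch.rand(1))
target = (torch.randint(0, 2, (1, 8, 8)).float(), torch.rand(1))

one = compute_losses([pred], [target], F.binary_cross_entropy)
two = compute_losses([pred, pred], [target, target], F.binary_cross_entropy)

# After the fix, duplicating the sample (batch_size 1 -> 2) must not
# change the reported per-sample losses.
assert torch.allclose(one[0], two[0]) and torch.allclose(one[1], two[1])
```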
### Why this matters

This change does not affect model behavior or optimization, only how metrics are computed and reported.