Description
This issue discusses differences in the implementation of the Dice Loss with and without the smoothing term.
Background: why I am opening this issue/discussion
tl;dr:
- the nnunetv2 `nnUNetTrainerDiceCELoss_noSmooth` trainer (i.e., without the smoothing term of the Dice loss) kept the model from collapsing to zero during lesion model training.
Details
Since the default `nnUNetTrainer` trainer was collapsing to zero when training the DCM (degenerative cervical myelopathy) lesion segmentation model, we tried `nnUNetTrainerDiceCELoss_noSmooth` (i.e., without the smoothing term of the Dice loss).
This trainer was discovered by @naga-karthik in these two nnunet threads (1, 2). The trainer indeed helped, and the model was no longer collapsing to zero; see details in this issue.
Note that DCM lesion segmentation involves a high class imbalance (lesions are small objects).
Comparison of the default and `nnUNetTrainerDiceCELoss_noSmooth` trainers
tl;dr:
- the default `nnUNetTrainer` trainer uses `smooth: float = 1.`
- `nnUNetTrainerDiceCELoss_noSmooth` uses `'smooth': 0`
Details
nnunetv2 default trainer
The nnunetv2 default trainer uses `MemoryEfficientSoftDiceLoss` (see L352-L362 in nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py).
This `MemoryEfficientSoftDiceLoss` (see L58 in nnunetv2/training/loss/dice.py) uses both the smoothing term (`self.smooth`) and a small constant (`1e-8`); see L116:
dc = (2 * intersect + self.smooth) / (torch.clip(sum_gt + sum_pred + self.smooth, 1e-8))
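As a quick sanity check of how this line behaves on an empty patch (my own arithmetic, not code from the nnunet repository): with the default smooth = 1, an empty ground truth and an empty prediction yield a Dice of 1, whereas the 1e-8 clip alone only protects against a 0/0 division:

```python
import torch

# Empty ground truth and empty prediction: intersect = sum_gt = sum_pred = 0
intersect = sum_gt = sum_pred = torch.tensor(0.0)

smooth = 1.0  # nnunetv2 default
dc = (2 * intersect + smooth) / torch.clip(sum_gt + sum_pred + smooth, 1e-8)
print(dc)  # tensor(1.) -> an empty prediction on an empty patch counts as a perfect match

smooth = 0.0  # nnUNetTrainerDiceCELoss_noSmooth
dc = (2 * intersect + smooth) / torch.clip(sum_gt + sum_pred + smooth, 1e-8)
print(dc)  # tensor(0.) -> the 1e-8 clip only prevents the division by zero
```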
nnunetv2 `nnUNetTrainerDiceCELoss_noSmooth` trainer
The nnunetv2 `nnUNetTrainerDiceCELoss_noSmooth` trainer (see L32 in nnunetv2/training/nnUNetTrainer/variants/loss/nnUNetTrainerDiceLoss.py) sets `smooth` to `0`. The small constant (`1e-8`) is apparently left untouched.
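To make the practical difference between the two trainers concrete, here is a small numeric illustration (my own, not taken from the nnunet code or threads) of what the quoted formula assigns to an all-background prediction on a patch containing a tiny lesion. With smooth = 1 the empty prediction still receives a non-trivial Dice (and a perfect Dice of 1 on patches with no lesion, as shown above), which may be part of why collapsing to an all-zero output is attractive under heavy class imbalance; with smooth = 0 the empty prediction always scores 0:

```python
import torch

def soft_dice_term(intersect, sum_gt, sum_pred, smooth):
    # Same expression as the nnunet line quoted above (for illustration only)
    return (2 * intersect + smooth) / torch.clip(sum_gt + sum_pred + smooth, 1e-8)

# A patch with a tiny 10-voxel lesion; the model predicts background everywhere.
intersect = torch.tensor(0.0)   # no overlap between prediction and ground truth
sum_gt = torch.tensor(10.0)     # 10 lesion voxels in the ground truth
sum_pred = torch.tensor(0.0)    # no predicted lesion voxels

for smooth in (1.0, 0.0):
    dc = soft_dice_term(intersect, sum_gt, sum_pred, smooth)
    print(f"smooth={smooth}: all-background prediction gets Dice {dc.item():.3f}")
# smooth=1.0 -> Dice 0.091; smooth=0.0 -> Dice 0.000
```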
What is the smoothing term used for?
tl;dr: hard to say convincingly.
- keras and ivadomed use only the smoothing term without the small constant
- nnunetv2 default trainer uses both the smoothing term and the small constant
- the nnunetv2 `nnUNetTrainerDiceCELoss_noSmooth` trainer uses only the small constant (because the smoothing term is set to zero)
Details
Initially, I incorrectly thought that the nnunetv2 smoothing term was used to prevent division by zero. I got this impression from this comment. But, after a closer look at the equation in this comment, I found that the equation uses only the smoothing term and no small constant. Further investigation led me to these two discussions (1, 2) about the Dice implementation in keras. Both discussions use only the smoothing term but, again, no small constant:
score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
Checking the ivadomed Dice implementation, I found that it also uses only the smoothing term (see L63 in ivadomed/losses.py):
return - (2.0 * intersection + self.smooth) / (iflat.sum() + tflat.sum() + self.smooth)
I also found this comment from Charley Gros providing the following explanation (note that this comment is related to the ivadomed Dice without the small constant):
A very probable reason is the different way these two functions are dealing with empty GT and empty pred.
--> Dice loss returns 1
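A quick check of that comment against the ivadomed expression above (my own numbers, assuming a default smooth = 1.0): with an empty ground truth and an empty prediction, the ratio becomes smooth/smooth = 1, i.e., the Dice score is a perfect 1 and the returned loss is -1; without the smoothing term, the same case would be a 0/0 division:

```python
# Empty ground truth and empty prediction with the ivadomed-style formula quoted above
intersection = 0.0
iflat_sum = tflat_sum = 0.0  # sums of the flattened prediction and ground truth
smooth = 1.0                 # assumed default smoothing value

dice_loss = -(2.0 * intersection + smooth) / (iflat_sum + tflat_sum + smooth)
print(dice_loss)  # -1.0 -> Dice score of 1: the empty prediction is treated as a perfect match
```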
Both the keras and ivadomed implementations are in contrast with the nnunet implementation, which uses both the smoothing term (`self.smooth`) and the small constant (`1e-8`); see L116:
dc = (2 * intersect + self.smooth) / (torch.clip(sum_gt + sum_pred + self.smooth, 1e-8))
Prompting chatGPT to explain why removing the smoothing term from the nnunet Dice loss helped prevent the collapse to zero gives a relatively reasonable explanation (highlighted in italics):
The smoothing term, as used here, serves a dual purpose:
- Stabilization of the Dice Coefficient: By adding the smoothing term (self.smooth) to both the numerator (nominator) and the denominator, it ensures that even when there are no true positives (TP) (e.g., the model predicts no lesion where there is actually no lesion, or it fails to predict a lesion where there is one), the Dice coefficient does not become undefined or result in division by zero. The use of torch.clip with a minimum value of 1e-8 further ensures that the denominator cannot be zero, which prevents division by zero errors.
- Influence on Gradient Behavior and Model Sensitivity: The smoothing term affects the gradient of the loss with respect to the model's predictions. When TP is very small or zero, the addition of the smoothing term to both the numerator and denominator ensures that the model still receives a gradient signal that encourages it to improve its predictions. It essentially softens the harshness of the Dice loss, making it more forgiving in the early stages of training or in cases where the model struggles to detect small or sparse lesions. This can lead to a more stable and gradual learning process.
In your specific case, removing the smoothing term with the nnUNetTrainerDiceCELoss_noSmooth trainer likely made the loss more directly reflective of the actual performance of the model in terms of the overlap between the predicted segmentation and the ground truth. For the task of lesion segmentation, where lesions can be small and the balance between lesion and non-lesion areas is crucial, the direct feedback without the smoothing term's moderation might have better aligned the loss with the task's objectives, thereby improving model performance.
Further investigation and experiments comparing the nnunet default `nnUNetTrainer` trainer and `nnUNetTrainerDiceCELoss_noSmooth` are in progress.
Tagging @naga-karthik and @plbenveniste, who both also work on lesion segmentation. If either of you has time to go through the investigation above and check that I didn't make any naive mistakes, that would be great.