
Dice Loss vs Dice Loss with smoothing term #42


Description

@valosekj

This issue discusses differences in the implementation of the Dice Loss with and without the smoothing term.

Background: why this issue/discussion was opened

tl;dr:

  • the nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer (i.e., without the smoothing term of the Dice loss) helped prevent the model from collapsing to zero during lesion model training.
Details

Since the default nnUNetTrainer was causing the model to collapse to zero when training the DCM (degenerative cervical myelopathy) lesion segmentation model, we tried nnUNetTrainerDiceCELoss_noSmooth (i.e., without the smoothing term of the Dice loss).
This trainer was discovered by @naga-karthik in these two nnunet threads (1, 2). The trainer indeed helped, and the model no longer collapsed to zero; see details in this issue.

Note that DCM lesion segmentation presents a high class imbalance (lesions are small objects).

Comparison of the default and nnUNetTrainerDiceCELoss_noSmooth trainers

tl;dr:

  • the default nnUNetTrainer trainer uses smooth: float = 1.
  • nnUNetTrainerDiceCELoss_noSmooth uses 'smooth': 0
Details

nnunetv2 default trainer

The nnunetv2 default trainer uses MemoryEfficientSoftDiceLoss (see L352-L362 in nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py).

This MemoryEfficientSoftDiceLoss (see L58 in nnunetv2/training/loss/dice.py) uses both a smoothing term (self.smooth) and a small constant (1e-8); see L116:

dc = (2 * intersect + self.smooth) / (torch.clip(sum_gt + sum_pred + self.smooth, 1e-8))
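
For illustration, here is a minimal, self-contained sketch of that per-image Dice term (not the actual nnunetv2 class; the handling of softmax outputs, one-hot encoding, and batch/region reductions is omitted):

import torch

def soft_dice(pred: torch.Tensor, gt: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    # pred: predicted probabilities, gt: binary ground truth, same shape
    intersect = (pred * gt).sum()
    sum_pred = pred.sum()
    sum_gt = gt.sum()
    # smoothing term added to numerator and denominator; the denominator is
    # additionally clipped at 1e-8 so it can never be exactly zero
    return (2 * intersect + smooth) / torch.clip(sum_gt + sum_pred + smooth, min=1e-8)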

nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer

The nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer (see L32 in nnunetv2/training/nnUNetTrainer/variants/loss/nnUNetTrainerDiceLoss.py) sets smooth to 0. The small constant (1e-8) is left untouched and still applied.
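
To make the difference between the two trainers concrete, here is a small hand-check on a patch whose ground truth is empty and whose prediction is all zeros (illustrative only; it uses the simplified per-image formula above, not the actual nnunetv2 batch-dice computation):

import torch

pred = torch.zeros(4, 4)  # model predicts no lesion anywhere
gt = torch.zeros(4, 4)    # ground truth contains no lesion either

intersect = (pred * gt).sum()
sums = pred.sum() + gt.sum()
for smooth in (1.0, 0.0):
    dc = (2 * intersect + smooth) / torch.clip(sums + smooth, min=1e-8)
    print(f"smooth={smooth}: dc={dc.item():.4f}")
# smooth=1.0: dc=1.0000 -> an all-zero prediction on an empty patch scores a perfect Dice
# smooth=0.0: dc=0.0000 -> the same prediction gets no credit at all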

What is the smoothing term used for?

tl;dr: hard to say convincingly.

  • keras and ivadomed use only the smoothing term without the small constant
  • nnunetv2 default trainer uses both the smoothing term and the small constant
  • nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer uses only the small constant (because the smoothing term is set to zero)
Details

Initially, I incorrectly thought that the nnunetv2 smoothing term was used to prevent division by zero. I got this sense based on this comment. But, after a deeper look at the equation in this comment, I found out that the equation uses only the smoothing term and no small constant. Further investigation led me to these two discussions (1, 2) about the Dice implementation in keras. Both discussions use only the smoothing term but, again, no small constant:

score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

Checking the ivadomed Dice implementation, I found that it also uses only the smoothing term (see L63 in ivadomed/losses.py):

return - (2.0 * intersection + self.smooth) / (iflat.sum() + tflat.sum() + self.smooth)

I also found this comment from Charley Gros providing the following explanation (note that this comment is related to the ivadomed Dice without the small constant):

A very probable reason is the different way these two functions are dealing with empty GT and empty pred.
--> Dice loss returns 1
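
That behaviour is easy to verify with the smooth-only formulation quoted above (illustrative values only; smooth = 1 as in the keras/ivadomed snippets):

import torch

smooth = 1.0
iflat = torch.zeros(16)  # flattened empty prediction
tflat = torch.zeros(16)  # flattened empty ground truth
intersection = (iflat * tflat).sum()
score = (2.0 * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth)
print(score.item())  # 1.0 -> empty prediction on empty GT yields a perfect Dice score, consistent with the comment above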

Both the keras and ivadomed implementations contrast with the nnunet implementation, which uses both the smoothing term (self.smooth) and the small constant (1e-8); see L116:

dc = (2 * intersect + self.smooth) / (torch.clip(sum_gt + sum_pred + self.smooth, 1e-8))

Prompting ChatGPT to explain why removing the smoothing term from the nnunet Dice helped prevent the model from collapsing to zero yields a relatively reasonable explanation (highlighted in italics):

The smoothing term, as used here, serves a dual purpose:

  1. Stabilization of the Dice Coefficient: By adding the smoothing term (self.smooth) to both the numerator (nominator) and the denominator, it ensures that even when there are no true positives (TP) (e.g., the model predicts no lesion where there is actually no lesion, or it fails to predict a lesion where there is one), the Dice coefficient does not become undefined or result in division by zero. The use of torch.clip with a minimum value of 1e-8 further ensures that the denominator cannot be zero, which prevents division by zero errors.
  2. Influence on Gradient Behavior and Model Sensitivity: The smoothing term affects the gradient of the loss with respect to the model's predictions. When TP is very small or zero, the addition of the smoothing term to both the numerator and denominator ensures that the model still receives a gradient signal that encourages it to improve its predictions. It essentially softens the harshness of the Dice loss, making it more forgiving in the early stages of training or in cases where the model struggles to detect small or sparse lesions. This can lead to a more stable and gradual learning process.

In your specific case, removing the smoothing term with the nnUNetTrainerDiceCELoss_noSmooth trainer likely made the loss more directly reflective of the actual performance of the model in terms of the overlap between the predicted segmentation and the ground truth. For the task of lesion segmentation, where lesions can be small and the balance between lesion and non-lesion areas is crucial, the direct feedback without the smoothing term's moderation might have better aligned the loss with the task's objectives, thereby improving model performance.
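
As a purely illustrative sanity check of the class-imbalance argument (numbers made up, not from an actual training run), compare a collapsed all-background prediction with a partially correct prediction on a patch containing a tiny 5-voxel lesion:

def dc(intersect, sum_pred, sum_gt, smooth):
    return (2 * intersect + smooth) / max(sum_gt + sum_pred + smooth, 1e-8)

sum_gt = 5.0  # tiny lesion: 5 foreground voxels
# collapsed prediction (all background)
print(dc(0.0, 0.0, sum_gt, smooth=1.0), dc(0.0, 0.0, sum_gt, smooth=0.0))  # ~0.17 vs 0.0
# partial prediction: 4 true-positive voxels plus 2 false-positive voxels
print(dc(4.0, 6.0, sum_gt, smooth=1.0), dc(4.0, 6.0, sum_gt, smooth=0.0))  # 0.75 vs ~0.73
# with smooth=1 a collapsed prediction still earns a non-zero Dice on tiny lesions
# (and a perfect Dice on empty patches); with smooth=0 it earns exactly zero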


Further investigation and experiments comparing the nnunet default nnUNetTrainer trainer and nnUNetTrainerDiceCELoss_noSmooth are in progress.

Tagging @naga-karthik and @plbenveniste, who both also work on lesion segmentation. If either of you has time to go through the investigation above and check that I didn't make any naive mistakes, that would be great.
