
Dice Loss vs Dice Loss with smoothing term #42


Description

@valosekj

This issue discusses differences in the implementation of the Dice Loss with and without the smoothing term.

Background: why this issue/discussion was opened

tl;dr:

  • the nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer (i.e., without the smoothing term of the Dice loss) helped prevent the model from collapsing to zero during lesion model training.
Details

Since the default nnUNetTrainer was causing the model to collapse to zero when training the DCM (degenerative cervical myelopathy) lesion segmentation model, we tried nnUNetTrainerDiceCELoss_noSmooth (i.e., without the smoothing term of the Dice loss).
This trainer was discovered by @naga-karthik in these two nnunet threads (1, 2). The trainer indeed helped, and the model no longer collapsed to zero; see details in this issue.

Note that DCM lesion segmentation presents a high class imbalance (lesions are small objects).

Comparison of the default and nnUNetTrainerDiceCELoss_noSmooth trainers

tl;dr:

  • the default nnUNetTrainer trainer uses smooth: float = 1.
  • nnUNetTrainerDiceCELoss_noSmooth uses 'smooth': 0
Details

nnunetv2 default trainer

The nnunetv2 default trainer uses MemoryEfficientSoftDiceLoss (see L352-L362 in nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py).

This MemoryEfficientSoftDiceLoss (see L58 in nnunetv2/training/loss/dice.py) uses both a smoothing term (self.smooth) and a small constant (1e-8); see L116:

dc = (2 * intersect + self.smooth) / (torch.clip(sum_gt + sum_pred + self.smooth, 1e-8))
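
For illustration, here is a minimal, self-contained sketch of that per-image Dice term (not the actual nnunetv2 class; the handling of softmax outputs, one-hot encoding, and batch/region reductions is omitted):

import torch

def soft_dice(pred: torch.Tensor, gt: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    # pred: predicted probabilities, gt: binary ground truth, same shape
    intersect = (pred * gt).sum()
    sum_pred = pred.sum()
    sum_gt = gt.sum()
    # smoothing term added to numerator and denominator; the denominator is
    # additionally clipped at 1e-8 so it can never be exactly zero
    return (2 * intersect + smooth) / torch.clip(sum_gt + sum_pred + smooth, min=1e-8)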

nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer

The nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer (see L32 in nnunetv2/training/nnUNetTrainer/variants/loss/nnUNetTrainerDiceLoss.py) sets smooth to 0. The small constant (1e-8) is left untouched and still applied.
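
To make the difference between the two trainers concrete, here is a small hand-check on a patch whose ground truth is empty and whose prediction is all zeros (illustrative only; it uses the simplified per-image formula above, not the actual nnunetv2 batch-dice computation):

import torch

pred = torch.zeros(4, 4)  # model predicts no lesion anywhere
gt = torch.zeros(4, 4)    # ground truth contains no lesion either

intersect = (pred * gt).sum()
sums = pred.sum() + gt.sum()
for smooth in (1.0, 0.0):
    dc = (2 * intersect + smooth) / torch.clip(sums + smooth, min=1e-8)
    print(f"smooth={smooth}: dc={dc.item():.4f}")
# smooth=1.0: dc=1.0000 -> an all-zero prediction on an empty patch scores a perfect Dice
# smooth=0.0: dc=0.0000 -> the same prediction gets no credit at all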

What is the smoothing term used for?

tl;dr: hard to say convincingly.

  • keras and ivadomed use only the smoothing term without the small constant
  • nnunetv2 default trainer uses both the smoothing term and the small constant
  • nnunetv2 nnUNetTrainerDiceCELoss_noSmooth trainer uses only the small constant (because the smoothing term is set to zero)
Details

Initially, I incorrectly thought that the nnunetv2 smoothing term was used to prevent division by zero. I got this sense based on this comment. But, after a deeper look at the equation in this comment, I found out that the equation uses only the smoothing term and no small constant. Further investigation led me to these two discussions (1, 2) about the Dice implementation in keras. Both discussions use only the smoothing term but, again, no small constant:

score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

Checking the ivadomed Dice implementation, I found that it also uses only the smoothing term (see L63 in ivadomed/losses.py):

return - (2.0 * intersection + self.smooth) / (iflat.sum() + tflat.sum() + self.smooth)

I also found this comment from Charley Gros providing the following explanation (note that this comment is related to the ivadomed Dice without the small constant):

A very probable reason is the different way these two functions are dealing with empty GT and empty pred.
--> Dice loss returns 1
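
That behaviour is easy to verify with the smooth-only formulation quoted above (illustrative values only; smooth = 1 as in the keras/ivadomed snippets):

import torch

smooth = 1.0
iflat = torch.zeros(16)  # flattened empty prediction
tflat = torch.zeros(16)  # flattened empty ground truth
intersection = (iflat * tflat).sum()
score = (2.0 * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth)
print(score.item())  # 1.0 -> empty prediction on empty GT yields a perfect Dice score, consistent with the comment above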

Both the keras and ivadomed implementations contrast with the nnunet implementation, which uses both the smoothing term (self.smooth) and the small constant (1e-8); see L116:

dc = (2 * intersect + self.smooth) / (torch.clip(sum_gt + sum_pred + self.smooth, 1e-8))

Prompting ChatGPT to explain why removing the smoothing term from the nnunet Dice helped prevent the model from collapsing to zero yields a relatively reasonable explanation (highlighted in italics):

The smoothing term, as used here, serves a dual purpose:

  1. Stabilization of the Dice Coefficient: By adding the smoothing term (self.smooth) to both the numerator (nominator) and the denominator, it ensures that even when there are no true positives (TP) (e.g., the model predicts no lesion where there is actually no lesion, or it fails to predict a lesion where there is one), the Dice coefficient does not become undefined or result in division by zero. The use of torch.clip with a minimum value of 1e-8 further ensures that the denominator cannot be zero, which prevents division by zero errors.
  2. Influence on Gradient Behavior and Model Sensitivity: The smoothing term affects the gradient of the loss with respect to the model's predictions. When TP is very small or zero, the addition of the smoothing term to both the numerator and denominator ensures that the model still receives a gradient signal that encourages it to improve its predictions. It essentially softens the harshness of the Dice loss, making it more forgiving in the early stages of training or in cases where the model struggles to detect small or sparse lesions. This can lead to a more stable and gradual learning process.

In your specific case, removing the smoothing term with the nnUNetTrainerDiceCELoss_noSmooth trainer likely made the loss more directly reflective of the actual performance of the model in terms of the overlap between the predicted segmentation and the ground truth. For the task of lesion segmentation, where lesions can be small and the balance between lesion and non-lesion areas is crucial, the direct feedback without the smoothing term's moderation might have better aligned the loss with the task's objectives, thereby improving model performance.
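
As a purely illustrative sanity check of the class-imbalance argument (numbers made up, not from an actual training run), compare a collapsed all-background prediction with a partially correct prediction on a patch containing a tiny 5-voxel lesion:

def dc(intersect, sum_pred, sum_gt, smooth):
    return (2 * intersect + smooth) / max(sum_gt + sum_pred + smooth, 1e-8)

sum_gt = 5.0  # tiny lesion: 5 foreground voxels
# collapsed prediction (all background)
print(dc(0.0, 0.0, sum_gt, smooth=1.0), dc(0.0, 0.0, sum_gt, smooth=0.0))  # ~0.17 vs 0.0
# partial prediction: 4 true-positive voxels plus 2 false-positive voxels
print(dc(4.0, 6.0, sum_gt, smooth=1.0), dc(4.0, 6.0, sum_gt, smooth=0.0))  # 0.75 vs ~0.73
# with smooth=1 a collapsed prediction still earns a non-zero Dice on tiny lesions
# (and a perfect Dice on empty patches); with smooth=0 it earns exactly zero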


Further investigation and experiments comparing the nnunet default nnUNetTrainer trainer and nnUNetTrainerDiceCELoss_noSmooth are in progress.

Tagging @naga-karthik and @plbenveniste, who both also work on lesion segmentation. If either of you has time to go through the investigation above and check that I didn't make any naive mistakes, that would be great.
