Bug: Re-Binarization of labels in pre-processing routine #18

@jqmcginnis


Hi all 🙂,

I just discovered this small bug / missing post-processing step in the pre-processing routine.

Discovery / Problem

When running nnU-Net's nnUNet_plan_and_preprocess -t 501 --verify_dataset_integrity on the pre-processed dataset, we get the following error message:

Unexpected labels found in file /home/jmcginnis/data/nnunet/nnUNet_raw_data_base/nnUNet_raw_data/Task501_MSSpineLesionPreprocessedAxialOnly/labelsTr/MSSpineLesionPreprocessedAxialOnly_047.nii.gz. 
Found these unexpected values (they should not be there)
[7.925189e-17, 8.585621e-17, 9.0809456e-17, 1.0401811e-16, 
1.0897135e-16, 1.386908e-16, 1.5024837e-16, ...]

Similarly, @kiristern is seeing low Dice scores for the Modified U-Net baseline:

2022-12-20 17:24:12.210 | INFO     |
ivadomed.testing:test:88 - {'dice_score': 0.060521258921826665, 
'multi_class_dice_score': 0.060521258921826665, 
'precision_score': 0.06500272972600003, 
'recall_score': 0.0716620061524084, 
'specificity_score': 0.9973268868391573, 
'intersection_over_union': 0.03410365979066674, 
'accuracy_score': 0.996024251185431, 'hausdorff_score': 2.045652891236524}

Although I am not familiar with ivadomed, I suspect the multi_class_dice_score entry indicates that ivadomed runs into a similar problem with the non-binary labels and interprets them as a multi-class problem instead.

... but why?

When we resample the images to isotropic resolution, the interpolation blurs the edges of the labels and smooths their contours, which introduces sampling artifacts. As a result, the labels contain values other than {0,1}. This is easy to confirm by inspecting any of the affected label files in the dataset.
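To illustrate the effect, here is a minimal sketch in plain Python. It assumes simple 1D linear interpolation (the helper resample_linear is hypothetical and is not what sct_resample does internally), so the exact values differ from the ones in the error message above, but the mechanism is the same: interpolating across a label edge produces values strictly between 0 and 1.

```python
def resample_linear(values, new_len):
    """Linearly interpolate a 1D sequence onto new_len evenly spaced points.

    Hypothetical helper for illustration only; not the SCT implementation.
    """
    old_len = len(values)
    out = []
    for i in range(new_len):
        # Position of the new sample expressed in old index coordinates.
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append((1 - frac) * values[lo] + frac * values[hi])
    return out


# A binary label profile across a lesion (toy data, not from the dataset).
label = [0, 0, 1, 1, 0, 0]

# Resample onto a finer grid, similar in spirit to going to a smaller voxel size.
resampled = resample_linear(label, 11)

# Collect every value that is neither 0 nor 1, as a dataset integrity check would.
unexpected = sorted({v for v in resampled if v not in (0, 1)})

print(resampled)
print(unexpected)
```

Any value that ends up in unexpected is the kind of entry the integrity check complains about.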

Solution

We can mitigate this effect by adding the following post-processing step after this line:

sct_resample -i ${file}_T2w_crop.nii.gz -mm 0.75x0.75x0.75 -o ${file}_T2w_crop_res.nii.gz

sct_maths -i ${file}_T2w_crop_res.nii.gz -bin 1e-12 -o ${file}_T2w_crop_res.nii.gz

I haven't looked into finding an optimal value for the threshold 1e-12, but I chose it to be extremely low so that every voxel touched by the resampled label is kept and the label borders are fully preserved.
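The effect of the threshold choice can be sketched as follows. This is a toy example under stated assumptions: the binarize helper is hypothetical and assumes that values strictly above the threshold become 1, and the resampled profile is made-up data rather than values from the dataset.

```python
def binarize(values, threshold):
    """Set values strictly above threshold to 1 and everything else to 0.

    Hypothetical helper; the strict comparison is an assumption.
    """
    return [1 if v > threshold else 0 for v in values]


# Toy profile of a resampled label: tiny numerical noise plus blurred edges.
resampled = [0.0, 1e-16, 0.25, 0.75, 1.0, 1.0, 0.75, 0.25, 1e-16, 0.0]

# Very low threshold: keeps every partially covered voxel, drops only the noise.
low = binarize(resampled, 1e-12)

# Mid-range threshold: keeps only voxels that are mostly covered by the label.
half = binarize(resampled, 0.5)

print(low, sum(low))
print(half, sum(half))
```

In this toy case the low threshold yields a slightly larger label than the 0.5 threshold, so the choice trades off keeping the full border against a small dilation of the label.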
