Description
Context
The current model, trained with the configuration below, produces very promising results.
Overall configuration (not exhaustive):
- region-based training with a minimal number of classes
- multichannel input with phase + magnitude (a sketch of the channel assembly follows this list)
- initial LR = 0.001
- LR scheduler = cosine
- optimizer = AdamW
- magnitude preprocessing: light CLAHE
- phase preprocessing: heavy contrast enhancement
- augmentation: light spatial augmentation
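For context, a minimal sketch of how the two input channels could be assembled, assuming skimage for the CLAHE step and a simple percentile stretch for the phase contrast enhancement (the actual preprocessing functions in the pipeline may differ):

import numpy as np
from skimage import exposure

def build_input_channels(mag_slice, phase_slice):
    """Stack preprocessed magnitude and phase into a (2, H, W) input array."""
    # Rescale the magnitude to [0, 1] first, then apply light CLAHE
    mag = exposure.rescale_intensity(mag_slice.astype(np.float64), out_range=(0.0, 1.0))
    mag = exposure.equalize_adapthist(mag, clip_limit=0.01)
    # Heavy contrast enhancement on the phase via percentile stretching
    p_low, p_high = np.percentile(phase_slice, (1, 99))
    phase = exposure.rescale_intensity(phase_slice.astype(np.float64),
                                       in_range=(p_low, p_high), out_range=(0.0, 1.0))
    return np.stack([mag, phase], axis=0)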
Most of these hyperparameters were chosen based on intuition and personal experience. We'd like to run ablation studies to understand which hyperparameters really improve the results. We would also like to investigate possible new sources of improvement, such as (not exhaustive):
- multichannel with mag & phase and with adjacent slices (see the sketch after this list)
- stronger augmentations
- stronger/weaker/no preprocessing
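For the adjacent-slices idea, the stacking could look roughly like this 2.5D sketch (slice offsets and border handling here are assumptions, not what is implemented):

import numpy as np

def stack_adjacent_slices(volume, z, offsets=(-1, 0, 1)):
    """Build a (len(offsets), H, W) input from neighbouring slices of a (Z, H, W) volume."""
    # Clamp out-of-range neighbours to the first/last slice (edge replication)
    indices = [min(max(z + o, 0), volume.shape[0] - 1) for o in offsets]
    return np.stack([volume[i] for i in indices], axis=0)

Doing this for both the magnitude and phase volumes and concatenating along the channel axis would give a 6-channel input.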
Goal
To make it possible to draw conclusions from experiments with different hyperparameters, we would like to make the results as reproducible as possible. The best way to do that is to enable deterministic training.
Issue
nnU-Net does not currently support deterministic training (see issue 1423 of the nnunet repo), and no change is planned regarding that matter.
Current investigation
I added the basic seeding that should make training deterministic, but it isn't yet:
import os
import random

import numpy as np
import torch

seed = 42
random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
np.random.seed(seed)
torch.manual_seed(seed)
# I'm on MPS, so no CUDA-specific seeding is needed
torch.use_deterministic_algorithms(True)
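For completeness, a couple of extra switches are usually needed for full determinism on CUDA machines. They don't apply on MPS, but I'm noting them here in case the ablations run on a GPU server (standard PyTorch settings, not yet tested on this codebase):

import os
import torch

# cuDNN may otherwise select non-deterministic convolution kernels
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Required by torch.use_deterministic_algorithms(True) for some cuBLAS ops
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'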
I then noticed that even the dataloaders were not deterministic, so I changed the augmenter construction to pass explicit per-worker seeds:
if allowed_num_processes == 0:
    mt_gen_train = SingleThreadedAugmenter(dl_tr, None)
    mt_gen_val = SingleThreadedAugmenter(dl_val, None)
else:
    train_seeds = [MASTER_SEED + i for i in range(allowed_num_processes)]
    num_val_processes = max(1, allowed_num_processes // 2)
    val_seeds = [MASTER_SEED + 1000 + i for i in range(num_val_processes)]  # use an offset to avoid overlap with train seeds
    mt_gen_train = MultiThreadedAugmenter(data_loader=dl_tr, transform=None,
                                          num_processes=allowed_num_processes,
                                          num_cached_per_queue=max(6, allowed_num_processes // 2),
                                          seeds=train_seeds,
                                          pin_memory=self.device.type == 'cuda',
                                          wait_time=0.002)
    mt_gen_val = MultiThreadedAugmenter(data_loader=dl_val, transform=None,
                                        num_processes=num_val_processes,
                                        num_cached_per_queue=max(3, allowed_num_processes // 4),
                                        seeds=val_seeds,
                                        pin_memory=self.device.type == 'cuda',
                                        wait_time=0.002)
Now the dataloaders are deterministic, but the training still isn't.
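One way to back this claim up is to fingerprint the first few batches of two freshly built generators and compare the hashes; roughly like this (a sketch, assuming the generators yield dicts containing a 'data' array as nnU-Net's dataloaders do):

import hashlib
import numpy as np

def batch_fingerprint(generator, n_batches=5):
    """Hash the 'data' arrays of the first n batches so two runs can be compared."""
    h = hashlib.sha256()
    for _ in range(n_batches):
        batch = next(generator)
        h.update(np.ascontiguousarray(batch['data']).tobytes())
    return h.hexdigest()

# Two generators built with the same seeds should give identical fingerprints
# assert batch_fingerprint(mt_gen_train_a) == batch_fingerprint(mt_gen_train_b)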
Next step
- Remove augmentation, as it may be the source of non-determinism (a helper to compare two runs is sketched below)
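To localize where two supposedly identical runs diverge, comparing checkpoints parameter by parameter can help. The sketch below assumes plain state_dicts saved with torch.save; nnU-Net's checkpoint format wraps the weights, so the loading step would need adapting:

import torch

def first_divergence(state_dict_a, state_dict_b):
    """Return the name of the first parameter whose values differ between two runs."""
    for name, tensor_a in state_dict_a.items():
        if not torch.equal(tensor_a, state_dict_b[name]):
            return name
    return None

# Example: after two short runs with identical seeds
# diff = first_divergence(torch.load('run_a.pth'), torch.load('run_b.pth'))
# print('first diverging parameter:', diff or 'none - runs match')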
Possible workaround
I'm currently training 10 models (6/10 done right now) for 40 epochs each to get an estimate of the variability of the loss and metrics during training. If we cannot make nnU-Net deterministic, we can use this as a baseline to better understand the impact of each hyperparameter. Note: this would involve training several times (for 40 epochs) for each hyperparameter modification, which is very time-consuming and not ideal.
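Once those runs are finished, the run-to-run noise can be summarized as a per-epoch mean and standard deviation, and a hyperparameter change would only be called an improvement if it moves the metric well outside that band. A minimal aggregation sketch (log parsing omitted; the metric values are assumed to be collected already):

import numpy as np

def variability_band(runs):
    """runs: one list of per-epoch metric values (e.g. pseudo Dice) per training run."""
    values = np.asarray(runs)            # shape (n_runs, n_epochs)
    mean = values.mean(axis=0)
    std = values.std(axis=0, ddof=1)     # unbiased std across runs
    return mean, std

# A candidate configuration would then be compared against mean[-1] +/- 2 * std[-1]
# at the final epoch before concluding that the hyperparameter actually helps.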