Training process stops unexpectedly with Return code: -9 #969
-
Hi everyone, I am trying to segment breasts in MRI images. The dataset I'm using can be found here.
The full config file is given below:
First I trained this model by annotating some images using Slicer's Grow from seeds functionality, submitting the labels and running the training process. I kept annotating more images to serve as training data, and retraining the model.
The whole log file is attached: deepedit_breast_logs.txt. It shows that this occurred during the third epoch. When I run the training again, I always get the same outcome, but often during a different epoch (for example the first or the fifth). It strikes me as odd that I still get this unexpected result, although nothing changed compared to the training runs done before. Maybe there's something wrong with the 2 extra images I added to the training data? Does anyone have a clue as to how this can be dealt with? Thank you in advance! Kind regards, Lukas Vander Stricht
Replies: 3 comments 1 reply
-
@tangy5 can you help here...
-
Thanks for the detailed information, @lukasvdstricht. Are you using CacheDataset or SmartCacheDataset to train the model? How much RAM does the machine have? I suspect there is not enough memory to cache the dataset. You could either reduce the image size to (128, 128, 64) here or use Dataset to train the model. Please let us know.
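For context on the memory suspicion: on POSIX systems, a negative return code -N reported for a child process means it was terminated by signal N, so -9 corresponds to SIGKILL. That is the signal the Linux out-of-memory killer sends, which would be consistent with running out of RAM, although a SIGKILL on its own does not prove it. A minimal sketch for decoding such codes (the helper name `describe_return_code` is made up for illustration and is not part of MONAI Label):

```python
import signal


def describe_return_code(code: int) -> str:
    """Translate a subprocess-style return code into a readable description.

    Hypothetical helper for illustration only.
    """
    if code < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


print(describe_return_code(-9))
```

If the kernel log (for example via `dmesg`) shows an out-of-memory entry for the training process around the time of the crash, that would support the diagnosis.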
-
Hi @lukasvdstricht, as @diazandr3s mentioned, is the program terminated automatically after a few epochs of training? If so, it is probably a system memory problem. You could try reducing the input image dimensions, using a lower cache rate, or just using Dataset for training.
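To get a rough feel for how image size and cache rate interact, the back-of-the-envelope estimate below may help. It is only a sketch under stated assumptions: each cached item is taken to be a dense float32 array of the resized shape (4 bytes per voxel), and the helper names, the item count, and the RAM budget are all hypothetical. The real footprint is likely higher, since it ignores the extra channels, intermediate transform copies, and data loader workers.

```python
import math


def cached_volume_bytes(shape, channels=1, bytes_per_voxel=4):
    """Approximate in-memory size of one cached volume (float32 assumed)."""
    return math.prod(shape) * channels * bytes_per_voxel


def max_cache_rate(n_items, item_bytes, ram_budget_bytes):
    """Largest fraction of the dataset that fits in the given RAM budget."""
    total = n_items * item_bytes
    if total == 0:
        return 1.0
    return min(1.0, ram_budget_bytes / total)


# Hypothetical numbers: 100 image/label pairs resized to (128, 128, 64),
# with 400 MiB of RAM set aside for the cache.
item = cached_volume_bytes((128, 128, 64), channels=2)
rate = max_cache_rate(100, item, 400 * 1024**2)
print(f"{item / 1024**2:.0f} MiB per item, cache rate <= {rate:.2f}")
```

If the estimate for the full dataset comes close to the machine's RAM, lowering the cache rate or switching to the plain Dataset trades training speed for a smaller memory footprint.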