Monailabel error with Cuda and SegResNet. #1636
Unanswered · keyurradia asked this question in Q&A · Replies: 1 comment
-
Hi @keyurradia, thanks for opening this discussion. How many labels are you trying to segment? This error seems to come from a mismatch between the number of labels in your configuration and the number the pre-trained model was trained with. Have you disabled pre-trained model usage, here: If not, please delete all files within the radiology/model folder and disable pre-trained model usage before triggering the training. Let us know.
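As a rough illustration of the cleanup step suggested above, here is a minimal Python sketch that moves everything out of a model folder into a backup folder instead of deleting it outright, so stale checkpoints can be restored if needed. The function name `set_aside_model_files`, the backup folder, and the placeholder file names are all hypothetical. The demo runs on a temporary directory that only mimics the `model\segmentation\train_01\model.pt` layout shown in the log; it does not touch a real `radiology/model` folder.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_model_files(model_dir: Path, backup_dir: Path) -> list[str]:
    """Move every entry in model_dir into backup_dir and return the moved names.

    Hypothetical helper: it moves files rather than deleting them.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(model_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)
    return moved


# Demonstration on a throwaway directory; all file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "radiology" / "model"
    run_dir = model_dir / "segmentation" / "train_01"
    run_dir.mkdir(parents=True)
    (run_dir / "model.pt").write_bytes(b"stale checkpoint placeholder")
    (model_dir / "pretrained_segmentation.pt").write_bytes(b"placeholder")

    # The backup folder sits next to model_dir, never inside it.
    moved = set_aside_model_files(model_dir, Path(tmp) / "model_backup")
    remaining = [p.name for p in model_dir.iterdir()]
    print("moved:", moved)
    print("left in model folder:", remaining)
```

Moving rather than deleting is a deliberate swap for safety. Disabling pre-trained weights is a separate step; in the sample radiology app this appears to be controlled by the `use_pretrained_model` option (for example `--conf use_pretrained_model false` at server start), but treat that name as an assumption and verify it against your installed version.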
-
Hi, I am trying to train a segmentation model with custom labels. I changed the model config according to the instructions.
I have a 16 GB GPU, and I installed CUDA and PyTorch versions that are compatible with each other.
Training fails with a message that CUDA is not available and is being disabled,
and with a second error: "RuntimeError: Error(s) in loading state_dict for SegResNet".
Here is the log from my conda environment. Can someone please help me find out what I am doing wrong?
[2024-02-09 10:34:49,391] [30420] [MainThread] [INFO] (monailabel.endpoints.activelearning:44) - Active Learning Request: {'strategy': 'first', 'client_id': 'user-xyz'}
[2024-02-09 10:34:49,392] [30420] [MainThread] [INFO] (monailabel.tasks.activelearning.first:38) - First: Selected Image: hepaticvessel_004
[2024-02-09 10:34:49,428] [30420] [MainThread] [INFO] (monailabel.endpoints.activelearning:60) - Next sample: {'id': 'hepaticvessel_004', 'path': 'C:\Users\keyur\datasets\Task08_HepaticVessel\imagesTr\hepaticvessel_004.nii.gz', 'ts': 1707328026, 'name': 'hepaticvessel_004.nii.gz'}
[2024-02-09 10:36:49,455] [30420] [MainThread] [INFO] (monailabel.endpoints.datastore:101) - Saving Label for hepaticvessel_004 for tag: final by admin
[2024-02-09 10:36:49,458] [30420] [MainThread] [INFO] (monailabel.endpoints.datastore:112) - Save Label params: {"label_info": [{"name": "gallbladder", "idx": 1}, {"name": "liver", "idx": 2}, {"name": "inferior vena cava", "idx": 3}, {"name": "portal vein and splenic vein", "idx": 4}, {"name": "vessels", "idx": 5}, {"name": "lesion", "idx": 6}], "client_id": "user-xyz"}
[2024-02-09 10:36:49,459] [30420] [MainThread] [INFO] (monailabel.datastore.local:486) - Saving Label for Image: hepaticvessel_004; Tag: final; Info: {'label_info': [{'name': 'gallbladder', 'idx': 1}, {'name': 'liver', 'idx': 2}, {'name': 'inferior vena cava', 'idx': 3}, {'name': 'portal vein and splenic vein', 'idx': 4}, {'name': 'vessels', 'idx': 5}, {'name': 'lesion', 'idx': 6}], 'client_id': 'user-xyz'}
[2024-02-09 10:36:49,460] [30420] [MainThread] [INFO] (monailabel.datastore.local:494) - Adding Label: hepaticvessel_004 => final => C:\Users\keyur\AppData\Local\Temp\tmpmuo47wzs.nii.gz
[2024-02-09 10:36:49,468] [30420] [MainThread] [INFO] (monailabel.datastore.local:510) - Label Info: {'label_info': [{'name': 'gallbladder', 'idx': 1}, {'name': 'liver', 'idx': 2}, {'name': 'inferior vena cava', 'idx': 3}, {'name': 'portal vein and splenic vein', 'idx': 4}, {'name': 'vessels', 'idx': 5}, {'name': 'lesion', 'idx': 6}], 'client_id': 'user-xyz', 'ts': 1707471409, 'name': 'hepaticvessel_004.nii.gz'}
[2024-02-09 10:36:49,491] [30420] [MainThread] [INFO] (monailabel.interfaces.app:493) - New label saved for: hepaticvessel_004 => hepaticvessel_004
[2024-02-09 10:36:49,748] [30420] [Thread-1] [INFO] (monailabel.datastore.local:577) - Invalidate count: 0
[2024-02-09 10:37:04,944] [30420] [MainThread] [INFO] (monailabel.utils.async_tasks.task:41) - Train request: {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cpu', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz'}
[2024-02-09 10:37:04,946] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:49) - Before:: C:\Users\keyur\anaconda3\envs;;C:\Users\keyur\apps\radiology
[2024-02-09 10:37:04,947] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:53) - After:: C:\Users\keyur\anaconda3\envs;;C:\Users\keyur\apps\radiology
[2024-02-09 10:37:04,947] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:65) - COMMAND:: C:\Users\keyur\anaconda3\envs\monailabel-env\python.exe -m monailabel.interfaces.utils.app -m train -r {"model":"segmentation","name":"train_01","pretrained":true,"device":"cpu","max_epochs":50,"early_stop_patience":-1,"val_split":0.2,"train_batch_size":1,"val_batch_size":1,"multi_gpu":true,"gpus":"all","dataset":"SmartCacheDataset","dataloader":"ThreadDataLoader","tracking":"mlflow","tracking_uri":"","tracking_experiment_name":"","client_id":"user-xyz"}
[2024-02-09 10:37:05,218] [37016] [MainThread] [INFO] (main:37) - Initializing App from: C:\Users\keyur\apps\radiology; studies: C:\Users\keyur\datasets\Task08_HepaticVessel\imagesTr; conf: {'models': 'segmentation'}
[2024-02-09 10:37:09,348] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for MONAILabelApp Found: <class 'main.MyApp'>
[2024-02-09 10:37:09,354] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepedit.DeepEdit'>
[2024-02-09 10:37:09,355] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_2d.Deepgrow2D'>
[2024-02-09 10:37:09,355] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_3d.Deepgrow3D'>
[2024-02-09 10:37:09,355] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_spine.LocalizationSpine'>
[2024-02-09 10:37:09,356] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_vertebra.LocalizationVertebra'>
[2024-02-09 10:37:09,356] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation.Segmentation'>
[2024-02-09 10:37:09,357] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_spleen.SegmentationSpleen'>
[2024-02-09 10:37:09,357] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_vertebra.SegmentationVertebra'>
[2024-02-09 10:37:09,357] [37016] [MainThread] [INFO] (main:93) - +++ Adding Model: segmentation => lib.configs.segmentation.Segmentation
[2024-02-09 10:37:09,414] [37016] [MainThread] [INFO] (main:96) - +++ Using Models: ['segmentation']
[2024-02-09 10:37:09,414] [37016] [MainThread] [INFO] (monailabel.interfaces.app:135) - Init Datastore for: C:\Users\keyur\datasets\Task08_HepaticVessel\imagesTr
[2024-02-09 10:37:09,414] [37016] [MainThread] [INFO] (monailabel.datastore.local:130) - Auto Reload: False; Extensions: ['.nii.gz', '.nii', '.nrrd', '.jpg', '.png', '.tif', '.svs', '.xml']
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.datastore.local:577) - Invalidate count: 0
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (main:126) - +++ Adding Inferer:: segmentation => <lib.infers.segmentation.Segmentation object at 0x0000015FB794B1D0>
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (main:191) - {'segmentation': <lib.infers.segmentation.Segmentation object at 0x0000015FB794B1D0>, 'Histogram+GraphCut': <monailabel.scribbles.infer.HistogramBasedGraphCut object at 0x0000015FB7BCFA90>, 'GMM+GraphCut': <monailabel.scribbles.infer.GMMBasedGraphCut object at 0x0000015FB7B94250>}
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (main:206) - +++ Adding Trainer:: segmentation => <lib.trainers.segmentation.Segmentation object at 0x0000015FB7B95E10>
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.utils.sessions:51) - Session Path: C:\Users\keyur\.cache\monailabel\sessions
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.utils.sessions:52) - Session Expiry (max): 3600
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:432) - Train Request (input): {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cpu', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz', 'local_rank': 0}
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:445) - CUDA_VISIBLE_DEVICES: None
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:450) - Distributed/Multi GPU is limited
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:465) - Distributed Training = FALSE
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:492) - 0 - Train Request (final): {'name': 'train_01', 'pretrained': True, 'device': 'cpu', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': False, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'model': 'segmentation', 'client_id': 'user-xyz', 'local_rank': 0, 'run_id': '20240209_103709'}
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:625) - 0 - Using Device: cpu; IDX: None
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:518) - Run/Output Path: C:\Users\keyur\apps\radiology\model\segmentation\train_01
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:534) - Tracking: mlflow
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:535) - Tracking URI: file:///C:/Users/keyur/apps/radiology/model/segmentation/train_01/mlruns;
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:536) - Tracking Experiment Name: segmentation; Run Name: run_20240209_103709
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:410) - Total Records for Training: 2
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:411) - Total Records for Validation: 1
monai.transforms.croppad.dictionary CropForegroundd.__init__:allow_smaller: Current default value of argument `allow_smaller=True` has been deprecated since version 1.2. It will be changed to `allow_smaller=False` in version 1.5.
Loading dataset: 0%| | 0/1 [00:00<?, ?it/s]
Loading dataset: 100%|##########| 1/1 [00:01<00:00, 1.85s/it]
Loading dataset: 100%|##########| 1/1 [00:01<00:00, 1.85s/it]
cache_num is greater or equal than dataset length, fall back to regular monai.data.CacheDataset.
[2024-02-09 10:37:11,341] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:328) - 0 - Records for Validation: 1
[2024-02-09 10:37:11,345] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:318) - 0 - Adding Validation to run every '1' interval
[2024-02-09 10:37:11,347] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:713) - 0 - Load Path C:\Users\keyur\apps\radiology\model\segmentation\train_01\model.pt
Loading dataset: 0%| | 0/2 [00:00<?, ?it/s]
Loading dataset: 50%|##### | 1/2 [00:02<00:02, 2.34s/it]
Loading dataset: 100%|##########| 2/2 [00:04<00:00, 2.24s/it]
Loading dataset: 100%|##########| 2/2 [00:04<00:00, 2.26s/it]
[2024-02-09 10:37:15,868] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:264) - 0 - Records for Training: 2
torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
[2024-02-09 10:37:15,872] [37016] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:876) - Engine run resuming from iteration 0, epoch 0 until 50 epochs
[2024-02-09 10:37:15,938] [37016] [MainThread] [ERROR] (ignite.engine.engine.SupervisedTrainer:992) - Engine run is terminating due to exception: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
2024-02-09 10:37:15,938 - ERROR - Exception: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
Traceback (most recent call last):
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 946, in _internal_run_as_gen
self._fire_event(Events.STARTED)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(*first, *(event_args + others), **kwargs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\handlers\checkpoint_loader.py", line 147, in __call__
Checkpoint.load_objects(to_load=self.load_dict, checkpoint=checkpoint, strict=self.strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 635, in load_objects
_load_object(obj, checkpoint_obj[k])
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 620, in _load_object
obj.load_state_dict(chkpt_obj, strict=is_state_dict_strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\interfaces\utils\app.py", line 128, in <module>
run_main()
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\interfaces\utils\app.py", line 113, in run_main
result = a.train(request)
^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\interfaces\app.py", line 423, in train
result = task(request, self.datastore())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\tasks\train\basic_train.py", line 466, in __call__
res = self.train(0, world_size, req, datalist)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\tasks\train\basic_train.py", line 555, in train
context.trainer.run()
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\engines\trainer.py", line 53, in run
super().run()
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\engines\workflow.py", line 283, in run
super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 892, in run
return self._internal_run()
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run
return next(self._internal_run_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen
self._handle_exception(e)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception
self._fire_event(Events.EXCEPTION_RAISED, e)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(*first, *(event_args + others), **kwargs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\handlers\stats_handler.py", line 202, in exception_raised
raise e
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 946, in _internal_run_as_gen
self._fire_event(Events.STARTED)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(*first, *(event_args + others), **kwargs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\handlers\checkpoint_loader.py", line 147, in __call__
Checkpoint.load_objects(to_load=self.load_dict, checkpoint=checkpoint, strict=self.strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 635, in load_objects
_load_object(obj, checkpoint_obj[k])
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 620, in _load_object
obj.load_state_dict(chkpt_obj, strict=is_state_dict_strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
[2024-02-09 10:37:16,472] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:83) - Return code: 1
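To make the size mismatch in the traceback easier to read, here is a small Python sketch that parses the two `size mismatch` lines and compares the leading (output channel) dimension of each shape. The regular expression and the helper name `parse_mismatches` are illustrative and not part of MONAI Label; the label names are copied from the `label_info` entries in the log. It assumes the segmentation network uses one output channel per label plus one for background, which fits the 7 reported for the current model.

```python
import re

# The two mismatch lines, copied from the traceback in the log.
ERROR_TEXT = """\
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
"""

# Illustrative pattern for PyTorch's "size mismatch" message format.
PATTERN = re.compile(
    r"size mismatch for (?P<param>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]+)\]\)"
)


def parse_mismatches(text):
    """Return (parameter name, checkpoint shape, model shape) tuples."""
    results = []
    for match in PATTERN.finditer(text):
        ckpt = tuple(int(x) for x in match.group("ckpt").split(","))
        model = tuple(int(x) for x in match.group("model").split(","))
        results.append((match.group("param"), ckpt, model))
    return results


# Label names taken from the "Save Label params" line in the log.
labels = [
    "gallbladder",
    "liver",
    "inferior vena cava",
    "portal vein and splenic vein",
    "vessels",
    "lesion",
]
# Assumption: one output channel per label plus one background channel.
expected_channels = len(labels) + 1

mismatches = parse_mismatches(ERROR_TEXT)
for param, ckpt_shape, model_shape in mismatches:
    print(
        f"{param}: checkpoint has {ckpt_shape[0]} output channels, "
        f"current model has {model_shape[0]} "
        f"(expected for {len(labels)} labels: {expected_channels})"
    )
```

Under that assumption, the current model's 7 output channels match the six configured labels plus background, while the checkpoint was saved with only 3. The `Load Path` line in the log points at `model\segmentation\train_01\model.pt`, so the weights being loaded may come from an earlier `train_01` run made with fewer labels rather than from the current configuration.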