error while running auto3dseg_hello_world.ipynb on colab #1964

Open
@rbramkumar

I get an error when running the Hello World example exactly as it appears in the tutorial. I tried setting num_fold=3 and setting %env CUDA_VISIBLE_DEVICES to 0 or 1, but I get the same error each time.

https://github.com/Project-MONAI/tutorials/blob/main/auto3dseg/notebooks/auto3dseg_hello_world.ipynb
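Since the report involves switching CUDA_VISIBLE_DEVICES between 0 and 1, a quick sanity check on that value may help. The sketch below is only an illustration: `invalid_device_indices` is a hypothetical helper, not part of MONAI or PyTorch, and it assumes the variable holds plain integer indices (it does not handle GPU UUIDs). The physical GPU count is passed in by hand; on a real machine it could come from `nvidia-smi` or a similar tool.

```python
def invalid_device_indices(value, physical_gpu_count):
    """Return the indices in a CUDA_VISIBLE_DEVICES-style string that do not
    exist on a machine with `physical_gpu_count` GPUs.

    Hypothetical helper for illustration; handles integer indices only.
    """
    if value is None:
        # Variable unset: all physical devices stay visible, nothing is invalid.
        return []
    indices = [int(tok) for tok in value.split(",") if tok.strip()]
    return [i for i in indices if not 0 <= i < physical_gpu_count]


# On a single-GPU machine the only valid index is 0.
print(invalid_device_indices("0", 1))
print(invalid_device_indices("1", 1))
```

If the second call reports index 1 as invalid for the machine in question, a training subprocess that inherits that setting would see no GPU. Whether that is the cause of the failure here is not confirmed by the log.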



runner = AutoRunner(
    work_dir=work_dir,
    input={
        "modality": "MRI",
        "datalist": datalist_file,
        "dataroot": dataroot_dir,
    },
)

%env CUDA_VISIBLE_DEVICES=1

max_epochs = 2

train_param = {
    "num_epochs_per_validation": 1,
    "num_images_per_batch": 2,
    "num_epochs": max_epochs,
    "num_warmup_epochs": 1,
}
runner.set_training_params(train_param)
runner.set_num_fold(num_fold=3)

runner.run()

MONAI version: 1.5.dev2512
Numpy version: 2.0.2
Pytorch version: 2.6.0+cu124
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: e4701e24c97d1f8c7ba40777c238cdfe14b04581
MONAI __file__: /usr/local/lib/python3.11/dist-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
ITK version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 5.3.2
scikit-image version: 0.25.2
scipy version: 1.14.1
Pillow version: 11.1.0
Tensorboard version: 2.18.0
gdown version: 5.2.0
TorchVision version: 0.21.0+cu124
tqdm version: 4.67.1
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: 5.9.5
pandas version: 2.2.2
einops version: 0.8.1
transformers version: 4.50.0
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.
clearml version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

Running data analysis...
2025-03-25 20:15:13,203 - INFO - Found 1 GPUs for data analyzing!
/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py:624: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(
100%|██████████| 12/12 [00:00<00:00, 17.97it/s]2025-03-25 20:15:13,899 - INFO - Writing data stats to /content/helloworld_work_dir/datastats.yaml.
2025-03-25 20:15:13,909 - INFO - Writing by-case data stats to /content/helloworld_work_dir/datastats_by_case.yaml, this may take a while.
2025-03-25 20:15:13,963 - INFO - BundleGen from https://github.com/Project-MONAI/research-contributions/releases/download/algo_templates/c108ea9.tar.gz
algo_templates.tar.gz: 104kB [00:00, 196kB/s] 2025-03-25 20:15:15,053 - INFO - Downloaded: /tmp/tmpuwv56btk/algo_templates.tar.gz
2025-03-25 20:15:15,054 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmpuwv56btk/algo_templates.tar.gz.
2025-03-25 20:15:15,055 - INFO - Writing into directory: /content/helloworld_work_dir.
2025-03-25 20:15:15,190 - INFO - Generated:/content/helloworld_work_dir/dints_0
2025-03-25 20:15:15,222 - INFO - Generated:/content/helloworld_work_dir/segresnet_0
2025-03-25 20:15:15,245 - INFO - segresnet2d_0 is skipped! SegresNet2D is skipped due to median spacing of [1.0, 1.0, 1.0], which means the dataset is not highly anisotropic, e.g. spacing[2] < 3*(spacing[0] + spacing[1])/2) .
2025-03-25 20:15:15,298 - INFO - Generated:/content/helloworld_work_dir/swinunetr_0
2025-03-25 20:15:15,335 - INFO - The keys num_warmup_epochs cannot be found in the /content/helloworld_work_dir/dints_0/configs/hyper_parameters.yaml for training. Skipped overriding key num_warmup_epochs.
2025-03-25 20:15:15,338 - INFO - ['python', '/content/helloworld_work_dir/dints_0/scripts/train.py', 'run', "--config_file='/content/helloworld_work_dir/dints_0/configs/hyper_parameters.yaml,/content/helloworld_work_dir/dints_0/configs/hyper_parameters_search.yaml,/content/helloworld_work_dir/dints_0/configs/network.yaml,/content/helloworld_work_dir/dints_0/configs/network_search.yaml,/content/helloworld_work_dir/dints_0/configs/transforms_infer.yaml,/content/helloworld_work_dir/dints_0/configs/transforms_train.yaml,/content/helloworld_work_dir/dints_0/configs/transforms_validate.yaml'", '--training#num_epochs_per_validation=1', '--training#num_images_per_batch=2', '--training#num_epochs=2']


CalledProcessError Traceback (most recent call last)
in <cell line: 0>()
----> 1 runner.run()

5 frames
/usr/local/lib/python3.11/dist-packages/monai/apps/auto3dseg/auto_runner.py in run(self)
876 if len(history) > 0:
877 if not self.hpo:
--> 878 self._train_algo_in_sequence(history)
879 else:
880 self._train_algo_in_nni(history)

/usr/local/lib/python3.11/dist-packages/monai/apps/auto3dseg/auto_runner.py in _train_algo_in_sequence(self, history)
726 algo = algo_dict[AlgoKeys.ALGO]
727 if has_option(algo.train, "device_setting"):
--> 728 algo.train(self.train_params, self.device_setting)
729 else:
730 algo.train(self.train_params)

/content/helloworld_work_dir/algorithm_templates/dints/scripts/algo.py in train(self, train_params, device_setting, search)
494 cmd, devices_info = self._create_cmd(dints_train_params)
495 cmd = "OMP_NUM_THREADS=1 " + cmd
--> 496 return self._run_cmd(cmd, devices_info)
497
498

/usr/local/lib/python3.11/dist-packages/monai/apps/auto3dseg/bundle_gen.py in _run_cmd(self, cmd, devices_info)
275 )
276 else:
--> 277 return run_cmd(cmd.split(), run_cmd_verbose=True, env=ps_environ, check=True)
278
279 def train(

/usr/local/lib/python3.11/dist-packages/monai/utils/misc.py in run_cmd(cmd_list, **kwargs)
890 monai.apps.utils.get_logger("monai.utils.run_cmd").info(f"{cmd_list}") # type: ignore[attr-defined]
891 try:
--> 892 return subprocess.run(cmd_list, **kwargs)
893 except subprocess.CalledProcessError as e:
894 if not debug:

/usr/lib/python3.11/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
569 retcode = process.poll()
570 if check and retcode:
--> 571 raise CalledProcessError(retcode, process.args,
572 output=stdout, stderr=stderr)
573 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['python', '/content/helloworld_work_dir/dints_0/scripts/train.py', 'run', "--config_file='/content/helloworld_work_dir/dints_0/configs/hyper_parameters.yaml,/content/helloworld_work_dir/dints_0/configs/hyper_parameters_search.yaml,/content/helloworld_work_dir/dints_0/configs/network.yaml,/content/helloworld_work_dir/dints_0/configs/network_search.yaml,/content/helloworld_work_dir/dints_0/configs/transforms_infer.yaml,/content/helloworld_work_dir/dints_0/configs/transforms_train.yaml,/content/helloworld_work_dir/dints_0/configs/transforms_validate.yaml'", '--training#num_epochs_per_validation=1', '--training#num_images_per_batch=2', '--training#num_epochs=2']' returned non-zero exit status 1.
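The traceback only reports that the training subprocess exited with status 1; the underlying error from train.py is not shown. One way to surface it is to re-run the logged command with output capture. The following is a minimal sketch under assumptions: `run_and_capture` is a hypothetical helper, and `demo_cmd` is a stand-in that fails with exit status 1. To diagnose the real failure, the command list from the log line would replace it.

```python
import subprocess
import sys


def run_and_capture(cmd_list):
    """Run a command without raising on failure and return
    (returncode, stdout, stderr) so the child's error text can be read."""
    proc = subprocess.run(cmd_list, capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for the failing training command (assumption for illustration):
# a child Python process that writes to stderr and exits with status 1.
demo_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]

code, out, err = run_and_capture(demo_cmd)
print("exit status:", code)
print("stderr:", err)
```

With `check=False` the call returns instead of raising `CalledProcessError`, and `capture_output=True` keeps the child's stderr, which is where a Python traceback from the training script would normally appear.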
