Topaz training error in multiprocessing data loader #173

@tbepler

Description

Discussed in #172

Originally posted by wuhucryoem July 24, 2023
When I run Topaz training, it reports that a file or directory does not exist, but it does not say which file or directory. The output looks like this:

Traceback (most recent call last):
  File "/home/amax/miniconda3/envs/topaz/bin/topaz", line 33, in <module>
    sys.exit(load_entry_point('topaz-em==0.2.5', 'console_scripts', 'topaz')())
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
    args.func(args)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 695, in main
    , save_prefix=save_prefix, use_cuda=use_cuda, output=output)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 577, in fit_epochs
    , use_cuda=use_cuda, output=output)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 552, in fit_epoch
    for X,Y in data_iterator:
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
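
One plausible reading of why no file name appears: the last frames are in multiprocessing/connection.py, where s.connect(address) is a socket call, and socket errors in Python carry an errno but no path. If that reading is right, the missing "file" may be a Unix socket belonging to a data loader worker process rather than one of the training inputs. The sketch below uses only the standard library and a made-up socket path (not anything Topaz or PyTorch creates) to show that connecting to a nonexistent Unix socket produces the same bare message. It assumes a Unix-like system such as Linux.

```python
import errno
import os
import socket
import tempfile


def connect_to_missing_socket():
    """Connect to a Unix socket path that does not exist and return the error."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical path for illustration only; nothing listens here.
        missing_path = os.path.join(tmp, "no-such-listener")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            try:
                s.connect(missing_path)
            except FileNotFoundError as exc:
                return exc
    return None


exc = connect_to_missing_socket()
# The exception carries the errno (ENOENT) but no filename attribute value,
# which is why the message cannot say which path was missing.
print(type(exc).__name__, exc.errno == errno.ENOENT, exc.filename)
```

This only illustrates where the unnamed path can come from. It does not reproduce the Topaz failure itself or identify its cause.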
