Description
If you run the included train_3D.py and monitor your hardware, you will see something like this:

The average power draw and GPU utilisation are very low. This is clearly because all the image data are loaded in a single thread.
I am using a very old GPU (Radeon VII); the problem would only get worse with a faster GPU.
Unfortunately, it doesn't seem possible to parallelise the data loading. If you set the `num_workers` argument of PyTorch's `DataLoader` to a value greater than 0, you get this:
```
Training: 0%| | 0/200 [00:00<?, ?it/s]WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_thread.lock' object, falling back to main thread
WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_thread.lock' object, falling back to main thread
Training: 0%| | 1/200 [00:06<22:23, 6.75s/it]WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_queue.SimpleQueue' object, falling back to main thread
WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_queue.SimpleQueue' object, falling back to main thread
Training: 1%| | 2/200 [00:07<11:16, 3.42s/it]WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_queue.SimpleQueue' object, falling back to main thread
WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_queue.SimpleQueue' object, falling back to main thread
Training: 2%|▏ | 3/200 [00:08<07:40, 2.34s/it]WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_queue.SimpleQueue' object, falling back to main thread
WARNING:cellmap_data.dataloader:Worker failed to get item: cannot pickle '_queue.SimpleQueue' object, falling back to main thread
```
Not only is a warning emitted at every training step, but the bottleneck remains and there is no speedup.
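The warnings point at the likely root cause: PyTorch sends the dataset to worker processes by pickling it, and any object holding a `_thread.lock` (e.g. a lock guarding an open file handle or cache) cannot be pickled. A minimal sketch of the failure, assuming — as the warning suggests — that the dataset keeps such a lock as an attribute (the class name here is hypothetical):

```python
import pickle
import threading

class UnpicklableDataset:
    """Stand-in for a dataset that guards a shared resource with a lock."""
    def __init__(self):
        # threading.Lock() is a '_thread.lock' object and cannot be pickled,
        # so this dataset cannot be shipped to DataLoader worker processes.
        self.lock = threading.Lock()

    def __getitem__(self, idx):
        with self.lock:
            return idx

try:
    pickle.dumps(UnpicklableDataset())
except TypeError as e:
    print(e)  # -> cannot pickle '_thread.lock' object
```

This is the same `TypeError` that `cellmap_data.dataloader` catches before falling back to the main thread, which is why raising `num_workers` changes nothing.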
The environment settings are the same as in #175.
You can replicate the issue with the simple script below:
minimal_script_pure_pytorch.py