Open
Description
The current approach assumes a single GPU is processing the data. That assumption does not hold for other users, or even for us once we need to scale to bigger datasets. I therefore need to implement a multiprocessing approach that uses all GPUs available on the machine, along the lines of the sketch below.
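A minimal sketch of what this could look like, assuming a PyTorch-based pipeline: one worker process is spawned per visible GPU and the dataset's chunk indices are sharded across them. The load/predict/write calls here are hypothetical stand-ins for the real pipeline, not the project's actual API.

```python
import torch
import torch.multiprocessing as mp


def process_shard(gpu_id, shards):
    """Run inference for one shard of chunk indices on a dedicated GPU."""
    device = torch.device(f"cuda:{gpu_id}")
    for chunk_idx in shards[gpu_id]:
        # Stand-ins for the real load -> predict -> write steps.
        data = torch.rand(1, 64, 64, 64, device=device)  # load_chunk(chunk_idx)
        result = data * 2                                 # predict_gradients(data)
        _ = result.cpu().numpy()                          # write to zarr


def run_on_all_gpus(chunk_indices):
    n_gpus = torch.cuda.device_count()
    # Round-robin split of the chunk indices across every visible GPU.
    shards = [chunk_indices[i::n_gpus] for i in range(n_gpus)]
    mp.spawn(process_shard, args=(shards,), nprocs=n_gpus)


if __name__ == "__main__":
    run_on_all_gpus(list(range(128)))
```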
In addition, there is idle time in the GPU processing that is currently wasted. With a Z1 dataset at the 2nd multiscale, the GPU sits idle for about 12 seconds while we write the predicted gradients to the zarr dataset. Ideally, the GPU should be processing data the whole time, while a separate process gathers the results and writes them to zarr.
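One way to get that overlap is a producer/consumer split: the GPU loop pushes finished blocks onto a queue and a dedicated writer process drains the queue into zarr, so inference never blocks on disk I/O. A rough sketch, assuming a `multiprocessing.Queue` handoff and with the prediction step replaced by a placeholder:

```python
import multiprocessing as mp

import numpy as np
import torch
import zarr


def zarr_writer(queue, path, shape, chunks):
    """Consumer: drain results from the queue and write them to zarr."""
    out = zarr.open(path, mode="w", shape=shape, chunks=chunks, dtype="f4")
    while True:
        item = queue.get()
        if item is None:          # sentinel: producer is done
            break
        index, block = item
        out[index] = block


def run(path="gradients.zarr"):
    shape, chunks = (16, 64, 64), (1, 64, 64)
    queue = mp.Queue(maxsize=8)   # bounded so memory use stays in check
    writer = mp.Process(target=zarr_writer, args=(queue, path, shape, chunks))
    writer.start()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for i in range(shape[0]):
        data = torch.rand(64, 64, device=device)
        result = data * 2         # stand-in for the gradient prediction
        # Hand off the result and keep the GPU loop going immediately.
        queue.put((i, result.cpu().numpy().astype(np.float32)))

    queue.put(None)               # tell the writer to finish
    writer.join()


if __name__ == "__main__":
    run()
```

Bounding the queue keeps host memory in check if the writer falls behind; this pattern should also compose with the per-GPU workers above by giving each worker its own queue or sharing one writer.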