For reasons not yet understood, the spillover directory can become corrupted, which prevents forward progress when processing images. Worker nodes start up but then stall, and their logs show this error:
dask-worker-4332718.err:2025-08-05 12:25:42,580 - distributed.nanny - INFO - Closing Nanny at 'tcp://192.168.32.23:41009'. Reason: failure-to-start-<class 'asyncio.exceptions.TimeoutError'>
dask-worker-4332719.err:2025-08-05 12:25:42,580 - distributed.nanny - INFO - Closing Nanny at 'tcp://192.168.32.23:39265'. Reason: failure-to-start-<class 'asyncio.exceptions.TimeoutError'>
This can be remedied manually by removing all scratch files in the spillover directory and restarting the run.
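The manual cleanup can be scripted. Below is a minimal Python sketch, assuming the spillover directory's path is known (it is not given here, so `spill_dir` is a placeholder) and that nothing inside it needs to be kept. `clear_spillover_dir` is a hypothetical helper, not part of Dask or of this pipeline. It empties the directory but leaves the directory itself in place, and it is meant to be run between runs, before restarting.

```python
import shutil
from pathlib import Path


def clear_spillover_dir(spill_dir: str) -> list[str]:
    """Delete every file and subdirectory inside spill_dir (hypothetical helper).

    The spillover directory itself is kept so the restarted run can reuse it.
    Returns the sorted names of the entries that were removed.
    """
    root = Path(spill_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"spillover directory not found: {root}")

    removed = []
    for entry in root.iterdir():
        # Recurse into real subdirectories; unlink plain files and symlinks
        # (a symlink to a directory is removed without touching its target).
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)
```

For example, `clear_spillover_dir("/path/to/spillover")` (placeholder path) returns the names of the removed entries, which is handy for logging what was cleared before the run is restarted.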