I'm using future.batchtools via drake, and I just got my first plan running on the cluster. Each job seems to take about one minute to submit, and since I'm trying to submit several hundred jobs, that's not ideal (although it's not a deal-breaker, because I expect each job to take many hours to finish). I'm not sure what I could change to speed this up. I haven't dug into the code, but my understanding of what needs to happen to start a worker is:
- analyse the code to find dependencies
- submit the job to the scheduler (SLURM in my case)
- wait for the job to be allocated
- wait for the worker to start up (and load libraries?)
- send data to the worker (and libraries?)
Is this basically accurate?
Does the worker load libraries already installed on its node, or does the master send all libraries to the worker? If the latter, reducing library dependencies might be worth trying.
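For scale, here is a rough sketch of how the per-job cost adds up. It assumes jobs are submitted one after another, as the ~1 minute per job observed above suggests. The function name and the job count of 300 (standing in for "several hundred") are illustrative only. This is not drake or future.batchtools code.

```python
def submission_overhead_hours(n_jobs: int, seconds_per_job: float = 60.0) -> float:
    """Total wall-clock time spent on submission, assuming jobs are
    submitted sequentially at a fixed cost of `seconds_per_job` each."""
    return n_jobs * seconds_per_job / 3600.0


if __name__ == "__main__":
    # Hypothetical 300 jobs at the observed ~60 s each.
    print(submission_overhead_hours(300))  # 5.0
```

At roughly five hours of submission overhead for a few hundred jobs, the cost is noticeable but still small compared with jobs that each run for many hours.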