Description
I'm sorry that I don't have a reproducible example yet. My code base is very large, and it was running fine until the jobs became small. I'll put together a reproducible example once I've heard some suggestions on what to test.

In my case, the jobs are rather small, about 10 seconds each. The problem I'm seeing is that they don't get scheduled very quickly. At any given time only one Slurm job, or at most two, is running, even though the machine they run on can handle ~15 jobs given their RAM and CPU requirements. I'm trying to run 60 chunks, and to work around this I set scheduling to 5, which did bump the number of running jobs up to 2-3.

The main problem, however, is that the chunks seem to take 10-15 seconds to launch, and I don't know what I changed. A few days ago, with larger jobs, this was not the case. So, to my specific questions, before I try to generate a small reproducible example:
- Is there a verbose/trace flag? I can't find one in the documentation.
- Is there anything to pay attention to beyond the "scheduling" option? Any hunch about what may be causing this, so I know where to start looking when building an example?