How can we reduce the delay between Slurm jobs as much as possible?
Early submission strategy with different accounts?
Average waiting delay with 32xA100: 2 days on dev queue
The delay with 48xA100 may be 4 days ?!
Jean Zay options:
- default: 20h/job, 512 GPU/job
- qos_gpu-t4: 100h/job, 16 GPU/job ==> far too slow
If, with enough GPUs, we get down to 20 compute-days for 100B tokens, and if we wait 3 days between 2 jobs, then it takes 96 days for 100B tokens.
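The estimate above can be checked with a short calculation. This is only a sketch: it assumes the default 20h/job limit, one full queue wait before every job (including the first), and a constant wait time. The function name and its parameters are made up for illustration. Under these assumptions the result is 92 days; the 96-day figure is presumably obtained by rounding each job cycle to 4 days (about 1 day of run plus 3 days of wait) for 24 jobs.

```python
import math

def wall_clock_days(compute_days, job_hours, wait_days, wait_before_first=True):
    """Estimate elapsed days when a training run is split into fixed-length jobs.

    All arguments are assumptions taken from the notes, not measured values:
    compute_days -- total compute time needed, in days (20 for 100B tokens)
    job_hours    -- maximum wall time of a single job, in hours (20 by default)
    wait_days    -- queue wait before each job starts, in days (3)
    """
    # Number of jobs needed to cover the total compute time.
    n_jobs = math.ceil(compute_days * 24 / job_hours)
    # One wait per job, or one fewer if the first job starts immediately.
    n_waits = n_jobs if wait_before_first else n_jobs - 1
    return n_jobs, compute_days + n_waits * wait_days

n_jobs, total_days = wall_clock_days(compute_days=20, job_hours=20, wait_days=3)
print(n_jobs, total_days)  # 24 jobs, 92 days
```

Either way, queue waiting dominates: roughly 70 of the 90+ days are spent waiting, so shortening the wait between jobs matters more than the compute time itself.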