Description
The snakemake process hangs forever after the sacct command receives a "Connection refused" error. It looks like Snakemake stops querying the job status after a few retries. In fact, the submitted job has already completed successfully, and I'm able to manually query its job status using the sacct command.
Software Versions
snakemake-minimal=8.24.1
snakemake-executor-plugin-slurm=0.14.2
slurm 23.02.8
Describe the bug
After Snakemake fails to connect to the Slurm database, it stops trying to reconnect, even after the connection issue has been resolved.
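To illustrate the behaviour I would expect, here is a minimal sketch. It is not the plugin's actual code. The names `query_sacct` and `wait_for_status`, the delay values, and the list of active states are assumptions for illustration; only the sacct arguments are taken from the logs below. The idea is that a failed query is treated as "status unknown" and retried with a capped backoff, instead of ending the polling.

```python
import subprocess
import time
from typing import Callable, Dict, Optional

# Illustrative subset of Slurm states meaning "job has not finished yet".
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def query_sacct(run_uuid: str, start_time: str) -> Optional[Dict[str, str]]:
    """Run the sacct query from the logs; return None if the query itself fails."""
    cmd = [
        "sacct", "-X", "--parsable2", "--clusters", "all", "--noheader",
        "--format=JobIdRaw,State", "--starttime", start_time,
        "--endtime", "now", "--name", run_uuid,
    ]
    try:
        res = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        # Assumption: a failed query (e.g. "Connection refused") surfaces as a
        # non-zero exit code or a missing binary; either way the status is unknown.
        return None
    statuses = {}
    for line in res.stdout.splitlines():
        if "|" in line:  # --parsable2 separates fields with "|"
            job_id, state = line.split("|", 1)
            statuses[job_id] = state
    return statuses


def wait_for_status(
    job_id: str,
    query: Callable[[], Optional[Dict[str, str]]],
    sleep: Callable[[float], None] = time.sleep,
    base_delay: float = 5.0,
    max_delay: float = 300.0,
) -> str:
    """Poll until job_id reaches a non-active state; never give up on query errors."""
    delay = base_delay
    while True:
        statuses = query()
        if statuses is None:
            # Query failed: wait, double the delay up to max_delay, and try again.
            sleep(delay)
            delay = min(delay * 2, max_delay)
            continue
        delay = base_delay  # the database is reachable again, reset the backoff
        state = statuses.get(job_id)
        if state is not None and state not in ACTIVE_STATES:
            return state
        sleep(base_delay)
```

With something along these lines, a temporary outage of the accounting database would only delay the status update, and the workflow would continue once sacct answers again.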
Logs
Job 6 has been submitted with SLURM jobid 20123883 (log: .snakemake/slurm_logs/rule_qc_seq_cutadapt/samplename/20123883.log).
The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
Minimal example
Additional context