Snakemake hangs forever with executor SLURM with slurm_persist_conn_open_without_init #207

Open
@Redmar-van-den-Berg

Description

The Snakemake process hangs forever after the sacct command receives a "Connection refused". It looks like Snakemake stops trying to query the status of the job after a few retries. In fact, the submitted job has already completed successfully, and I'm able to manually query its job status using the sacct command.

Software Versions
snakemake-minimal=8.24.1
snakemake-executor-plugin-slurm=0.14.2
slurm 23.02.8

Describe the bug
After Snakemake fails to connect to the Slurm database, it stops retrying the status query, even once the connection issue has been resolved.
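
To make the expected behaviour concrete, here is a rough sketch of what I would have expected: a failed status query is treated as "state unknown" and retried with a capped backoff, rather than abandoned after a fixed number of attempts. This is not the plugin's actual code; `poll_job_state`, `query`, and the delay values are hypothetical names and numbers used for illustration only.

```python
import time
from typing import Callable, Optional


def poll_job_state(
    query: Callable[[], Optional[str]],
    initial_delay: float = 1.0,
    max_delay: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `query` until it returns a job state, backing off between failures.

    `query` is a hypothetical callable that returns the job state as a string,
    or None when the status query itself failed (for example when sacct
    reports "Connection refused").
    """
    delay = initial_delay
    while True:
        state = query()
        if state is not None:
            return state
        # A failed query says nothing about the job itself, so wait and
        # try again instead of giving up.
        sleep(delay)
        # Double the wait after each failure, but never exceed max_delay.
        delay = min(delay * 2, max_delay)
```

With this shape, a temporary database outage only delays the status update; the loop picks the job state up again as soon as the query succeeds.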

Logs

Job 6 has been submitted with SLURM jobid 20123883 (log: .snakemake/slurm_logs/rule_qc_seq_cutadapt/samplename/20123883.log).                                                       
The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f                              
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused                                                                                
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f                              
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused                                                                                
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f                              
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused                                                                                
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f                              
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused                                                                                
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

The job status query failed with command: sacct -X --parsable2 --clusters all --noheader --format=JobIdRaw,State --starttime 2025-01-22T10:00 --endtime now --name 8ca29162-60ef-4968-8633-c28df38e937f                              
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm-hpc:6819: Connection refused                                                                                
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
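
For reference, once the database is reachable again, the same sacct command (with `--parsable2 --noheader --format=JobIdRaw,State`) prints one `JobIdRaw|State` line per job. A minimal sketch of reading that output is below; `parse_sacct_states` is a hypothetical helper, not a function from the plugin, and the sample line in the comment is illustrative rather than copied from a log.

```python
from typing import Dict


def parse_sacct_states(output: str) -> Dict[str, str]:
    """Map raw job IDs to states from `sacct --parsable2 --noheader` output.

    Assumes the two-column format JobIdRaw,State used in the command above,
    i.e. lines such as "20123883|COMPLETED" (illustrative sample).
    """
    states: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # --parsable2 separates fields with "|" and adds no trailing delimiter.
        job_id, _, state = line.partition("|")
        states[job_id] = state
    return states
```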

Minimal example

Additional context
