Manager doesn't detect when pipeline errors crashed snakemake #14

Open
@rabdill

Description

If a job in the snakemake pipeline fails, the manager correctly identifies that something didn't complete. But if the error comes from snakemake itself (rather than from one of its jobs), the running.txt file never gets deleted, so the manager thinks the pipeline is still running, indefinitely. Example from project PRJNA530790, where snakemake died on a failed SLURM submission:

[Fri Jan 13 19:11:12 2023]
Finished job 396.
269 of 446 steps (60%) done
Select jobs to execute...

[Fri Jan 13 19:11:12 2023]
rule sra_to_fastq:
    input: SRR8849058/SRR8849058.sra
    output: fastq/SRR8849058.fastq
    jobid: 433
    reason: Missing output files: fastq/SRR8849058.fastq; Input files updated by another job: SRR8849058/SRR8849058.sra
    wildcards: sample=SRR8849058
    threads: 4
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=blekhman, slurm_partition=blekhman, runtime=480

WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
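
One possible direction for a fix, sketched below: have whatever launches snakemake remove running.txt in a `finally` block, so the marker goes away no matter how the process ends. This is only a sketch under assumptions, not the manager's actual code. The function name `run_with_marker`, the `error.txt` marker, and the way the command is passed in are all hypothetical; only `running.txt` comes from the behavior described above. It also assumes snakemake exits with a non-zero status when it hits a WorkflowError like the one in the log.

```python
import subprocess
from pathlib import Path


def run_with_marker(cmd, workdir, marker_name="running.txt", error_name="error.txt"):
    """Run cmd in workdir and always remove the marker file afterwards.

    Hypothetical helper: the error marker name and this interface are
    assumptions for illustration, not part of the existing manager.
    Returns the exit code of the command (1 if it could not be started).
    """
    workdir = Path(workdir)
    marker = workdir / marker_name
    error_marker = workdir / error_name

    marker.write_text("running\n")
    message = None
    returncode = 1  # treated as a failure unless the command says otherwise
    try:
        returncode = subprocess.run(cmd, cwd=workdir).returncode
        if returncode != 0:
            message = f"command exited with status {returncode}"
    except OSError as exc:
        # The command could not be launched at all (e.g. not on PATH).
        message = f"command could not be started: {exc}"
    finally:
        # Runs on success, on failure, and on unexpected exceptions,
        # so the marker never outlives the process that created it.
        marker.unlink(missing_ok=True)

    if message is not None:
        # Leave something behind that the manager can check for.
        error_marker.write_text(message + "\n")
    return returncode
```

With something like this, a crash in snakemake itself would leave an error marker and no running.txt, so the manager could report the project as failed instead of running. The snakemake command line the manager actually uses isn't shown in this issue, so it would need to be passed in as `cmd`.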
