Description
Describe the bug
The pipeline fails during steps that process BAM files (e.g., the savage rule using Picard SamToFastq) with a SAMFormatException. The root cause is that the version of Bowtie2 pinned in the Conda environment (bowtie2=2.4.1) has a known bug where it creates a malformed @PG header line in the output BAM file. Specifically, the VN: (version) tag is left empty, which violates the SAM/BAM specification.
Example of the malformed header line:
@PG ID:bowtie2 PN:bowtie2 VN: CL:"..."
To Reproduce
Steps to reproduce the behavior:
- V-pipe configuration file used: Any configuration where aligner: bowtie2 is selected.
- Samples TSV file used: Any standard paired-end FASTQ samples.
- Commands executed: Run the pipeline past the alignment stage, e.g., snakemake --use-conda --cores 32
- See error: The error will manifest as a SAMFormatException in a later rule that uses a strict BAM parser like Picard. The cause can be seen immediately by inspecting the header of the alignment BAM file: samtools view -H path/to/results/sample/alignments/REF_aln.bam
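The header inspection in the last step can also be scripted. The following is a minimal sketch, not part of V-pipe: the function names are hypothetical, and it assumes samtools is on the PATH and that header fields are tab-separated as the SAM specification requires. It reports every @PG line whose VN tag is present but empty.

```python
import subprocess


def find_empty_vn_pg_lines(header_text):
    """Return the @PG header lines whose VN tag is present but empty."""
    bad = []
    for line in header_text.splitlines():
        if not line.startswith("@PG\t"):
            continue
        # Skip the record type (@PG) and look at each TAG:VALUE field.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "VN" and value.strip() == "":
                bad.append(line)
                break
    return bad


def read_bam_header(bam_path):
    """Return the header of a BAM file as text, via `samtools view -H`."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_empty_vn_pg_lines(read_bam_header(path)) on an affected alignment file should return the bowtie2 @PG line; an empty list means no empty VN tag was found.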
Expected behavior
The BAM file generated by the Bowtie2 alignment step should have a valid header. The @PG line for bowtie2 should contain a correctly formatted version number, for example: VN:2.5.1. The pipeline should not fail in downstream steps due to header validation errors.
Screenshots
N/A
Desktop (please complete the following information):
- OS: Linux (HPC environment)
- Version: master branch
Additional context
This issue is completely resolved by updating the Bowtie2 dependency to a more recent version where this bug is fixed. I can confirm that changing the environment file workflow/envs/bowtie2.yaml to the following resolves the issue:
channels:
- conda-forge
- bioconda
dependencies:
- bowtie2>=2.5.0
- samtools=1.10
This ensures that all generated alignment files are valid and prevents errors in downstream processing.
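For alignment files that were already produced with the old pin, re-running the alignment with the updated environment is the clean fix. As a stopgap, the header text can be patched instead. The sketch below is only an illustration under assumptions: fill_empty_vn is a hypothetical helper, the version string must be supplied by the user, and only @PG lines with a matching PN tag are touched.

```python
def fill_empty_vn(header_text, version, program="bowtie2"):
    """Fill in an empty VN tag on the @PG lines of the given program.

    header_text: SAM header as text (e.g. output of `samtools view -H`).
    version: version string to write, chosen by the caller (an assumption here).
    """
    fixed = []
    for line in header_text.splitlines():
        if line.startswith("@PG\t"):
            fields = line.split("\t")
            if f"PN:{program}" in fields:
                # Replace only a VN field that has no value.
                fields = [
                    f"VN:{version}" if f.strip() == "VN:" else f
                    for f in fields
                ]
                line = "\t".join(fields)
        fixed.append(line)
    return "".join(l + "\n" for l in fixed)
```

The patched header, written to a file, can then be applied to the existing BAM with samtools reheader (samtools reheader patched_header.sam in.bam > out.bam); the file names here are placeholders.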