Bug: Bowtie2 v2.4.1 produces malformed BAM headers causing downstream failures #186

@poursalavati

Describe the bug
The pipeline fails with a SAMFormatException in steps that process BAM files (e.g., the savage rule, which uses Picard SamToFastq). The root cause is the Bowtie2 version pinned in the Conda environment (bowtie2=2.4.1), which has a known bug: it writes a malformed @PG header line to the alignment output. Specifically, the VN: (version) tag is left empty, which violates the SAM/BAM specification and is rejected by strict parsers such as Picard.

Example of the malformed header line:
@PG ID:bowtie2 PN:bowtie2 VN: CL:"..."

To Reproduce
Steps to reproduce the behavior:

  1. V-pipe configuration file used: Any configuration where aligner: bowtie2 is selected.
  2. Samples TSV file used: Any standard paired-end FASTQ samples.
  3. Commands executed: Run the pipeline past the alignment stage, e.g., snakemake --use-conda --cores 32
  4. See error: A later rule that uses a strict BAM parser such as Picard fails with a SAMFormatException. The cause is visible immediately in the header of the alignment BAM file: samtools view -H path/to/results/sample/alignments/REF_aln.bam
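
To make the header check in step 4 less manual, here is a minimal sketch in Python that scans header text (as printed by samtools view -H) for @PG lines with an empty VN: tag. The function name find_empty_vn and the sample header are illustrative assumptions, not part of V-pipe, and the sketch assumes header fields are TAB-separated as in real SAM headers.

```python
def find_empty_vn(header_text):
    """Return (line_number, line) pairs for @PG lines whose VN: tag is empty.

    header_text is the plain-text SAM header, e.g. the output of
    `samtools view -H`. Fields are assumed to be TAB-separated.
    """
    problems = []
    for lineno, line in enumerate(header_text.splitlines(), start=1):
        if not line.startswith("@PG"):
            continue  # only program records are checked here
        fields = line.split("\t")[1:]
        if any(field.strip() == "VN:" for field in fields):
            problems.append((lineno, line))
    return problems


# Illustrative header, modelled on the malformed line shown in this report.
sample_header = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "@SQ\tSN:ref\tLN:1000\n"
    '@PG\tID:bowtie2\tPN:bowtie2\tVN:\tCL:"..."\n'
)

for lineno, line in find_empty_vn(sample_header):
    print(f"header line {lineno} has an empty VN: tag: {line!r}")
```

In practice the header text could be captured by piping the samtools view -H output into a file and reading that file; a non-empty result means the BAM was written with the affected header.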

Expected behavior
The BAM file generated by the Bowtie2 alignment step should have a valid header. The @PG line for bowtie2 should contain a correctly formatted version number, for example: VN:2.5.1. The pipeline should not fail in downstream steps due to header validation errors.

Screenshots
N/A

Desktop (please complete the following information):

  • OS: Linux (HPC environment)
  • Version: master branch

Additional context
This issue is completely resolved by updating the Bowtie2 dependency to a more recent version where this bug is fixed. I can confirm that changing the environment file workflow/envs/bowtie2.yaml to the following resolves the issue:

channels:
  - conda-forge
  - bioconda
dependencies:
  - bowtie2>=2.5.0
  - samtools=1.10

This ensures that all generated alignment files are valid and prevents errors in downstream processing.
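
For BAM files that were already produced with bowtie2=2.4.1, re-running the alignment with the updated environment is the clean fix. As a possible stopgap, the sketch below fills the empty VN: tag in the header text; this is an assumption on my side and not something I have tested inside V-pipe. The function name fill_empty_vn and its parameters are hypothetical, and the version string has to be supplied by the caller.

```python
def fill_empty_vn(header_text, version, program_id="bowtie2"):
    """Return header_text with empty VN: tags filled in for one program.

    Only @PG lines carrying ID:<program_id> are modified; every other
    header line is passed through unchanged. Fields are assumed to be
    TAB-separated, as in real SAM headers.
    """
    fixed_lines = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@PG" and f"ID:{program_id}" in fields:
            fields = [f"VN:{version}" if f == "VN:" else f for f in fields]
            line = "\t".join(fields)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


# Illustrative header, modelled on the malformed line shown in this report.
broken = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    '@PG\tID:bowtie2\tPN:bowtie2\tVN:\tCL:"..."\n'
)
print(fill_empty_vn(broken, "2.4.1"), end="")
```

The corrected header text could then be written to a file and applied with samtools reheader, which replaces the header of an existing BAM without realigning.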