VCF normalization (--normalize_vcfs) drops an allele of 1/2 multiallelic sites due to --rm-dup all #2215

@apolitics

Description

When --normalize_vcfs is enabled, sarek runs bcftools norm --multiallelics -both --rm-dup all. Because -m -both splits a multiallelic record into per-allele rows at the same position, --rm-dup all (which de-duplicates by position) then deletes all but the first row, silently dropping a real ALT allele at heterozygous two-alt (1/2) sites. This can lose genuine variants, for example calls that matter for compound-heterozygosity analysis.

Steps to reproduce

-profile test,docker --tools deepvariant --skip_tools baserecalibrator --normalize_vcfs --filter_vcfs

At chr22:13575, DeepVariant calls G → C,T with genotype 1/2. Running bcftools norm on the filtered VCF three ways:

| bcftools norm args | records | site 13575 |
|---|---|---|
| -m -both --rm-dup all (current sarek) | 17 | only G→C; G→T lost |
| -m -both --rm-dup none | 18 | G→C and G→T |
| -m -both --rm-dup exact | 18 | G→C and G→T |

Root cause

conf/modules/post_variant_calling.config, withName: 'VCFS_NORM':

ext.args = { [
    '--multiallelics -both',
    '--rm-dup all'   // comment: "output only the first instance of a record which is present multiple times"
].join(' ') }

The comment's stated intent — drop records that are identical — corresponds to --rm-dup exact, not all. The all mode removes by position, which collides with the per-allele rows produced by -m -both.

Proposed fix

Change --rm-dup all to --rm-dup exact. Verified end-to-end on the test profile: the normalized VCF then keeps both alleles (18 vs 17 records) while still removing truly identical duplicate records.
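Until a fix lands, the same change can probably be applied locally through a custom config. The following is a minimal sketch, assuming the process selector is 'VCFS_NORM' as quoted under Root cause and that no other arguments are set for that process; the file name custom.config is hypothetical.

```groovy
// custom.config (hypothetical file name), passed to the run with: -c custom.config
// Assumption: the selector 'VCFS_NORM' matches the process named in
// conf/modules/post_variant_calling.config, as quoted in this issue.
process {
    withName: 'VCFS_NORM' {
        ext.args = { [
            '--multiallelics -both',  // still split multiallelic sites into per-allele rows
            '--rm-dup exact'          // remove only records that are exact duplicates
        ].join(' ') }
    }
}
```

Because this override replaces ext.args for the process entirely, any other arguments the pipeline sets for VCFS_NORM in the installed version would need to be copied into the list as well.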

Environment

  • sarek 3.8.1
  • bcftools 1.21
  • Nextflow 26.04.4
