
PBSV_CALL inserts NUL bytes (^@) in the middle of a record? #47

@MikeHala

Description of the bug

Hi there,
I am running pacvar version 1.0.1 on an HPC cluster to process PacBio data for HG002 (downloaded from https://downloads.pacbcloud.com/public/2026Q1/HG002-SPRQ-Nx/EA260902-Use1/).

My command line is:

./nextflow run ..../pacbio/nf-core-pacvar/1_0_1/main.nf \
--input ./samplesheet.csv --outdir ./results \
--genome 'GATK.GRCh38' -profile eddie \
--workflow wgs --snv_caller deepvariant \
--skip_demultiplexing \
-c ./custom.config \
-c ./base.config \
-resume

Alignment completed successfully and SNV calling is still running, but SV calling hit a problem.
PBSV_DISCOVER and PBSV_CALL finished successfully, but NFCORE_PACVAR:PACVAR:BAM_SV_VARIANT_CALLING:BCFTOOLS_INDEX crashed with:

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  [E::hts_idx_push] Invalid record on sequence #72: end 1 < begin 138271
  index: failed to create index for "HG002.sv.vcf.gz"
  INFO:    Cleaning up image...

On closer examination of HG002.sv.vcf, the middle record below looks as if PBSV_CALL introduced an extra, empty field just before the REF field:

awk 'NR>=56885 && NR<=56887 {print}' HG002.sv.vcf
chrUn_KI270519v1        137207  pbsv.INS.53107  T       TGGAATAGAATGGAGTGGAGTGGAATGGAATGGAGTGGAGTGGAGTGGAATGGAGTGGAGTGGAGTGGAATGGAATGGAGAGGAGTGCAGA     .       NearContigEnd   SVTYPE=INS;END=137207;SVLEN=90  GT:AD:DP        1/1:31,200:231
chrUn_KI270519v1        138271  pbsv.INS.53108          TGGAGTGGAACGCAGAGAATGGAATGGAG   .       NearContigEnd   IMPRECISE;SVTYPE=INS;END=138271;SVLEN=29        GT:AD:DP        0/1:45,80:125
chrUn_KI270515v1        2949    pbsv.DEL.53109  ATTGTAGAAAAGGAAATGTCATCAAATAAATACTACACAGAAGCATTCAGAAAAACTTCTTTGTTATGAGTGCATTCATCACACAGAGTTGAACCTTTCCTTTGACTGAACAGTTTTGAAACACTCTTTTTGCAGAATCTGCAAGTGCATATTTTAGAGCTTTGAGGACAA     A       .       PASS    SVTYPE=DEL;END=3119;SVLEN=-170  GT:AD:DP      0/1:95,26:121
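
A terminal does not render every byte, so the awk output above cannot show what the REF column of the middle record really contains. The following is a minimal Python sketch, assuming an uncompressed, tab-separated VCF, that reports data lines whose REF column is not a plain base string or that have fewer than the eight mandatory columns. The function name `suspicious_ref_records` is made up for illustration; the A/C/G/T/N rule follows the VCF specification's definition of REF.

```python
import re

# Per the VCF specification, REF is a non-empty string of A, C, G, T or N
# (case-insensitive).
VALID_REF = re.compile(r"[ACGTNacgtn]+")


def suspicious_ref_records(lines):
    """Yield (line_number, n_fields, ref_repr) for data lines that have
    fewer than 8 columns or a REF column that is not a plain base string."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3] if len(fields) > 3 else ""
        if len(fields) < 8 or not VALID_REF.fullmatch(ref):
            # repr() makes invisible characters such as NUL visible
            yield number, len(fields), repr(ref)
```

To run it on the real file, pass an open handle, for example `open("HG002.sv.vcf", encoding="latin-1")`. The `latin-1` encoding is chosen here only because it decodes any byte value without raising an error.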

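Piping the same lines through cat -A shows NUL bytes (^@) in the middle record, one in place of REF and one at the start of ALT: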

cat HG002.sv.vcf | sed -n '56885,56887p' | cat -A
chrUn_KI270519v1^I137207^Ipbsv.INS.53107^IT^ITGGAATAGAATGGAGTGGAGTGGAATGGAATGGAGTGGAGTGGAGTGGAATGGAGTGGAGTGGAGTGGAATGGAATGGAGAGGAGTGCAGA^I.^INearContigEnd^ISVTYPE=INS;END=137207;SVLEN=90^IGT:AD:DP^I1/1:31,200:231$
chrUn_KI270519v1^I138271^Ipbsv.INS.53108^I^@^I^@TGGAGTGGAACGCAGAGAATGGAATGGAG^I.^INearContigEnd^IIMPRECISE;SVTYPE=INS;END=138271;SVLEN=29^IGT:AD:DP^I0/1:45,80:125$
chrUn_KI270515v1^I2949^Ipbsv.DEL.53109^IATTGTAGAAAAGGAAATGTCATCAAATAAATACTACACAGAAGCATTCAGAAAAACTTCTTTGTTATGAGTGCATTCATCACACAGAGTTGAACCTTTCCTTTGACTGAACAGTTTTGAAACACTCTTTTTGCAGAATCTGCAAGTGCATATTTTAGAGCTTTGAGGACAA^IA^I.^IPASS^ISVTYPE=DEL;END=3119;SVLEN=-170^IGT:AD:DP^I0/1:95,26:121$
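
The same check can be scripted for the whole file. The sketch below locates the columns that contain NUL characters and, as a possible stopgap only, separates corrupt lines from clean ones so that the remaining records can be compressed and indexed. Both helper names are hypothetical. Dropping a record discards that variant call and does not address whatever causes PBSV_CALL to emit the bytes.

```python
def columns_with_nul(line):
    """Return the 1-based numbers of the tab-separated columns that
    contain at least one NUL character."""
    fields = line.rstrip("\n").split("\t")
    return [i for i, field in enumerate(fields, start=1) if "\x00" in field]


def split_clean_and_corrupt(lines):
    """Split lines into (clean, corrupt); a line is corrupt if it
    contains a NUL character anywhere."""
    clean, corrupt = [], []
    for line in lines:
        (corrupt if "\x00" in line else clean).append(line)
    return clean, corrupt
```

Applied to the middle record as shown by cat -A, `columns_with_nul` returns `[4, 5]`, which are the REF and ALT columns.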

Any advice on how to resolve this would be greatly appreciated!
Alternatively, do you have an ETA for PacVar version 1.1.0?

Best regards,
Mike

