Hello,
There is an error in snippy-vcf_to_tab that occurs when two or more features overlap (e.g. a CDS and an ncRNA) and a mutation is present in the overlapping region. The mutation is assigned to the last feature stored for that position in a lookup array in the code, instead of the feature that actually contains the mutation.
Example and input
This was detected in the following example (the region can be viewed at https://www.pseudomonas.com/feature/show?id=103585), where a stop-gained mutation in mexR (PA0424) of P. aeruginosa overlaps two ncRNAs (PA0423.2 and PA0263.1):

The VCF produced by snippy is correct:
NC_002516.2 471371 . C A 819.194 . AB=0;AO=25;DP=25;QA=942;QR=0;RO=0;TYPE=snp;ANN=A|stop_gained|HIGH|PA0424|GENE_PA0424|transcript|TRANSCRIPT_PA0424|protein_coding|1/1|c.379G>T|p.Glu127*|379/444|379/444|127/147||,A|intragenic_variant|MODIFIER|PA0263.1|null|gene_variant|null|||n.471371C>A||||||,A|non_coding_transcript_variant|MODIFIER|AS1974|AS1974|transcript|PA0423.1|lincRNA||||||||,A|non_coding_transcript_variant|MODIFIER|AS1974-shorter1|AS1974-shorter1|transcript|PA0423.2|lincRNA|||||||| GT:DP:RO:QR:AO:QA:GL 1/1:25:0:0:25:942:-85.0975,-7.52575,0
However, the mutation is misreported in the output of snippy-vcf_to_tab. The following line keeps the effect fields of the CDS (stop_gained, c.379G>T, p.Glu127*) but drops PA0424 from the result and assigns the mutation to the ncRNA PA0423.2:
NC_002516.2 471371 snp C A A:25 C:0 ncRNA + 379/444 127/147 stop_gained c.379G>T p.Glu127* PA0423.2 AS1974-shorter1 AS1974-shorter1
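To make the mismatch easier to see, here is a minimal sketch that splits the ANN tag of the VCF record above into its entries. The helper name `parse_ann` is hypothetical and is not part of snippy; the field order follows the SnpEff ANN layout (allele, annotation, impact, gene name, gene ID, feature type, feature ID, ...).

```perl
use strict;
use warnings;

# ANN tag copied from the VCF record above (other INFO tags omitted).
my $info = 'TYPE=snp;ANN=A|stop_gained|HIGH|PA0424|GENE_PA0424|transcript|TRANSCRIPT_PA0424|protein_coding|1/1|c.379G>T|p.Glu127*|379/444|379/444|127/147||,A|intragenic_variant|MODIFIER|PA0263.1|null|gene_variant|null|||n.471371C>A||||||,A|non_coding_transcript_variant|MODIFIER|AS1974|AS1974|transcript|PA0423.1|lincRNA||||||||,A|non_coding_transcript_variant|MODIFIER|AS1974-shorter1|AS1974-shorter1|transcript|PA0423.2|lincRNA||||||||';

# Hypothetical helper: return one hashref per ANN entry.
sub parse_ann {
    my ($info_col) = @_;
    my ($ann) = $info_col =~ /(?:^|;)ANN=([^;]+)/;
    return () unless defined $ann;
    my @entries;
    for my $entry (split /,/, $ann) {
        # Only the first seven SnpEff fields are needed here.
        my ($allele, $effect, $impact, $gene_name, $gene_id, $ftype, $feature_id)
            = split /\|/, $entry;
        push @entries, {
            effect     => $effect,
            gene_name  => $gene_name,
            feature_id => $feature_id,
        };
    }
    return @entries;
}

# Print effect, gene name and feature ID for every annotation of this variant.
for my $a (parse_ann($info)) {
    print join("\t", $a->{effect}, $a->{gene_name}, $a->{feature_id} // ''), "\n";
}
```

The first entry is the stop_gained effect on PA0424, while PA0423.2 only appears as the feature ID of the last entry (a non_coding_transcript_variant). The tab output above pairs the effect of the first entry with the locus of the last one.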
Input files to reproduce the problem:
example_files.zip
Problem in code
The problem seems to reside in the following lines.
In this line (snippy/bin/snippy-vcf_to_tab, line 57 in 3362a59), each position of each feature is saved, but overlapping positions are overwritten:
    $olap{$f->seq_id}->[$pos] = $nfeat; # FIXME: should be splice ?
Here (line 83), the last feature that overwrote the previous ones is selected:
    my $aff_featid = $olap{ $chr }[ $pos ]; # does this overlap a feature?
So when the mutation is checked (line 85), the information of a different locus is returned:
    my $f = $feat{$aff_featid};
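The overwrite, and one possible way around it, can be sketched as follows. This is not snippy's code: the features are plain hashes with made-up toy coordinates, and `build_single`, `build_multi` and `pick_feature` are hypothetical names. The assumption is that an identifier from the variant's annotation (for example the ANN gene name) can be matched against the feature's locus tag.

```perl
use strict;
use warnings;

# Toy features with invented coordinates; the CDS and the ncRNA overlap at 150..200.
my @features = (
    { id => 'PA0424',   type => 'CDS',   start => 100, end => 200 },
    { id => 'PA0423.2', type => 'ncRNA', start => 150, end => 250 },
);

# Mirrors the reported behaviour: one slot per position,
# so a later feature overwrites an earlier one.
sub build_single {
    my ($feats) = @_;
    my @olap;
    for my $i (0 .. $#$feats) {
        my $f = $feats->[$i];
        $olap[$_] = $i for $f->{start} .. $f->{end};
    }
    return \@olap;
}

# Possible alternative: keep every overlapping feature index at each position.
sub build_multi {
    my ($feats) = @_;
    my @olap;
    for my $i (0 .. $#$feats) {
        my $f = $feats->[$i];
        push @{ $olap[$_] }, $i for $f->{start} .. $f->{end};
    }
    return \@olap;
}

# Choose the overlapping feature whose id matches the annotation;
# fall back to the first overlapping feature, or undef if there is none.
sub pick_feature {
    my ($feats, $olap, $pos, $wanted_id) = @_;
    my @hits = map { $feats->[$_] } @{ $olap->[$pos] || [] };
    my ($match) = grep { $_->{id} eq $wanted_id } @hits;
    return $match // $hits[0];
}

my $pos    = 160;    # toy variant position inside the overlap
my $single = build_single(\@features);
my $multi  = build_multi(\@features);

print "single slot: ", $features[ $single->[$pos] ]{id}, "\n";
print "multi slot:  ", pick_feature(\@features, $multi, $pos, 'PA0424')->{id}, "\n";
```

With the single-slot array the lookup at the toy position returns the ncRNA, because it was stored last. Keeping a list per position costs more memory on large genomes, so an interval lookup might be preferable in practice; the sketch only illustrates the idea.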
Versions
snippy=4.6.0=hdfd78af_2