Description
Many callers output both InDels and SNVs in the same VCF.
To separate them before writing the final TSVs, we use gatk SelectVariants. It conveniently lets us keep GVCF blocks where there are no variants. However, there is no way to make it label InDels as no-call when selecting SNVs, or vice versa; it simply filters them out. This means we lose depth and other valuable information at those sites.
I haven't been able to find a tool that does what we want, so I think we might have to write a custom script. We already have the classify.awk script, but it doesn't handle every type of VCF ALT allele, and it only accepts the REF and ALT columns as input (and nothing else).
We should
- modify classify.awk to work with
  - BND alleles
  - MIXED alleles
- write a bash script to filter VCFs using classify.awk
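
As a starting point for discussion, here is a minimal sketch of what such a filter could look like. It is not the existing classify.awk: the function names (`mask_vcf`, `classify_allele`, `classify_record`) and the class labels (`SNV`, `MNV`, `INDEL`, `BND`, `SYMBOLIC`, `MIXED`, `REF`) are made up for illustration. It assumes an uncompressed, tab-separated VCF on stdin, treats `<NON_REF>`, `<*>`, `*` and `.` as "no variation" so that GVCF blocks pass through untouched, and sets the GT of every other record type to no-call while leaving DP and the other FORMAT fields in place.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: names and class labels are illustrative.

# mask_vcf CLASS
#   Reads VCF text on stdin and writes VCF text on stdout.
#   Records of class CLASS and reference-only records are printed unchanged.
#   All other records keep their line, but GT is rewritten to no-call.
mask_vcf() {
  awk -v keep="$1" '
    BEGIN { FS = OFS = "\t" }

    # Classify one ALT allele relative to REF.
    function classify_allele(ref, alt) {
      if (alt == "." || alt == "*" || alt == "<NON_REF>" || alt == "<*>")
        return "REF"
      # Breakends contain [ or ]; single breakends start or end with a dot.
      if (index(alt, "[") || index(alt, "]") || alt ~ /^\./ || alt ~ /\.$/)
        return "BND"
      if (alt ~ /^<.*>$/) return "SYMBOLIC"
      if (length(ref) != length(alt)) return "INDEL"
      return (length(ref) == 1) ? "SNV" : "MNV"
    }

    # Classify a whole record: MIXED if its ALT alleles disagree.
    function classify_record(ref, alts,    n, i, a, cls, c) {
      n = split(alts, a, ",")
      cls = "REF"
      for (i = 1; i <= n; i++) {
        c = classify_allele(ref, a[i])
        if (c == "REF") continue
        if (cls == "REF") cls = c
        else if (cls != c) return "MIXED"
      }
      return cls
    }

    /^#/ { print; next }   # pass header lines through

    {
      cls = classify_record($4, $5)
      if (cls != "REF" && cls != keep && NF >= 10) {
        # Locate GT in the FORMAT column.
        n = split($9, fmt, ":")
        gt = 0
        for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = i
        if (gt) {
          for (s = 10; s <= NF; s++) {
            m = split($s, val, ":")
            if (m < gt) continue
            # Replace allele indices with dots, keeping ploidy and phasing.
            gsub(/[0-9]+/, ".", val[gt])
            $s = val[1]
            for (i = 2; i <= m; i++) $s = $s ":" val[i]
          }
        }
      }
      print
    }
  '
}

# Example usage (file names are placeholders):
#   mask_vcf SNV   < calls.vcf > snvs.vcf
#   mask_vcf INDEL < calls.vcf > indels.vcf
```

Unlike the current script, this version reads whole VCF lines rather than only REF and ALT, which is what makes rewriting the genotype possible. Whether MIXED sites should be masked in both outputs, as they are here, or split per allele is an open question.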