don't use gatk selectvariants + allow varca to output SNVs and indels in the same VCF #2

Open
@aryarm

Description

Many callers output both InDels and SNVs in the same VCF.
To separate them before writing the final TSVs, we use gatk SelectVariants. It conveniently lets us keep GVCF blocks where there are no variants. However, there is no way to ask it to mark InDels as no-calls when selecting SNVs, and vice versa. It simply filters those records out, so we lose depth and other valuable information at those sites.
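
To make the desired behavior concrete, here is a minimal Python sketch of masking a record instead of dropping it. It is an illustration only, not existing varca code: the function name `mask_genotypes` is hypothetical, and it assumes an uncompressed, tab-delimited VCF data line with at least one sample column. The GT field is replaced with a no-call while the other FORMAT values (such as DP) are left untouched.

```python
import re


def mask_genotypes(line: str) -> str:
    """Return a VCF data line with every sample's GT set to a no-call.

    Hypothetical helper: all other FORMAT values (DP, GQ, ...) are kept,
    so depth information at the site survives.
    """
    fields = line.rstrip("\n").split("\t")
    # Columns 0-7 are fixed, column 8 is FORMAT, samples start at column 9.
    if len(fields) < 10:
        return "\t".join(fields)
    keys = fields[8].split(":")
    if "GT" not in keys:
        return "\t".join(fields)
    gt_idx = keys.index("GT")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if gt_idx < len(values):
            # Preserve ploidy: 0/1 becomes ./. and a haploid call becomes .
            ploidy = len(re.split(r"[/|]", values[gt_idx]))
            values[gt_idx] = "/".join(["."] * ploidy)
            fields[i] = ":".join(values)
    return "\t".join(fields)


if __name__ == "__main__":
    indel = "chr1\t100\t.\tAT\tA\t50\tPASS\tDP=30\tGT:DP:GQ\t0/1:30:99"
    print(mask_genotypes(indel))
```

When selecting SNVs, a filter would call this on InDel records (and the reverse when selecting InDels) rather than discarding them.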

I haven't been able to find a tool that behaves the way we want, so I think we might have to write a custom script. We already have the classify.awk script, but it doesn't handle every type of VCF ALT allele, and it only accepts the REF and ALT columns as input (nothing else).
We should:

  • modify classify.awk to work with
    • BND alleles
    • MIXED alleles
  • write a bash script to filter VCFs using classify.awk
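
As a rough illustration of the classification logic the items above call for, here is a Python sketch (not the awk implementation proposed here, and not a port of the current classify.awk). The function names and the type labels (`SNP`, `MNP`, `INDEL`, `BND`, `SYMBOLIC`, `SPAN_DEL`, `MIXED`, `REF`) are assumptions chosen for this example, as is the decision to treat `.`, `<NON_REF>`, and `<*>` as carrying no variant.

```python
# ALT values assumed (for this sketch) to mean "no variant at this site".
REF_LIKE = {".", "<NON_REF>", "<*>"}


def classify_allele(ref: str, alt: str) -> str:
    """Classify a single ALT allele against REF."""
    if alt in REF_LIKE or alt == ref:
        return "REF"
    # Breakends: mated ones contain brackets, single ones start or end with "."
    if "[" in alt or "]" in alt or alt.startswith(".") or alt.endswith("."):
        return "BND"
    if alt.startswith("<"):
        return "SYMBOLIC"
    if alt == "*":
        return "SPAN_DEL"  # allele missing due to an upstream deletion
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


def classify_site(ref: str, alt_field: str) -> str:
    """Classify a whole ALT column, which may list several alleles."""
    types = {classify_allele(ref, alt) for alt in alt_field.split(",")}
    types.discard("REF")
    if not types:
        return "REF"
    if len(types) == 1:
        return types.pop()
    return "MIXED"  # e.g. a SNP and an InDel at the same position


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("AT", "A"), ("A", "G,AT"), ("G", "G]17:198982]")]:
        print(ref, alt, classify_site(ref, alt))
```

A filtering script could then keep records whose site type matches the requested one, and mark the rest as no-calls instead of removing them.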

Labels: enhancement, wontfix
