add options for more genotype formats #3

@janxkoci

Description

The current vcfGTcount.gawk script could be extended to report not just the basic GT summaries but also, e.g., translated genotypes (TGT in bcftools terminology) or even IUPAC-coded genotypes (IUPACGT in bcftools). For example, the output format could be chosen with options:

  • -t = count translated genotypes
  • -i = count IUPAC-formatted genotypes
  • -g = count numeric-style genotypes (default)
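gawk normally reads flags that follow -f script as its own options, so one way to offer these switches is a small POSIX shell front end that maps them onto a single awk variable. A minimal sketch, assuming a hypothetical function parse_mode and a hypothetical awk variable mode with the values gt, tgt and iupac (the real script may wire its options differently):

```shell
# Hypothetical front end for the proposed -t / -i / -g flags: it sets the shell
# variable "mode" (gt, tgt or iupac) and leaves OPTIND pointing past the flags.
parse_mode() {
  mode=gt                         # -g (numeric genotypes) is the default
  OPTIND=1                        # reset so repeated calls start from scratch
  while getopts 'tig' opt; do
    case $opt in
      t) mode=tgt ;;              # translated genotypes, e.g. A/G
      i) mode=iupac ;;            # IUPAC codes, e.g. R
      g) mode=gt ;;               # numeric genotypes, e.g. 0/1
      *) echo 'usage: vcfGTcount [-t | -i | -g] [file.vcf ...]' >&2
         return 2 ;;
    esac
  done
}

# Typical call site (script path and variable name are assumptions):
#   parse_mode "$@" || exit 2
#   shift $((OPTIND - 1))
#   gawk -v mode="$mode" -f vcfGTcount.gawk "$@"
```

If several flags are given, the last one wins, which keeps the three modes mutually exclusive without extra checks.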

This could be handled by a function called on each genotype after it has been extracted, with an if check deciding whether to return an IUPAC code:

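function translate(gt, ref, alt, iupac,    al, idx, seps, n, i, out)
{
    # al[1] is REF, al[k + 1] the k-th ALT allele (ALT may list several, e.g. "C,T")
    split(ref "," alt, al, ",")
    # split "0/1" or "1|2" into allele indices; gawk's 4th argument keeps the separators
    n = split(gt, idx, /[\/|]/, seps)
    out = ""
    for (i = 1; i <= n; i++)
        out = out (idx[i] == "." ? "." : al[idx[i] + 1]) (i < n ? seps[i] : "")
    if (iupac == 1 && (out in iupacdict))
        out = iupacdict[out] # needs a dict of IUPAC codes; others are kept as-is
    return out
}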
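The IUPAC lookup assumes a table from translated genotypes to ambiguity codes. A minimal sketch of building one in portable awk, wrapped in a shell demo function; the helper name build_iupac, the wrapper iupac_gt, and the choice to treat phased and unphased genotypes alike are assumptions. It covers SNVs only: R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C, and homozygotes collapse to the base itself.

```shell
# Demo wrapper: iupac_gt GT prints the IUPAC code for one translated genotype.
iupac_gt() {
  awk -v gt="$1" '
    # Hypothetical helper: fill the global array iupacdict for SNV genotypes.
    # Each entry is three letters: two bases, then their code (AAA = homozygote).
    function build_iupac(    pairs, np, i, a, b, c) {
      np = split("AAA CCC GGG TTT AGR CTY CGS ATW GTK ACM", pairs, " ")
      for (i = 1; i <= np; i++) {
        a = substr(pairs[i], 1, 1); b = substr(pairs[i], 2, 1); c = substr(pairs[i], 3, 1)
        # both allele orders, unphased and phased (phase is dropped)
        iupacdict[a "/" b] = iupacdict[b "/" a] = c
        iupacdict[a "|" b] = iupacdict[b "|" a] = c
      }
    }
    BEGIN {
      build_iupac()
      # Assumed fallback: genotypes without a code (missing, indels) pass through.
      print ((gt in iupacdict) ? iupacdict[gt] : gt)
    }'
}
```

Testing membership with in before indexing matters in awk: referencing a missing element such as iupacdict["./."] creates it and returns an empty string, so missing genotypes or indels would otherwise be counted as blanks.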

A single function would work, but two separate functions (one returning translated genotypes, one returning IUPAC codes) might be more efficient, since the choice between them would then be made once rather than evaluating if (iupac == 1) for every genotype.
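POSIX awk has no function pointers, so the mode cannot literally be chosen once per run there (gawk's indirect function calls, @name(), are one way to do that in gawk). A portable compromise is to give each mode its own rule that calls its own function, so the mode is tested once per VCF line rather than once per genotype. This sketch assumes the hypothetical mode values gt, tgt and iupac, pools counts over all samples, takes GT as the first FORMAT subfield (the VCF specification places GT first when present), and uses a character-scanning translate() so it also runs under mawk or busybox awk.

```shell
# count_gt MODE: read a VCF on stdin and print "genotype<TAB>count", pooled over
# all samples. MODE is gt, tgt or iupac (hypothetical names for -g, -t, -i).
count_gt() {
  awk -v mode="$1" '
    # Portable translate(): allele indices -> bases, keeping "/" or "|".
    function translate(gt, ref, alt,    al, out, i, ch, idx) {
      split(ref "," alt, al, ",")     # al[1] = REF, al[k + 1] = k-th ALT
      out = ""; idx = ""
      for (i = 1; i <= length(gt) + 1; i++) {
        ch = substr(gt, i, 1)         # "" past the end flushes the last allele
        if (ch == "/" || ch == "|" || ch == "") {
          out = out (idx == "." ? "." : al[idx + 1]) ch
          idx = ""
        } else
          idx = idx ch
      }
      return out
    }
    # IUPAC variant: translate, then look the result up (SNVs only).
    function to_iupac(gt, ref, alt,    t) {
      t = translate(gt, ref, alt)
      return (t in iupacdict) ? iupacdict[t] : t
    }
    BEGIN {
      FS = "\t"
      np = split("AAA CCC GGG TTT AGR CTY CGS ATW GTK ACM", p, " ")
      for (i = 1; i <= np; i++) {
        a = substr(p[i], 1, 1); b = substr(p[i], 2, 1); c = substr(p[i], 3, 1)
        iupacdict[a "/" b] = iupacdict[b "/" a] = iupacdict[a "|" b] = iupacdict[b "|" a] = c
      }
    }
    /^#/ { next }                     # skip meta-information and header lines
    # One rule per mode: the mode is tested once per line, not per genotype.
    mode == "gt"    { for (i = 10; i <= NF; i++) { split($i, f, ":"); n[f[1]]++ } }
    mode == "tgt"   { for (i = 10; i <= NF; i++) { split($i, f, ":"); n[translate(f[1], $4, $5)]++ } }
    mode == "iupac" { for (i = 10; i <= NF; i++) { split($i, f, ":"); n[to_iupac(f[1], $4, $5)]++ } }
    END { for (g in n) print g "\t" n[g] }'
}
```

The order of for (g in n) is unspecified in awk, so pipe the output through sort when a stable listing is needed.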

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
