FORMAT tags not in caps lock are not loaded in the genotype table #582

@SamuelNicaise

Description

Genotype fields whose name is not entirely in caps are not loaded in the database (so all values for those fields are None).

Per the VCF 4.3 specification, FORMAT tag names match the regular expression ^[A-Za-z_][0-9A-Za-z_.]*$, so lowercase and mixed-case names are valid.
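To illustrate that mixed-case names are legal, here is a minimal sketch that checks tag names against that pattern. The helper name `is_valid_format_tag` is hypothetical and is not part of the project.

```python
import re

# Pattern for FORMAT tag names as given in the VCF 4.3 specification.
FORMAT_TAG_PATTERN = re.compile(r"^[A-Za-z_][0-9A-Za-z_.]*$")


def is_valid_format_tag(name: str) -> bool:
    """Return True if `name` is a syntactically valid FORMAT tag name.

    Hypothetical helper for illustration only.
    """
    # fullmatch rejects a trailing newline that `$` alone would tolerate.
    return FORMAT_TAG_PATTERN.fullmatch(name) is not None


for tag in ("GT", "VAF_min", "1AD"):
    print(tag, is_valid_format_tag(tag))
```

Under this pattern `VAF_min` is as valid as `GT`, so a reader should not assume tag names are uppercase.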

To Reproduce

  1. Create a VCF with a FORMAT tag whose name is not entirely uppercase, for example:
    ##FORMAT=<ID=VAF_min,Number=1,Type=Float,Description="VAF Variant Frequency minimum [Release=0.9.3;Date=20210721;AnnotationType=calculation]">

  2. Create a new project with that VCF

  3. Open the genotype module, see that all the genotype values for that field are None

  4. Open the database with a SQLite viewer, see that all the values in the genotype table for that field are None

Note:

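  • This probably comes from this block in parse_variants() in the VcfReader class. The lookup uses gt_field.upper(), so a tag such as VAF_min is requested as VAF_MIN, and any AttributeError from the failed lookup is silently ignored: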
      for gt_field in format_fields:
          try:
              value = sample[gt_field.upper()]
              if isinstance(value, list):
                  value = ",".join(str(i) for i in value)
              sample_data[gt_field] = value
          except AttributeError:
              pass
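The suspected cause can be reproduced in isolation. The sketch below is not the project's code: `FakeSample` is a hypothetical stand-in for the parsed sample object, assumed to do case-sensitive lookups and to raise AttributeError for unknown fields, and `extract_genotype_fields` is a hypothetical function that mirrors the loop above. One possible fix, shown here, is to look up the tag name as declared instead of uppercasing it.

```python
class FakeSample:
    """Hypothetical stand-in for a parsed VCF sample (case-sensitive lookups)."""

    def __init__(self, **fields):
        self._fields = fields

    def __getitem__(self, key):
        try:
            return self._fields[key]
        except KeyError:
            # Assumption: the real sample object raises AttributeError here.
            raise AttributeError(key) from None


def extract_genotype_fields(sample, format_fields, normalize=lambda name: name):
    """Collect genotype values; `normalize` maps a tag name to the lookup key."""
    sample_data = {}
    for gt_field in format_fields:
        try:
            value = sample[normalize(gt_field)]
        except AttributeError:
            continue  # field missing for this sample: leave it unset
        if isinstance(value, list):
            value = ",".join(str(i) for i in value)
        sample_data[gt_field] = value
    return sample_data


sample = FakeSample(GT="0/1", VAF_min=0.12, AD=[10, 5])
fields = ["GT", "VAF_min", "AD"]

# Behaviour described in this issue: the name is uppercased before lookup.
buggy = extract_genotype_fields(sample, fields, normalize=str.upper)
# Possible fix: use the tag name exactly as declared in the header.
fixed = extract_genotype_fields(sample, fields)

print(buggy)
print(fixed)
```

With the uppercase lookup, `VAF_min` is dropped silently while `GT` and `AD` still load, which matches the symptom reported here. Narrowing the try block to the lookup alone also keeps unrelated errors from being swallowed.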

Labels: bug
