Skip to content

ga4gh_identify adds digest properties in-place regardless of in_place argument #440

@jsstevenson

Description

@jsstevenson

I have a basic allele. If I manually add the VA ID, the compacted JSON dump is 323 chars:

>>> allele = Allele(location={"end": 44908822, "start": 44908821, "sequenceReference": {"id": "NC_0000019.10", "type": "SequenceReference", "refgetAccession": "SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl"}, "type": "SequenceLocation"}, state={"sequence": "T", "type":  "LiteralSequenceExpression"}, type="Allele")
>>> allele.id = "ga4gh:VA.0AePZIWZUNsUlQTamyLrjm2HWUw2opLt"
>>> len(allele.model_dump_json(exclude_none=True))
323

If I use ga4gh_identify, though, even if I set in_place to "never", digest fields are added that makes the final JSON over 400 characters:

>>> allele = Allele(location={"end": 44908822, "start": 44908821, "sequenceReference": {"id": "NC_0000019.10", "type": "SequenceReference", "refgetAccession": "SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl"}, "type": "SequenceLocation"}, state={"sequence": "T", "type":  "LiteralSequenceExpression"}, type="Allele")
>>> allele.id = ga4gh_identify(allele, "never")
>>> len(allele.model_dump_json(exclude_none=True))
411
>>> print(allele.model_dump_json(exclude_none=True, indent=2))
{
  "id": "ga4gh:VA.0AePZIWZUNsUlQTamyLrjm2HWUw2opLt",
  "type": "Allele",
  "digest": "0AePZIWZUNsUlQTamyLrjm2HWUw2opLt",
  "location": {
    "type": "SequenceLocation",
    "digest": "wIlaGykfwHIpPY2Fcxtbx4TINbbODFVz",
    "sequenceReference": {
      "id": "NC_0000019.10",
      "type": "SequenceReference",
      "refgetAccession": "SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl"
    },
    "start": 44908821,
    "end": 44908822
  },
  "state": {
    "type": "LiteralSequenceExpression",
    "sequence": "T"
  }
}

This isn't the end of the world, but an increase in size of ~25% ends up being pretty hefty in extreme cases. is it intentional/necessary for ga4gh_identify to add these digest fields in-place, or can that be taken out? Or alternatively, maybe a ga4gh_compact method could strip unset/redundant fields?

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingquestionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions