standardize gives misleading results on Arden-style gene names #98

@JamieHeather

Hi Yuta,

I was just using tidytcells to fix some TCR names from some older papers, when I noticed that it was getting a few of them wrong. When I looked into it they all seemed to be from older pre-IMGT nomenclatures, particularly that from Arden et al. I made use of these two papers with gene ID conversion tables:

  • LeFranc et al. (which is basically the supplementary data following the original IMGT nomenclature approval)
  • Arden et al. (which details that older nomenclature, handily providing a bunch of accessions, which are out of date but do link to modern sequence IDs for validation)

I then picked out a couple of example genes:

  • TRAV1-2
    • In the Arden nomenclature this is AV7S2
    • (This is what actually made me start looking, when I noticed that some supposedly MAIT alpha chain antibodies bind 'TCRAV7S2')
  • TRBV19
    • This is BV17S1 under Arden
    • (This is particularly relevant, as 'TCRBV17S1' is an example given in the tidytcells paper)

Running these through tidytcells gives the wrong answers:

import tidytcells as tt


tt.tr.standardize('AV7S2')
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/tidytcells/_utils/warnings.py:12: UserWarning: Failed to standardize "AV7S2" for species homosapiens: unrecognised gene name. (best attempted fix: "TRAV7-2").
  warn(warning_message)

tt.tr.standardize('BV17S1')
'TRBV17'

I guess the issue is that Arden-style gene IDs look plausibly like old transitional IMGT IDs (which I think we've discussed in the past), and so they're simply being updated rather than looked up! It makes me wonder if there could/should be a way to stipulate which specific nomenclature is being used, or perhaps to raise a flag or warning if an ambiguous older ID is used?
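A minimal sketch of what such an opt-in might look like, written as a standalone wrapper rather than as anything tidytcells currently provides. The function name, the `nomenclature` and `fallback` parameters, and the lookup table are all hypothetical; the table holds only the two pairs quoted above, and a real one would need filling in from the Arden and LeFranc conversion tables.

```python
import re
import warnings

# Hypothetical lookup table: only the two Arden -> IMGT pairs quoted in this issue.
ARDEN_TO_IMGT = {
    "AV7S2": "TRAV1-2",
    "BV17S1": "TRBV19",
}

# Arden-style IDs look like AV7S2 or BV17S1, optionally prefixed with "TCR".
ARDEN_PATTERN = re.compile(r"^(?:TCR)?([ABGD]V\d+S\d+)$", re.IGNORECASE)


def standardize_with_nomenclature(name, nomenclature=None, fallback=None):
    """Look up Arden-style names when asked to, and warn when they are ambiguous.

    `fallback` is any callable taking a gene name (an assumption: something
    like tt.tr.standardize could be passed in here).
    """
    match = ARDEN_PATTERN.match(name.strip())
    if match is None:
        # Not Arden-shaped at all: defer to the normal standardiser, if given.
        return fallback(name) if fallback else None

    core = match.group(1).upper()
    if nomenclature == "arden":
        # Explicit opt-in: look the name up instead of rewriting it.
        return ARDEN_TO_IMGT.get(core)

    # No nomenclature stipulated: flag the ambiguity before falling back.
    warnings.warn(
        f'"{name}" looks like an Arden-style ID; the result may be misleading '
        "unless the nomenclature is specified."
    )
    return fallback(name) if fallback else None
```

With this, `standardize_with_nomenclature('TCRBV17S1', nomenclature='arden')` would return `'TRBV19'`, while the same call without `nomenclature` would warn and then hand the name to whatever `fallback` was supplied.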

On a related note, I also noticed (while trying to standardise a bunch of V-gene specific antibodies) that it doesn't seem to handle the (presumably even older style) TCR names with Greek characters, e.g. using old TRAV1-2 protein names again:

tt.tr.standardize('Vα7.2')
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/tidytcells/_utils/warnings.py:12: UserWarning: Failed to standardize "Vα7.2" for species homosapiens: unrecognised gene name. (best attempted fix: "TRV").
  warn(warning_message)

This makes sense, but given the retention of these names in product names it might be worth trying to cope with them in a future release.
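One possible way to cope with these, sketched under a loud assumption: that a name of the form "V + Greek letter + family.member" corresponds to the Arden-style ID with the same numbers (as 'Vα7.2' and 'AV7S2' both do for TRAV1-2). I have not checked that this holds across the board, and the function name is hypothetical.

```python
import re

# Greek chain letters mapped to the single-letter locus codes used in Arden-style IDs.
GREEK_TO_LOCUS = {"α": "A", "β": "B", "γ": "G", "δ": "D"}

# Matches names such as "Vα7.2": V, a Greek chain letter, family, dot, member.
GREEK_V_PATTERN = re.compile(r"^V([αβγδ])(\d+)\.(\d+)$")


def greek_to_arden(name):
    """Rewrite e.g. 'Vα7.2' as 'AV7S2'; return None if the name does not fit.

    Assumes family.member maps directly onto the Arden <family>S<member> form.
    """
    match = GREEK_V_PATTERN.match(name.strip())
    if match is None:
        return None
    greek, family, member = match.groups()
    return f"{GREEK_TO_LOCUS[greek]}V{family}S{member}"


print(greek_to_arden("Vα7.2"))  # AV7S2
```

The output would still be an Arden-style ID, so it would need a proper lookup afterwards rather than being passed straight to the existing fixer.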

Labels: bug, enhancement