Fix genetic code/translation table management#367

Open
JeanMainguy wants to merge 8 commits into dev from fix_genetic_code_management

JeanMainguy commented Feb 26, 2026

Problem

The translation table was not managed correctly. When annotation files are used, the translation table is parsed and saved for each CDS in the genedata table of the HDF5 file, but it was not reused later in the cluster step: the table specified by the user, or the default, was used instead. Many commands rely on the translation table, yet users were expected to pass it as a parameter each time instead of reusing the one that was used to construct the pangenome.

Implementation

Added tracking of user-specified arguments:
Added a specified_args attribute to the args object that lists the arguments explicitly set by the user. This makes it possible to distinguish an argument that was explicitly specified from one left at its default value.
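To illustrate the idea, here is a minimal sketch of one way to record explicitly set arguments with argparse. It is not the code from this PR: the parse_args helper, the DEFAULTS mapping and the --identity option are hypothetical, and it only covers the command line (not the config file). It relies on argparse.SUPPRESS so that options absent from the command line never reach the namespace.

```python
import argparse

# Hypothetical defaults, kept outside the parser so that "absent" and
# "set to the default value" can be told apart.
DEFAULTS = {"translation_table": 11, "identity": 0.8}


def parse_args(argv):
    """Parse argv and attach a specified_args list to the namespace."""
    parser = argparse.ArgumentParser()
    # default=argparse.SUPPRESS: the attribute is only created when the
    # option is actually present in argv.
    parser.add_argument("--translation_table", type=int, default=argparse.SUPPRESS)
    parser.add_argument("--identity", type=float, default=argparse.SUPPRESS)
    args = parser.parse_args(argv)

    # At this point the namespace holds only user-specified options.
    args.specified_args = sorted(vars(args))

    # Fill in defaults for everything the user did not set.
    for name, value in DEFAULTS.items():
        if not hasattr(args, name):
            setattr(args, name, value)
    return args


if __name__ == "__main__":
    args = parse_args(["--translation_table", "4"])
    print(args.translation_table, args.identity, args.specified_args)
```

With this approach, passing --translation_table 11 explicitly is still reported as user-specified even though it equals the default, which is the distinction the new behavior needs.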

Pangenome-level genetic code:
PPanGGOLiN expects genomes in a pangenome to have the same genetic code, so a unique genetic code is determined at the pangenome level.

For annotation files (GFF, GBFF): the translation table is specified for each CDS in the genome files, and this information is kept for each gene. The table used at the pangenome level is the most abundant one across CDS. If more than one table is found, a warning is issued, as this is not expected.
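The selection of the most abundant table can be sketched as follows. This is an illustration only: the function name and the flat list of per-CDS table numbers are assumptions, not the data structures used in PPanGGOLiN.

```python
import logging
from collections import Counter

logger = logging.getLogger("translation_table_sketch")


def most_abundant_translation_table(cds_tables):
    """Return the most frequent translation table, or None if none was parsed.

    cds_tables is a hypothetical iterable with one table number per CDS.
    """
    counts = Counter(cds_tables)
    if not counts:
        # No CDS carried a translation table (e.g. FASTA input).
        return None
    if len(counts) > 1:
        # Mixed genetic codes are not expected within one pangenome.
        logger.warning("Several translation tables found: %s", dict(counts))
    # most_common(1) returns a list with one (table, count) pair.
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(most_abundant_translation_table([11, 11, 11, 4]))
```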

New behavior:

  • If the translation table is specified by the user (on the command line or in the config file), this value is always used
  • If this value conflicts with the one from the annotation files, a critical warning is logged
  • If the table is not specified by the user:
    • When using annotation files: use the value from the annotation files
    • If it is not found in the annotation files, or when using FASTA input: use the default code 11
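The decision rules above can be summarized in a short sketch. The function and parameter names are hypothetical; None stands for "not specified by the user" and "not found in annotation files" respectively, which is an assumption about how absence is represented.

```python
import logging

logger = logging.getLogger("translation_table_sketch")

DEFAULT_TRANSLATION_TABLE = 11


def resolve_annotate_translation_table(user_value, annotation_value):
    """Pick the translation table for the annotation step.

    user_value: table explicitly set by the user, or None.
    annotation_value: table parsed from annotation files, or None.
    """
    if user_value is not None:
        # The user's choice always wins; a conflict is only reported.
        if annotation_value is not None and annotation_value != user_value:
            logger.critical(
                "User-specified translation table %s differs from the one "
                "found in annotation files (%s).",
                user_value,
                annotation_value,
            )
        return user_value
    if annotation_value is not None:
        return annotation_value
    # FASTA input, or annotation files without a translation table.
    return DEFAULT_TRANSLATION_TABLE


if __name__ == "__main__":
    print(resolve_annotate_translation_table(None, 4))
```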

Storage:
After the annotation step, the translation table used is stored in:

  • pangenome.status["translation_table"] for easy reuse in other steps
  • pangenome.parameters["annotate"]["translation_table"]

Extra info is added to parameters (prefixed with # so parameters can still be used as a config file):

  • # is_translation_table_user_specified: whether user explicitly set the value
  • # translation_table_from_annotation_files: the value parsed from annotation files (if applicable)

Example:

Parameters:
    annotate:
        # used_local_identifiers: True
        use_pseudo: False
        # is_translation_table_user_specified: False
        # translation_table_from_annotation_files: 11
        translation_table: 11
        # read_annotations_from_file: True
    cluster:
        coverage: 0.8
        identity: 0.8
        mode: 1
        # defragmentation: True
        no_defrag: False
        translation_table: 11
        # read_clustering_from_file: False

Commands affected:
For commands that need translation table information, the following priority is used:

  1. If user explicitly specifies a value → always use it
    • Critical warning logged if it conflicts with the pangenome's stored value
  2. Otherwise → use the value stored in pangenome.status["translation_table"] (defined during annotation)
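As a rough sketch of this priority for downstream commands, assuming args carries the specified_args list described earlier and that the pangenome status behaves like a dict. The function name is hypothetical, and the handling of a pangenome without a stored value (returning None here) is an assumption, since it is not covered in this description.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("translation_table_sketch")


def translation_table_for_command(args, pangenome_status):
    """Return the translation table a command should use."""
    stored = pangenome_status.get("translation_table")
    if "translation_table" in args.specified_args:
        # 1. An explicit user value always wins; a conflict is only logged.
        if stored is not None and stored != args.translation_table:
            logger.critical(
                "Translation table %s conflicts with the value stored in "
                "the pangenome (%s).",
                args.translation_table,
                stored,
            )
        return args.translation_table
    # 2. Otherwise reuse the value defined during annotation.
    return stored


if __name__ == "__main__":
    # args.translation_table holds the default here, so it is ignored.
    args = SimpleNamespace(translation_table=11, specified_args=[])
    print(translation_table_for_command(args, {"translation_table": 4}))
```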

Commands now using the stored translation table:

  • cluster
  • fasta
  • align
  • msa
  • context
  • projection

Projection command:
The projection command previously had no --translation_table argument and used the value found in the cluster parameters. For consistency, the argument has been added to its command line and the same treatment is applied: the value from status is used unless the user specifies one.

Updated help messages:
Updated --translation_table help text across all commands to explain the new behavior and suggest using ppanggolin info to check the current value.

Note

Translation table and genetic code are used interchangeably in the code and mean the same thing. To prevent any breakage in the API, these terms were not homogenized.
