Fix genetic code/translation table management by JeanMainguy · Pull Request #367 · labgem/PPanGGOLiN

JeanMainguy · 2026-02-26T17:22:10Z

Problem

The translation table was not managed correctly. When annotation files are used, the translation table is parsed and saved for each CDS in the genedata table of the HDF5 file, but it was not reused later in the cluster step (the table specified by the user, or the default, was used instead). Many commands rely on the translation table, yet users had to specify it each time as a parameter instead of reusing the one that was used to construct the pangenome.

Implementation

Added tracking of user-specified arguments:
Added a specified_args attribute to the args object that lists the arguments explicitly set by the user. This makes it possible to distinguish an argument that was explicitly specified from one left at its default value.
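One way to track explicitly set arguments with argparse is sketched below. This is a minimal illustration and not the code from this PR: the `DEFAULTS` table, the `parse_args` helper, the use of `argparse.SUPPRESS`, and storing the names in a set are all assumptions. Only the `specified_args` attribute name comes from the description above.

```python
import argparse

# Hypothetical defaults, kept outside the parser so that argparse itself
# never fills them in.
DEFAULTS = {"translation_table": 11, "identity": 0.8}


def parse_args(argv):
    """Parse argv and record which options the user explicitly set."""
    parser = argparse.ArgumentParser()
    # With default=SUPPRESS, an option that is not given on the command
    # line is simply absent from the resulting namespace.
    parser.add_argument("--translation_table", type=int, default=argparse.SUPPRESS)
    parser.add_argument("--identity", type=float, default=argparse.SUPPRESS)
    args = parser.parse_args(argv)

    # Everything present at this point was explicitly specified.
    args.specified_args = set(vars(args))

    # Fill in defaults for the options the user left out.
    for name, value in DEFAULTS.items():
        if name not in args.specified_args:
            setattr(args, name, value)
    return args
```

With this approach, `"translation_table" in args.specified_args` is true only when the user passed `--translation_table`, even if the value passed equals the default.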

Pangenome-level genetic code:
PPanGGOLiN expects all genomes in a pangenome to have the same genetic code, so a single genetic code is determined at the pangenome level.

For annotation files (GFF, GBFF): The translation table is specified for each CDS in the genome files, and this information is kept for each gene. The table used at the pangenome level is the most abundant one among the genes. If more than one table is found, a warning is issued, as this is not expected.
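The selection of the most abundant table could look like the following sketch. The function name `most_abundant_table` and its input (an iterable of per-CDS table numbers, with `None` for CDSs that carry no table) are hypothetical and chosen for illustration only.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def most_abundant_table(cds_tables):
    """Return the most frequent translation table, or None if none is found."""
    # Ignore CDSs for which no translation table was parsed.
    counts = Counter(table for table in cds_tables if table is not None)
    if not counts:
        return None
    if len(counts) > 1:
        # Several genetic codes in one pangenome are not expected.
        logger.warning(
            "Multiple translation tables found in annotation files: %s",
            dict(counts),
        )
    return counts.most_common(1)[0][0]
```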

New behavior:

If the translation table is specified by the user (on the command line or in the config file), this value is always used
If this value conflicts with the one from the annotation files, a critical warning is logged
If the table is not specified by the user:
- When using annotation files: use the value from the annotation files
- If not found in the annotation files, or when using fasta input: use the default code 11
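The rules above can be summarized in a small function. This is a sketch under stated assumptions: `resolve_annotate_table` is a hypothetical name, and `None` is used here to mean "not specified by the user" or "not found in the annotation files".

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_TABLE = 11  # default genetic code stated in the description


def resolve_annotate_table(user_table=None, annotation_table=None):
    """Pick the translation table to use during the annotation step."""
    if user_table is not None:
        # A user-specified value always wins, but a conflict is reported.
        if annotation_table is not None and annotation_table != user_table:
            logger.critical(
                "User-specified translation table %s conflicts with table %s "
                "found in annotation files.",
                user_table,
                annotation_table,
            )
        return user_table
    if annotation_table is not None:
        return annotation_table
    # Fasta input, or no table found in the annotation files.
    return DEFAULT_TABLE
```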

Storage:
After the annotation step, the translation table used is stored in:

pangenome.status["translation_table"] for easy reuse in other steps
pangenome.parameters["annotate"]["translation_table"]

Extra info is added to parameters (prefixed with # so parameters can still be used as a config file):

# is_translation_table_user_specified: whether user explicitly set the value
# translation_table_from_annotation_files: the value parsed from annotation files (if applicable)

Example:

Parameters:
    annotate:
        # used_local_identifiers: True
        use_pseudo: False
        # is_translation_table_user_specified: False
        # translation_table_from_annotation_files: 11
        translation_table: 11
        # read_annotations_from_file: True
    cluster:
        coverage: 0.8
        identity: 0.8
        mode: 1
        # defragmentation: True
        no_defrag: False
        translation_table: 11
        # read_clustering_from_file: False
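Because the extra information is prefixed with #, it can be ignored when the parameters are reused as a config file. The sketch below illustrates the idea on a plain nested dictionary; the helper name `parameters_as_config` and the assumption that the extra entries are held as keys starting with `#` are illustrative and may not match how PPanGGOLiN stores them internally.

```python
def parameters_as_config(parameters):
    """Drop '#'-prefixed extra info so that only real options remain."""
    return {
        step: {
            name: value
            for name, value in options.items()
            if not name.startswith("#")
        }
        for step, options in parameters.items()
    }


# Values copied from the annotate section of the example above.
parameters = {
    "annotate": {
        "# used_local_identifiers": True,
        "use_pseudo": False,
        "# is_translation_table_user_specified": False,
        "# translation_table_from_annotation_files": 11,
        "translation_table": 11,
        "# read_annotations_from_file": True,
    },
}

config = parameters_as_config(parameters)
```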

Commands affected:
For commands that need translation table information, the following priority is applied:

If the user explicitly specifies a value → always use it
- A critical warning is logged if it conflicts with the pangenome's stored value
Otherwise → use the value stored in pangenome.status["translation_table"] (defined during annotation)
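For a downstream command, this priority could be expressed as in the sketch below. The function `table_for_command` is hypothetical, `status` stands in for pangenome.status as a plain dictionary, and `args` is assumed to carry the `specified_args` attribute described earlier. Following the commit list, a table missing from the status is treated as `None`; what a command does with `None` is left to the caller here.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def table_for_command(args, status):
    """Choose the translation table for a command run on an existing pangenome."""
    stored = status.get("translation_table")  # None when absent from status
    if "translation_table" in args.specified_args:
        # An explicit user value always wins, but a conflict is reported.
        if stored is not None and stored != args.translation_table:
            logger.critical(
                "Translation table %s given by the user differs from table %s "
                "stored in the pangenome.",
                args.translation_table,
                stored,
            )
        return args.translation_table
    return stored


# Usage: the user left --translation_table at its default value of 11,
# so the table stored in the pangenome status is used instead.
args = SimpleNamespace(translation_table=11, specified_args=set())
chosen = table_for_command(args, {"translation_table": 4})
```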

Commands now using the stored translation table:

cluster
fasta
align
msa
context
projection

Projection command:
Previously, projection had no --translation_table argument and used the value found in the cluster parameters. For consistency, the argument has been added to its command line and the same treatment is applied (the value from the status is used if the user does not specify one).

Updated help messages:
Updated --translation_table help text across all commands to explain the new behavior and suggest using ppanggolin info to check the current value.

Note

Translation table and genetic code are used interchangeably in the code and mean the same thing. To prevent any breakage in the API, these terms were not homogenized.

JeanMainguy added 8 commits February 25, 2026 15:06

improve transl table management

217a83f

add translation_table to status

f575e29

add translation_table to status

54fcbbd

homogenize transl table management across cmds

88860d6

update info file with translation table new info

23c452b

improve translation_table help of annotate and workflow

d311885

when translation_table not found in status it is set to None

7ab553e

apply black

f49dbb8

JeanMainguy wants to merge 8 commits into dev from fix_genetic_code_management