
Only apply pyrodigal meta mode when genome < 20kb and prevent long runs of N in predicted genes #368

Open
JeanMainguy wants to merge 4 commits into dev from fix_pyrodigal_mode




@JeanMainguy commented Feb 27, 2026

Problem:
When Pyrodigal runs in meta mode on highly fragmented genomes, it sometimes assigns the wrong genetic code (translation table) to contigs. This happens because the current logic switches to meta mode whenever the largest contig is shorter than 20 kb, even when the total genome size is large enough for training.

Solution:
Updated the logic to switch to meta mode only if the sum of all contig sizes in the genome is less than 20 kb (instead of just the largest contig). This ensures Pyrodigal trains on the full genome when possible, reducing incorrect genetic code assignments.
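The difference between the two rules can be sketched as follows. The 20 kb threshold comes from this PR; the function names are hypothetical and do not reflect the actual code in the branch.

```python
# Hypothetical helpers contrasting the old and new meta-mode rules.
# Only the 20 kb threshold is taken from the PR description.
META_MODE_THRESHOLD = 20_000  # bp


def use_meta_mode_old(contig_lengths: list[int]) -> bool:
    """Previous rule: meta mode if the largest contig is below 20 kb."""
    return max(contig_lengths) < META_MODE_THRESHOLD


def use_meta_mode_new(contig_lengths: list[int]) -> bool:
    """New rule: meta mode only if the whole genome is below 20 kb."""
    return sum(contig_lengths) < META_MODE_THRESHOLD


# A fragmented 2 Mb assembly made of 400 contigs of 5 kb each
fragmented = [5_000] * 400
print(use_meta_mode_old(fragmented))  # True: meta mode, no training
print(use_meta_mode_new(fragmented))  # False: train on the full genome
```

Under the old rule, this fragmented genome would be handled in meta mode even though it holds 2 Mb of sequence. Under the new rule, the flag would presumably select between `pyrodigal.GeneFinder(meta=True)` and a regular `GeneFinder` trained on the genome's contigs.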

Impact:

  • Fixes wrong genetic code assignments in fragmented genomes.
  • Adds a warning when more than one translation table is used for a genome.
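Below is a minimal sketch of such a warning, assuming the translation table of each predicted gene has already been collected into a list. With Pyrodigal, these values would plausibly come from each predicted gene's `translation_table` attribute. The function name and message are hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def check_translation_tables(genome_name: str, gene_tables: list[int]) -> set[int]:
    """Warn when genes of one genome were predicted with several translation tables.

    `gene_tables` holds the translation table used for each predicted gene
    (hypothetical input). Returns the set of tables that were observed.
    """
    counts = Counter(gene_tables)
    if len(counts) > 1:
        # Summarise how many genes were predicted with each table
        summary = ", ".join(f"table {t}: {n} genes" for t, n in sorted(counts.items()))
        logger.warning(
            "Genome %s uses %d translation tables (%s)",
            genome_name, len(counts), summary,
        )
    return set(counts)
```

For example, `check_translation_tables("genome_A", [11, 11, 4])` returns `{4, 11}` and logs a warning, while a genome predicted entirely with table 11 passes silently.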

Other change:
I also changed the min_mask parameter of pyrodigal.GeneFinder. This parameter sets the minimum length a region of unknown nucleotides (N) must reach to trigger masking, and its default value is 50. I set it to 9 to prevent long runs of N from appearing in predicted genes. Such problematic genes were observed in genome GCA_022531725.1.
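The effect of the threshold can be illustrated with a simplified sketch that lists which runs of N would qualify for masking at a given `min_mask`. This is not Pyrodigal's implementation. It assumes that runs at least `min_mask` long are masked, in line with the parameter's description.

```python
import re


def masked_regions(sequence: str, min_mask: int) -> list[tuple[int, int]]:
    """Return (start, end) of N runs at least `min_mask` long (0-based, end exclusive).

    Simplified illustration of the min_mask threshold, not Pyrodigal's code.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_mask
    ]


seq = "ATG" + "N" * 30 + "GCA" + "N" * 5 + "TAA"
print(masked_regions(seq, min_mask=50))  # []
print(masked_regions(seq, min_mask=9))   # [(3, 33)]
```

With the default of 50, the 30 bp run of N stays unmasked, so a predicted gene could span it. With a threshold of 9, that run is masked, while the 5 bp run remains below the threshold.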

@JeanMainguy JeanMainguy changed the title Only apply pyrodigal meta mode when genome < 20kb Only apply pyrodigal meta mode when genome < 20kb and prevent long runs of N in predicted genes Mar 6, 2026