aragorn_out_to_gff3.py error when parsing genomes with tmRNA's #1482

@cdshaffer

I was running the "CPT Phage Structural Workflow v2024.1 shared by user jasongill" on the Galaxy instance at "https://phage.usegalaxy.eu/" with a phage genome and got an error in the workflow, which I traced back to the tool aragorn_out_to_gff3.py.

Here is the genome that is giving the error:
Elmer.fasta.txt

Here are the details from galaxy:

Galaxy Tool ID | toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/aragorn_trna/0.6

with this command line returning the error:
aragorn '/data/dnb10/galaxy_db/files/3/4/8/dataset_348bacf1-a138-43a0-a09b-6be8e8854bac.dat' -gc11 -m -t -c -w | python '/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/358f58401cd6/trna_prediction/aragorn_out_to_gff3.py' false > '/data/jwd02f/main/072/526/72526578/outputs/dataset_39c2a2b6-0d3f-4e62-bb00-c465960c860d.dat'

and this traceback:
Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/358f58401cd6/trna_prediction/aragorn_out_to_gff3.py", line 66, in <module>
    aa_short = aa_table[aa_long]
KeyError: ''

I believe the error comes from the tmRNA that is present in the genome. I ran aragorn locally using this command:

» aragorn Elmer.fasta -gc11 -m -t   -c  -w | tail -8
38  tRNA-Arg                 [99283,99355]	34  	(acg)
39  tmRNA                  [100524,100803]	91,132	ANSNVASAYALAA*
40  tRNA-Leu               [108208,108281]	34  	(taa)
41  tRNA-Leu               [108598,108680]	34  	(gag)
42  tRNA-Val               [109257,109328]	33  	(gac)
43  tRNA-Leu               [110275,110352]	36  	(caa)
44  tRNA-Ser               [110601,110692]	37  	(gct)
45  tRNA-Gln               [117516,117591]	35  	(ttg)

When aragorn_out_to_gff3.py parses this output, each line is split and aa_long is set by aa_long = data[1][5:]. For tRNA lines this returns the three-letter amino acid code, but for the tmRNA line it returns an empty string, hence the KeyError in the traceback above. From a brief look over the code, aragorn_out_to_gff3.py does not appear to parse tmRNA lines at all, and as you can see above, they are quite different from the tRNA lines.
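The failure can be reproduced in isolation, and a small guard avoids it. This is only a sketch: the aa_table below is a hypothetical three-entry stand-in for the script's real table, the helper name aa_short_for is made up for illustration, and the whitespace split is an assumption about how the script builds data.

```python
# Hypothetical subset of the lookup table; the script's real aa_table is not shown in this issue.
aa_table = {"Arg": "R", "Leu": "L", "Val": "V"}


def aa_short_for(line):
    """Return the one-letter code for a tRNA line, or None for any other line."""
    data = line.split()  # assumption: the script splits each line on whitespace
    if len(data) < 2 or not data[1].startswith("tRNA-"):
        # tmRNA lines end up here instead of raising KeyError
        return None
    aa_long = data[1][5:]  # "tRNA-Arg" -> "Arg"
    return aa_table.get(aa_long)  # None if the code is missing from the table


# Two lines copied from the aragorn output above
trna = "38  tRNA-Arg                 [99283,99355]\t34  \t(acg)"
tmrna = "39  tmRNA                  [100524,100803]\t91,132\tANSNVASAYALAA*"

# "tmRNA" is exactly five characters long, so slicing from index 5 leaves nothing
print(repr(tmrna.split()[1][5:]))
print(aa_short_for(trna), aa_short_for(tmrna))
```

With a guard like this, tmRNA lines are skipped rather than crashing the conversion, at the cost of leaving them out of the GFF3 output.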

Suggesting code to parse the tmRNA lines is beyond me, but an easy first mitigation would be to have aragorn call only tRNAs and not tmRNA genes. This works on my machine when I remove the '-m' flag from the aragorn call:

» aragorn Elmer.fasta -gc11 -t   -c  -w | tail -8
37  tRNA-Thr                 [99092,99164]	34  	(ggt)
38  tRNA-Arg                 [99283,99355]	34  	(acg)
39  tRNA-Leu               [108208,108281]	34  	(taa)
40  tRNA-Leu               [108598,108680]	34  	(gag)
41  tRNA-Val               [109257,109328]	33  	(gac)
42  tRNA-Leu               [110275,110352]	36  	(caa)
43  tRNA-Ser               [110601,110692]	37  	(gct)
44  tRNA-Gln               [117516,117591]	35  	(ttg)
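If dropping '-m' is not acceptable, the tmRNA line could in principle be turned into its own GFF3 row. The following is a rough sketch, not the script's actual code: the function name tmrna_to_gff3, the "aragorn" source column, the ID format, the tag_peptide attribute, and the seqid argument are all assumptions made for illustration, as is treating a leading "c" on the coordinates as the complement strand.

```python
import re

# Coordinates look like "[100524,100803]"; a leading "c" is assumed to mean complement strand.
COORDS = re.compile(r"^(c?)\[(-?\d+),(-?\d+)\]$")


def tmrna_to_gff3(line, seqid):
    """Build one GFF3 row from an aragorn tmRNA line, or return None for other lines."""
    fields = line.split()
    if len(fields) < 3 or fields[1] != "tmRNA":
        return None
    match = COORDS.match(fields[2])
    if match is None:
        return None
    strand = "-" if match.group(1) == "c" else "+"
    start, end = match.group(2), match.group(3)
    attributes = "ID=tmRNA-" + fields[0]
    if len(fields) >= 5:
        # Last column is taken to be the tag peptide (assumption)
        attributes += ";tag_peptide=" + fields[4]
    return "\t".join([seqid, "aragorn", "tmRNA", start, end, ".", strand, ".", attributes])


# tmRNA line copied from the aragorn output earlier in this issue; "Elmer" is a placeholder seqid
line = "39  tmRNA                  [100524,100803]\t91,132\tANSNVASAYALAA*"
print(tmrna_to_gff3(line, "Elmer"))
```

The "91,132" column is ignored here because I am not certain how it should be represented in GFF3.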
