I was running the "CPT Phage Structural Workflow v2024.1" (shared by user jasongill) on the https://phage.usegalaxy.eu/ Galaxy instance with a phage genome and got an error in the workflow, which I traced back to the script aragorn_out_to_gff3.py.
Here is the genome that is giving the error:
Elmer.fasta.txt
Here are the details from Galaxy:
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/aragorn_trna/0.6
with this command line returning the error:
aragorn '/data/dnb10/galaxy_db/files/3/4/8/dataset_348bacf1-a138-43a0-a09b-6be8e8854bac.dat' -gc11 -m -t -c -w | python '/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/358f58401cd6/trna_prediction/aragorn_out_to_gff3.py' false > '/data/jwd02f/main/072/526/72526578/outputs/dataset_39c2a2b6-0d3f-4e62-bb00-c465960c860d.dat'
and this traceback:
Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/358f58401cd6/trna_prediction/aragorn_out_to_gff3.py", line 66, in <module>
    aa_short = aa_table[aa_long]
KeyError: ''
I believe the error is caused by the tmRNA gene present in this genome. I ran aragorn locally with the same options:
» aragorn Elmer.fasta -gc11 -m -t -c -w | tail -8
38 tRNA-Arg [99283,99355] 34 (acg)
39 tmRNA [100524,100803] 91,132 ANSNVASAYALAA*
40 tRNA-Leu [108208,108281] 34 (taa)
41 tRNA-Leu [108598,108680] 34 (gag)
42 tRNA-Val [109257,109328] 33 (gac)
43 tRNA-Leu [110275,110352] 36 (caa)
44 tRNA-Ser [110601,110692] 37 (gct)
45 tRNA-Gln [117516,117591] 35 (ttg)
When aragorn_out_to_gff3.py parses this output, it splits the tmRNA line just like the tRNA lines and sets aa_long = data[1][5:]. For tRNA lines this returns the three-letter amino acid code ("tRNA-Arg"[5:] is "Arg"), but for the tmRNA line it returns an empty string ("tmRNA" is exactly five characters long), hence the KeyError: '' in the traceback above. In fact, from a brief look over aragorn_out_to_gff3.py, it does not appear to handle tmRNA lines at all, and as the output above shows, they are formatted quite differently from tRNA lines.
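A minimal sketch of the failing step and one possible guard, assuming the script splits each ARAGORN result line on whitespace so that data[1] holds the feature name. The function amino_acid_from_line and the AA_TABLE subset are hypothetical stand-ins, not code from aragorn_out_to_gff3.py.

```python
# Hypothetical subset of a three-letter -> one-letter lookup table,
# standing in for the script's own aa_table.
AA_TABLE = {"Arg": "R", "Leu": "L", "Gln": "Q", "Ser": "S", "Val": "V", "Thr": "T"}


def amino_acid_from_line(line, aa_table):
    """Return the one-letter amino acid code for an ARAGORN tRNA line,
    or None for lines (such as tmRNA) that carry no amino acid.

    Assumes the line is split on whitespace, so data[1] is the
    feature name, e.g. "tRNA-Arg" or "tmRNA".
    """
    data = line.split()
    feature = data[1]
    if not feature.startswith("tRNA-"):
        # Without this guard, "tmRNA"[5:] == "" and aa_table[""] raises KeyError: ''
        return None
    aa_long = feature[5:]  # "tRNA-Arg"[5:] -> "Arg"
    return aa_table[aa_long]


lines = [
    "38 tRNA-Arg [99283,99355] 34 (acg)",
    "39 tmRNA [100524,100803] 91,132 ANSNVASAYALAA*",
]
for line in lines:
    print(line.split()[1], amino_acid_from_line(line, AA_TABLE))
```

This prints "tRNA-Arg R" and then "tmRNA None". Returning None lets the caller skip the tmRNA line (or pass it to a separate tmRNA handler) instead of crashing.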
Writing code to parse the tmRNA lines is beyond me, but an easy initial mitigation would be to have aragorn call only tRNAs and skip the tmRNA genes. On my machine this works by removing the '-m' option from the aragorn call:
» aragorn Elmer.fasta -gc11 -t -c -w | tail -8
37 tRNA-Thr [99092,99164] 34 (ggt)
38 tRNA-Arg [99283,99355] 34 (acg)
39 tRNA-Leu [108208,108281] 34 (taa)
40 tRNA-Leu [108598,108680] 34 (gag)
41 tRNA-Val [109257,109328] 33 (gac)
42 tRNA-Leu [110275,110352] 36 (caa)
43 tRNA-Ser [110601,110692] 37 (gct)
44 tRNA-Gln [117516,117591] 35 (ttg)
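For keeping the tmRNA calls rather than dropping them, one possible approach is to convert tmRNA lines into a GFF3 feature of type tmRNA instead of looking up an amino acid. This is a hypothetical sketch, not part of aragorn_out_to_gff3.py: the function name tmrna_to_gff3, the ID scheme, the tag_peptide attribute, and the seqid "Elmer" are all assumptions. It also assumes ARAGORN marks complementary-strand hits as c[start,end], and it only handles the tmRNA line layout shown in the output above; the fourth field (91,132) is ignored.

```python
import re

# "[start,end]" on the forward strand; assumed "c[start,end]" on the complementary strand
COORD_RE = re.compile(r"^(c?)\[(\d+),(\d+)\]$")


def tmrna_to_gff3(line, seqid, source="aragorn"):
    """Hypothetical converter from one ARAGORN tmRNA line to a GFF3 row."""
    data = line.split()
    if len(data) < 3 or data[1] != "tmRNA":
        raise ValueError(f"not a tmRNA line in the expected layout: {line!r}")
    match = COORD_RE.match(data[2])
    if match is None:
        raise ValueError(f"unexpected coordinate field: {data[2]!r}")
    complement, start, end = match.groups()
    strand = "-" if complement else "+"
    # Fifth field is the tag peptide, e.g. "ANSNVASAYALAA*"; data[3] ("91,132") is not used here
    tag_peptide = data[4] if len(data) > 4 else ""
    attributes = f"ID=tmRNA_{data[0]};tag_peptide={tag_peptide}"
    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([seqid, source, "tmRNA", start, end, ".", strand, ".", attributes])


print(tmrna_to_gff3("39 tmRNA [100524,100803] 91,132 ANSNVASAYALAA*", "Elmer"))
```

In a real fix the seqid would come from the sequence name ARAGORN prints before its results, and unrecognized layouts raise ValueError rather than being silently mis-parsed.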