Conversation
|
@astrovsky01 Connecting you to the PR |
bgruening
left a comment
There was a problem hiding this comment.
Thanks @tcollins2011!
The DB is tiny? Is it the wrong DB?
A few initial comments inline.
tools/sylph/sylph.xml
Outdated
| <expand macro="input_database"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data format="tabular" name="output" label="${tool.name} on ${on_string}" from_work_dir="output.tsv"/> |
There was a problem hiding this comment.
you could add here as metadata the column names if you like. they seem to be static.
There was a problem hiding this comment.
They query and profile commands both generate an output tabular file, but are slightly different (i.e for output_1.tabular and output_5.tabular are similar but slightly different, and only difference is query vs profile). We could split this into two tools if you think that would be valuable, though?
bebatut
left a comment
There was a problem hiding this comment.
Thanks @tcollins2011! I put some comments.
A big one: it seems that both Python scripts come from https://github.com/bluenote-1577/sylph-utils and there is no LICENSE in the repository. So we are not allowed to take the scripts and add them here.
It might be better to find a way (maybe talking with authors) so they make the scripts available via conda maybe. That would avoid duplication
tools/sylph/sylph.xml
Outdated
| <param name="functions" type="select" label="ANI querying or taxonomic profiling" help="ANI(Average Nucleotide Identity) is a measure of nucleotide-level similarity between the genomes of two microogranisms. Profile querying involves comparing functional or sequence profiles derived from the genomic data."> | ||
| <option value="profile">Taxanomic Profile</option> | ||
| <option value="ani">ANI Query</option> |
There was a problem hiding this comment.
I am wondering if it would not more sense to have 2 Galaxy tools (one for sylph profile, one for sylph query).
That would be closer of how people would use it via command-line. What do you think?
tools/sylph/sylph.xml
Outdated
| #else: | ||
| && ln -s "$database_select.metadata" "database.tsv.gz" | ||
| #end if | ||
| && python "$__tool_directory__/sylph_to_taxprof.py" -s output.tsv -m database.tsv.gz -o metaphlan_ |
There was a problem hiding this comment.
You probably need to put in a # if "metaphlan" in $function.outputs:
There was a problem hiding this comment.
The metaphlan tool must be run regardless if krona, metphlan, or both are requested. The krona formatted file is generated from the metaphlan converted file. The #if part is about if we hand it back to the user
The DB we used in this for testing is basically just e. coli, and is as minimal as possible to run tests againstt. Even at that size, though, the metadata file was too big for the repo. That's also why the final test is just checking command text. The next tool we were planning to work on was one to generate sylph db and their associated metada files so users can build their own. In the meantime, I added a readme clarifying how to get the pre-made db and metadata. |
|
@bgruening @astrovsky01 We could instead wrap a larger database for more thorough testing. We essentially made as small a database as possible when we trying to push this to tools-iuc, but if you think it would provide more thorough testing we could instead swap it out for a larger database. In theory though, the tiny database should allow for fully functional testing within galaxy. |
Adding a profile to the tool Co-authored-by: Björn Grüning <bjoern@gruenings.eu>
Changing to correct repository owner Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Improving description of tool Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Improved description of the tool Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
All good, I was just confused because you said the DB is 6MB and I wanted to check if the submitted file was a mistake. |
Updated due to publication Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
@bebatut the krona converter is actually a modified version of the one wrapped in the Metaphlan tool in galaxy. I believe it's from Galaxy, too, since I don't see it in the metaphlan original repo |
Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
|
It looks like test files are missing and the limiting is not passing. |
|
@bgruening Alright I think everything should be passing now except the file size check. |
tools/sylph/sylph.xml
Outdated
| <param name="functions" value="profile"/> | ||
| <param name="min_num_kmers" value="49"/> | ||
| </conditional> | ||
| <output name="output" value="output_2.tabular" compare="sim_size"/> |
There was a problem hiding this comment.
please don't use sim_size on text outputs, use asserts :)
| <when value="cached"> | ||
| <param label="Select a sylph database" name="sylph_database" type="select"> | ||
| <options from_data_table="sylph_databases"> | ||
| <validator message="No Sylph databases are available" type="no_options" /> |
There was a problem hiding this comment.
add here a filter for version 1 or something like that ... and then we include version 1 in the test file.
Whenever the tool changes to DB layout we increase this version.
There was a problem hiding this comment.
ping, you are not filtering the DB here according to the version
bgruening
left a comment
There was a problem hiding this comment.
Looks mostly good to me, just 2 small comments.
….xml file to have more shared logic
|
@bgruening Sylph has been split into two separate tools now both within this directory. Additionally, I see that we are still failing the file size check, but I'm not sure if it is possible to reduce the database metadata files to a smaller size. |
bgruening
left a comment
There was a problem hiding this comment.
Thanks @tcollins2011! Added a few more smaller comments!
| <when value="cached"> | ||
| <param label="Select a sylph database" name="sylph_database" type="select"> | ||
| <options from_data_table="sylph_databases"> | ||
| <validator message="No Sylph databases are available" type="no_options" /> |
There was a problem hiding this comment.
ping, you are not filtering the DB here according to the version
| </param> | ||
| <!-- Only permitting fastq as tool input only allows fastq and fastq.gz as file ext --> | ||
| <when value="single"> | ||
| <param name="input" type="data" format="fastq,fastq.gz,fastqsanger,fastqsanger.gz" label="Single-end input reads"/> |
There was a problem hiding this comment.
Why is this needed, can we not just go with the multiple=true option?
|
@tcollins2011 bluenote-1577/sylph-utils#2 seems to be resolved. |
|
@bgruening Tests are updated again. I think everything should be resolved if you want to take a final look. |
tools/sylph/sylph_query.xml
Outdated
| <param name="select" value="custom"/> | ||
| <conditional name="sketch_fasta"> | ||
| <param name="type_fasta" value="single_fasta"/> | ||
| <param name="input" value="e.coli-o157.fasta.gz" ftype="fasta"/> |
There was a problem hiding this comment.
You could use remote files here, this way we don't need to put it into the repo. Have a look at location https://docs.galaxyproject.org/en/latest/dev/schema.html#id108
Few tests modifications
Add long description into shed sylph tax DM
|
Hi @bgruening, the CI seems to be correct so it's ready for review. |
Moving the sylph tool wrapper pull request to galaxytools due to the metadeata file associated with the .syldb database having a size of 6mb.