In #2154, we've been talking about how to include taxonomic information in zipfiles, and I've been trying to figure out how that would work at the command line.
But all the discussion happened in a now-closed issue and a now-merged PR ;). So here's a new issue!
Comments copied over from various other issues and PRs -
From #2195 (comment), @bluegenes:
As discussed on slack :) -
re: SOURMASH-TAXONOMY — would you consider GTDB-TAXONOMY and NCBI-TAXONOMY instead, with the default being gtdb?
OR, somewhere in database info/metadata (which we don’t have yet, but have talked about), add the default for that database? In this case, I'm thinking about database info/metadata as database version (e.g. gtdb-rs207), sourmash signature version, creation date, etc -- and then adding default-taxonomy.
From #2012 (comment), I wrote:
Trying to figure out how distributing multiple taxonomies in a zip file would work at the command line.
The most obvious idea is:
sourmash tax classify -g gather.csv -t gtdb-xyz.zip --gtdb
which would load GTDB-TAXONOMY.csv from gtdb-xyz.zip, vs
sourmash tax classify -g gather.csv -t gtdb-xyz.zip --ncbi
which would load NCBI-TAXONOMY.csv from gtdb-xyz.zip.
Then we could potentially add --lins later on for #1813.
Alternative command-line switches would be --tax-type ncbi or something but I feel like --ncbi and --gtdb are probably simplest and easiest to remember.
which received @bluegenes' endorsement:
I like --gtdb and --ncbi, especially since I can't see us integrating so many taxonomies that having an argument per tax type would be unwieldy.
--lins definitely useful when we get there!
Also sorta connects with #2186, searching/selecting on taxonomic lineages?
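To make the `--gtdb` / `--ncbi` proposal concrete, here is a minimal sketch of how the flags could select a taxonomy file inside the zip. It is not sourmash's actual implementation: the function names `build_parser` and `load_taxonomy_rows` are hypothetical, the default of `gtdb` follows @bluegenes' suggestion above, and the only names taken from the discussion are `GTDB-TAXONOMY.csv` and `NCBI-TAXONOMY.csv`.

```python
import argparse
import csv
import io
import zipfile

# Member names as proposed in this issue; the mapping itself is an assumption.
TAXONOMY_MEMBERS = {
    "gtdb": "GTDB-TAXONOMY.csv",
    "ncbi": "NCBI-TAXONOMY.csv",
}


def build_parser():
    """Hypothetical parser mirroring the proposed 'tax classify' switches."""
    p = argparse.ArgumentParser(prog="tax-classify-sketch")
    p.add_argument("-g", "--gather-csv", required=True)
    p.add_argument("-t", "--taxonomy", required=True)
    # --gtdb and --ncbi write to the same destination and cannot be combined.
    group = p.add_mutually_exclusive_group()
    group.add_argument("--gtdb", dest="tax_type", action="store_const", const="gtdb")
    group.add_argument("--ncbi", dest="tax_type", action="store_const", const="ncbi")
    # Assumed default: GTDB when neither switch is given.
    p.set_defaults(tax_type="gtdb")
    return p


def load_taxonomy_rows(zip_path, tax_type):
    """Read the chosen taxonomy CSV from the zip and return its rows as dicts."""
    member = TAXONOMY_MEMBERS[tax_type]
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fp:
            text = io.TextIOWrapper(fp, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

With this layout, `build_parser().parse_args(["-g", "gather.csv", "-t", "gtdb-xyz.zip", "--ncbi"])` would set `tax_type` to `"ncbi"`, and `load_taxonomy_rows` would then read `NCBI-TAXONOMY.csv` from the zip. Adding `--lins` later would mean one more entry in the mapping and one more switch in the group.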