tax in zip files; supporting multiple taxonomies with command-line switches #2216

@ctb

In #2154, we've been talking about how to include taxonomic information in zipfiles, and I've been trying to figure out how that would work at the command line.

But all the discussion happened in a now-closed issue and a now-merged PR ;). So here's a new issue!

Comments copied over from various other issues and PRs -

From #2195 (comment), @bluegenes:

As discussed on slack :) -

re: SOURMASH-TAXONOMY — would you consider GTDB-TAXONOMY and NCBI-TAXONOMY instead, with the default being gtdb?

OR, somewhere in database info/metadata (which we don’t have yet, but have talked about), add the default for that database? In this case, I'm thinking about database info/metadata as database version (e.g. gtdb-rs207), sourmash signature version, creation date, etc -- and then adding default-taxonomy.
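To make the database-metadata idea concrete, here is a minimal sketch of how a default taxonomy could be read from a metadata file inside the zip. The file name `SOURMASH-DATABASE-INFO.json` and the keys `database-version` and `default-taxonomy` are hypothetical; nothing like this exists in sourmash yet, and the helper `default_taxonomy` is made up for illustration.

```python
import io
import json
import zipfile

# Hypothetical metadata file name; sourmash does not define this.
METADATA_NAME = "SOURMASH-DATABASE-INFO.json"


def default_taxonomy(zip_path_or_file, fallback="gtdb"):
    """Return the default taxonomy declared in the zip's metadata, else `fallback`."""
    with zipfile.ZipFile(zip_path_or_file) as zf:
        if METADATA_NAME not in zf.namelist():
            return fallback
        info = json.loads(zf.read(METADATA_NAME))
    # The "default-taxonomy" key is an assumption, following the suggestion above.
    return info.get("default-taxonomy", fallback)


# Build a small in-memory zip so the sketch runs without any real database.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(METADATA_NAME, json.dumps({
        "database-version": "gtdb-rs207",
        "default-taxonomy": "ncbi",
    }))

print(default_taxonomy(buf))
```

With something like this, a command-line switch would only be needed to override whatever the database itself declares.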

From #2012 (comment), I wrote:

Trying to figure out how distributing multiple taxonomies in a zip file would work at the command line.

The most obvious idea is:

sourmash tax classify -g gather.csv -t gtdb-xyz.zip --gtdb

which would load GTDB-TAXONOMY.csv from gtdb-xyz.zip, vs

sourmash tax classify -g gather.csv -t gtdb-xyz.zip --ncbi

which would load NCBI-TAXONOMY.csv from gtdb-xyz.zip.
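A rough sketch of how the two switches could map onto files in the zip, using only `argparse`, `zipfile`, and `csv` from the standard library. The file names come from the proposal above; the parser name, the `tax_type` attribute, the `load_taxonomy` helper, and the CSV column names in the demo are assumptions, not sourmash's actual implementation.

```python
import argparse
import csv
import io
import zipfile

# File names as proposed in this issue; gtdb is the assumed default.
TAXONOMY_FILES = {"gtdb": "GTDB-TAXONOMY.csv", "ncbi": "NCBI-TAXONOMY.csv"}


def build_parser():
    p = argparse.ArgumentParser(prog="tax-classify-sketch")
    p.add_argument("-t", "--taxonomy", required=True)
    # --gtdb and --ncbi are mutually exclusive and write to the same attribute.
    group = p.add_mutually_exclusive_group()
    group.add_argument("--gtdb", dest="tax_type", action="store_const", const="gtdb")
    group.add_argument("--ncbi", dest="tax_type", action="store_const", const="ncbi")
    p.set_defaults(tax_type="gtdb")
    return p


def load_taxonomy(zip_file, tax_type):
    """Read the taxonomy CSV for `tax_type` out of the zip as a list of dict rows."""
    name = TAXONOMY_FILES[tax_type]
    with zipfile.ZipFile(zip_file) as zf:
        with zf.open(name) as fp:
            text = io.TextIOWrapper(fp, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Demo with an in-memory zip; the columns here are made up for illustration.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("GTDB-TAXONOMY.csv", "ident,superkingdom\nGCF_1,d__Bacteria\n")
    zf.writestr("NCBI-TAXONOMY.csv", "ident,superkingdom\nGCF_1,Bacteria\n")

args = build_parser().parse_args(["-t", "gtdb-xyz.zip", "--ncbi"])
rows = load_taxonomy(demo, args.tax_type)
print(args.tax_type, rows)
```

Putting both flags in one mutually exclusive group means `--gtdb --ncbi` together is rejected by the parser, and adding `--lins` later would be one more `add_argument` call plus one more entry in the mapping.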

Then we could potentially add --lins later on for #1813.

An alternative would be a single switch such as --tax-type ncbi, but I feel like --ncbi and --gtdb are probably simplest and easiest to remember.
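For comparison, the single-switch alternative might look like the following in `argparse`. This is only a sketch: the parser name and the `TAX_TYPES` list are hypothetical, and gtdb as the default is an assumption.

```python
import argparse

# Allowed values; "lins" could be appended here later without adding a new flag.
TAX_TYPES = ["gtdb", "ncbi"]

parser = argparse.ArgumentParser(prog="tax-type-sketch")
parser.add_argument("--tax-type", choices=TAX_TYPES, default="gtdb")

args = parser.parse_args(["--tax-type", "ncbi"])
print(args.tax_type)
```

The trade-off is mostly ergonomic: this form scales to more taxonomies with a one-line change, while per-taxonomy flags are shorter to type.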


which received @bluegenes's endorsement:

I like --gtdb and --ncbi, especially since I can't see us integrating so many taxonomies that having an argument per tax type would be unwieldy.

--lins definitely useful when we get there!

Also sorta connects with #2186, searching/selecting on taxonomic lineages?
