sourmash branchwater: descriptions, links, alternatives, and related projects #3248

Open
@ctb

(This is intended to be a long-running infodump. Please suggest additions or amendments in comments!!)

sourmash branchwater provides real-time search of the metagenomes in the Sequence Read Archive (SRA), along with a map of geographic coordinates for the samples it finds.

The underlying technology behind the live web site (below) uses a RocksDB-based inverted index that supports containment search. This index is implemented in the disk_revindex.rs code in sourmash-core, and is somewhat accessible at the command line via the branchwater plugin for sourmash.
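To make the idea of an inverted index with containment search concrete, here is a minimal in-memory sketch in Python. It is not the `disk_revindex.rs` implementation: a plain dict stands in for RocksDB, and the function names, sample names, and hash values are all hypothetical. The index maps each hash to the datasets that contain it, so a query only touches the entries for its own hashes.

```python
from collections import defaultdict


def build_inverted_index(sketches):
    """Map each hash value to the set of dataset names whose sketch contains it."""
    index = defaultdict(set)
    for name, hashes in sketches.items():
        for h in hashes:
            index[h].add(name)
    return index


def containment_search(index, query_hashes, threshold=0.5):
    """Return (name, containment) pairs, best match first.

    Containment is the fraction of query hashes found in a dataset:
    |query intersect dataset| / |query|.
    """
    query = set(query_hashes)
    if not query:
        return []
    counts = defaultdict(int)
    for h in query:
        # Only datasets sharing at least one hash with the query are visited.
        for name in index.get(h, ()):
            counts[name] += 1
    results = [
        (name, n / len(query))
        for name, n in counts.items()
        if n / len(query) >= threshold
    ]
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical toy data: three "metagenome" sketches as sets of integer hashes.
sketches = {
    "sample_A": {1, 2, 3, 4, 5, 6},
    "sample_B": {4, 5, 6, 7},
    "sample_C": {100, 200},
}
index = build_inverted_index(sketches)
hits = containment_search(index, {1, 2, 3, 4}, threshold=0.25)
print(hits)
```

The design point is that search cost scales with the number of hashes in the query rather than with the number of indexed datasets, which is what makes an on-disk key-value store a reasonable backend for this kind of lookup.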

Drawbacks/challenges & context:

  • sourmash uses FracMinHash sketching to compress sequences for search; with the current parameters, search is mainly limited to finding sequences > 5kb in size. This is described in detail in the above preprints.
  • because the sketches are small, branchwater is deployable on relatively lightweight hardware; it is also open source. We are currently working with several groups to help them stand up search of non-public databases.
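The FracMinHash idea mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not sourmash's code: `hashlib.blake2b` is swapped in for the hash function sourmash actually uses, reverse-complement handling is omitted, and `ksize=31` and `scaled=1000` are illustrative values rather than a statement of the parameters used on the live site. A k-mer is kept only if its hash falls in the lowest 1/`scaled` fraction of the hash space.

```python
import hashlib
import random


def hash_kmer(kmer):
    """64-bit hash of a k-mer (blake2b stands in for sourmash's real hash function)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(sequence, ksize=31, scaled=1000):
    """Keep only k-mer hashes below 2**64 / scaled, i.e. roughly 1 in `scaled` k-mers."""
    max_hash = 2**64 // scaled
    sketch = set()
    for i in range(len(sequence) - ksize + 1):
        h = hash_kmer(sequence[i:i + ksize])
        if h < max_hash:
            sketch.add(h)
    return sketch


# Hypothetical data: a random 100 kb "genome" and a 5 kb fragment taken from it.
random.seed(42)
genome = "".join(random.choice("ACGT") for _ in range(100_000))
fragment = genome[20_000:25_000]

genome_sketch = frac_minhash(genome)
fragment_sketch = frac_minhash(fragment)

print(len(genome_sketch), len(fragment_sketch))
# Every hash kept for the fragment is also kept for the genome, because the
# keep/discard decision depends only on the k-mer itself.
print(fragment_sketch <= genome_sketch)
```

With `scaled=1000`, a 5 kb sequence keeps only about five hashes on average, which illustrates why short queries leave little signal to match on.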

Related projects that do/did similar things

Metagraph https://metagraph.ethz.ch/ is a fantastic project that we've used elsewhere (see our SV paper).

Pebblescout - https://www.nature.com/articles/s41592-024-02280-z - uses an inverted index to do k-mer searching.

searchSRA - https://www.searchsra.org/ - uses bowtie mapping to find matches to queries of interest in the SRA.

Project Logan - https://github.com/IndexThePlanet/Logan - focused on building unitigs from SRA metagenomes, supporting search.

Related projects with a different focus/emphasis

Serratus (https://serratus.io/; https://www.nature.com/articles/s41586-021-04332-2) used a mapping-based approach + massive parallelism in the cloud to find many novel RNA-dependent RNA polymerase domains => new RNA viruses.

AllTheBacteria provides assemblies of all isolate sequences (NOT metagenomes) in the SRA.
