Track bacterial strains at sub-species resolution using genome distance metrics (MASH, fastANI) and strain-level metagenomics (inStrain, sourmash).
# MASH - fast distance estimation
conda install -c bioconda mash
# sourmash - metagenome comparisons
pip install sourmash
# fastANI - average nucleotide identity
conda install -c bioconda fastani
# inStrain - strain-level metagenomics
pip install instrain

Tell your AI agent what you want to do:
- "Calculate pairwise distances between my genome assemblies"
- "Track strains across timepoints in my longitudinal study"
- "Identify which reference genomes are in my metagenome"
"Calculate MASH distances between all genomes in this directory"
"Run fastANI to determine if these isolates are the same species"
"Cluster my outbreak isolates and identify closely related strains"
"Find genomes with less than 0.001 MASH distance (same strain)"
"Run sourmash gather to identify reference genomes in my metagenome"
"Use inStrain to profile strain variation in my sample"
"Track strain changes across my time-series samples using inStrain"
"Compare strain populations between treatment and control groups"
- Create genome sketches or signatures for efficient comparison
- Calculate pairwise distances or ANI values
- Cluster strains based on distance thresholds
- Profile within-sample variation for metagenomes
- Compare strain profiles across samples or timepoints
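
The clustering step above can be sketched in a few lines of Python. This is a minimal illustration, not part of any of the tools: it assumes the tab-separated `mash dist` output layout (reference, query, distance, p-value, shared hashes) and uses single-linkage clustering, which is one of several reasonable choices. The function names, file names, and example values are hypothetical.

```python
from collections import defaultdict


def parse_mash_dist(lines):
    """Yield (reference, query, distance) from tab-separated `mash dist` lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ref, query, dist = line.split("\t")[:3]
        yield ref, query, float(dist)


def cluster_by_threshold(pairs, threshold=0.001):
    """Single-linkage clustering: link genomes whose distance is below threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for ref, query, dist in pairs:
        root_ref, root_query = find(ref), find(query)  # registers both genomes
        if dist < threshold and root_ref != root_query:
            parent[root_query] = root_ref

    clusters = defaultdict(set)
    for genome in list(parent):
        clusters[find(genome)].add(genome)
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


# Hypothetical output lines; real ones come from `mash dist`.
example = [
    "a.fna\ta.fna\t0\t0\t1000/1000",
    "a.fna\tb.fna\t0.0005\t0\t979/1000",
    "a.fna\tc.fna\t0.02\t0\t489/1000",
    "b.fna\tc.fna\t0.021\t0\t474/1000",
]
clusters = cluster_by_threshold(parse_mash_dist(example))
print(clusters)
```

Single linkage can chain distant genomes together through intermediates, so for outbreak work it is worth checking the maximum within-cluster distance as well.
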
- MASH distance < 0.05 indicates same species (ANI > 95%)
- MASH distance < 0.001 suggests same strain
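- sourmash uses MinHash-based sketches, which makes it well suited to large-scale comparisons
- inStrain requires reads aligned to a reference (BAM); it provides SNV-level resolution
- fastANI is widely treated as the standard for species delineation, with ANI of about 95% as the conventional species boundary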
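
To make the link between MinHash sketches and the distance thresholds above concrete, here is a small sketch of the Mash distance formula, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index estimated from the sketches. The default `k=21` is assumed to match Mash's default k-mer size; the function name is hypothetical.

```python
import math


def mash_distance(jaccard, k=21):
    """Mash distance from a Jaccard estimate: D = -(1/k) * ln(2j / (1 + j))."""
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be between 0 and 1")
    if jaccard == 0.0:
        return 1.0  # no shared hashes: treated as the maximum distance
    return max(0.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)


# 979 of 1000 shared hashes lands well under the 0.001 strain threshold.
print(round(mash_distance(0.979), 5))
```

Because the estimate depends on the number of shared hashes, very small distances are resolved coarsely at small sketch sizes; a larger sketch gives finer steps near zero.
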
| MASH Distance | ANI | Interpretation |
|---|---|---|
| ≈ 0.00 (< 0.001) | ≈ 100% | Same strain |
| < 0.05 | > 95% | Same species |
| 0.05-0.10 | 90-95% | Related species |
| > 0.10 | < 90% | Different species |
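
The table can be applied programmatically. The sketch below assumes the common approximation ANI ≈ (1 - distance) × 100 and takes the 0.001 same-strain cutoff from the notes above; the function name and the `strain_cutoff` parameter are hypothetical, and the cutoffs are rules of thumb rather than hard boundaries.

```python
def interpret_mash_distance(distance, strain_cutoff=0.001):
    """Return (approximate ANI in percent, interpretation) for a Mash distance."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    approx_ani = (1.0 - distance) * 100.0  # approximation, not a fastANI result
    if distance < strain_cutoff:
        label = "Same strain"
    elif distance < 0.05:
        label = "Same species"
    elif distance <= 0.10:
        label = "Related species"
    else:
        label = "Different species"
    return approx_ani, label


for d in (0.0005, 0.02, 0.07, 0.2):
    ani, label = interpret_mash_distance(d)
    print(f"{d:<7} ~{ani:.2f}% ANI  {label}")
```
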
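- popANI: population-level ANI; a position counts as a substitution only when the two samples share no alleles there
- conANI: consensus ANI; a position counts as a substitution when the consensus bases of the two samples differ
- SNV density: number of SNVs relative to genome length, reflecting variation within a sample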
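
The difference between popANI and conANI is easiest to see on toy data. The following is a simplified illustration, not inStrain's implementation: it assumes per-position allele counts for two samples and a hypothetical `min_freq` cutoff for deciding which alleles count as present, whereas inStrain applies its own statistical filters.

```python
def present_alleles(counts, min_freq):
    """Bases whose frequency at this position reaches min_freq."""
    total = sum(counts.values())
    return {base for base, n in counts.items() if n / total >= min_freq}


def consensus(counts):
    """Most frequent base at this position."""
    return max(counts, key=counts.get)


def compare_profiles(sample_a, sample_b, min_freq=0.05):
    """Return (conANI, popANI) over positions covered in both samples."""
    shared_positions = sample_a.keys() & sample_b.keys()
    if not shared_positions:
        raise ValueError("no positions covered in both samples")
    con_subs = pop_subs = 0
    for pos in shared_positions:
        a, b = sample_a[pos], sample_b[pos]
        if consensus(a) != consensus(b):
            con_subs += 1  # consensus bases differ
        if not present_alleles(a, min_freq) & present_alleles(b, min_freq):
            pop_subs += 1  # no allele shared between the two populations
    n = len(shared_positions)
    return 1 - con_subs / n, 1 - pop_subs / n


# Hypothetical allele counts keyed by genome position.
sample_a = {1: {"A": 20}, 2: {"A": 12, "G": 8}, 3: {"C": 20}, 4: {"T": 19}}
sample_b = {1: {"A": 18}, 2: {"G": 15, "A": 5}, 3: {"T": 20}, 4: {"T": 21}}
con_ani, pop_ani = compare_profiles(sample_a, sample_b)
print(con_ani, pop_ani)
```

Position 2 lowers conANI but not popANI: the consensus base flips between samples, yet both alleles are present in both populations, which is the situation popANI is designed to ignore.
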