3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -3,10 +3,11 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## dev - [unreleased]
## v1.3.0dev [unreleased]

### `Added`

- [#682](https://github.com/nf-core/taxprofiler/pull/682) Added metacache classifier and improved nf-tests (added by @sofstam)
- [#559](https://github.com/nf-core/taxprofiler/pull/559) Profiling of long reads with motus (added by @LilyAnderssonLee and @sofstam)
- [#595](https://github.com/nf-core/taxprofiler/pull/595) **New classifier** [sylph](https://github.com/bluenote-1577/sylph) (added by @sofstam)
- [#608](https://github.com/nf-core/taxprofiler/pull/608) **New classifier** [melon](https://github.com/xinehc/melon) (added by @parisis and @sofstam)
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -133,6 +133,10 @@

> Shaw, J., & Yu, Y. W. (2024). Rapid species-level metagenome profiling and containment estimation with Sylph. Nature Biotechnology. https://doi.org/10.1038/s41587-024-02412-y

- [MetaCache](https://doi.org/10.1093/bioinformatics/btx520)

> Müller, A., Hundt, C., Hildebrandt, A., Hankeln, T., & Schmidt, B. (2017). MetaCache: context-aware classification of metagenomic reads using minhashing. Bioinformatics, 33(23), 3740–3748. https://doi.org/10.1093/bioinformatics/btx520

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
1 change: 1 addition & 0 deletions README.md
@@ -49,6 +49,7 @@
- [ganon](https://pirovc.github.io/ganon/)
- [sylph](https://github.com/bluenote-1577/sylph)
- [Melon](https://github.com/xinehc/melon)
- [MetaCache](https://github.com/muellan/metacache)
5. Perform optional post-processing with:
- [bracken](https://ccb.jhu.edu/software/bracken/)
6. Standardises output tables ([`Taxpasta`](https://taxpasta.readthedocs.io))
13 changes: 13 additions & 0 deletions conf/modules.config
@@ -920,6 +920,19 @@ process {
]
}


withName: METACACHE_QUERY {
tag = { "${meta.db_name}|${meta.id}" }
ext.args = { "${meta.db_params}" }
ext.prefix = { params.perform_runmerging ? "${meta.id}_${meta.db_name}" : "${meta.id}_${meta.run_accession}_${meta.db_name}" }
publishDir = [
path: { "${params.outdir}/metacache/${meta.db_name}/" },
mode: params.publish_dir_mode,
pattern: '*.txt'
]
}


withName: TAXPASTA_MERGE {
tag = { "${meta.tool}|${meta.id}" }
ext.prefix = { "${meta.tool}_${meta.id}" }
1 change: 1 addition & 0 deletions conf/test.config
@@ -48,6 +48,7 @@ params {
run_kmcp = true
run_sylph = true
run_melon = true
run_metacache = true
sylph_taxonomy = params.pipelines_testdata_base_path + 'taxprofiler/data/database/sylph/test_taxonomy.tsv.gz'
krona_taxonomy_directory = params.pipelines_testdata_base_path + 'modules/data/genomics/sarscov2/metagenome/krona_taxonomy.tab'
malt_save_reads = true
1 change: 1 addition & 0 deletions conf/test_alternativepreprocessing.config
@@ -49,6 +49,7 @@ params {
run_kmcp = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
1 change: 1 addition & 0 deletions conf/test_bbduk.config
@@ -47,6 +47,7 @@ params {
run_kmcp = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
1 change: 1 addition & 0 deletions conf/test_falco.config
@@ -47,6 +47,7 @@ params {
run_kmcp = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
1 change: 1 addition & 0 deletions conf/test_fastp.config
@@ -49,6 +49,7 @@ params {
run_kmcp = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
3 changes: 3 additions & 0 deletions conf/test_full.config
@@ -76,6 +76,9 @@ params {

run_sylph = true
sylph_taxonomy = params.pipelines_testdata_base_path + 'taxprofiler/data/database/sylph/sylph_taxonomy.tsv.gz'
run_metacache = true
metacache_abundances = true

run_profile_standardisation = true
run_krona = true
run_melon = true
1 change: 1 addition & 0 deletions conf/test_krakenuniq.config
@@ -52,6 +52,7 @@ params {
run_ganon = false
run_sylph = false
run_melon = false
run_metacache = false
run_krona = true
krona_taxonomy_directory = params.pipelines_testdata_base_path + 'modules/data/genomics/sarscov2/metagenome/krona_taxonomy.tab'
malt_save_reads = false
1 change: 1 addition & 0 deletions conf/test_malt.config
@@ -51,6 +51,7 @@ params {
run_kmcp = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
1 change: 1 addition & 0 deletions conf/test_minimal.config
@@ -47,6 +47,7 @@ params {
run_ganon = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
55 changes: 29 additions & 26 deletions conf/test_motus.config
@@ -28,32 +28,35 @@ params {
config_profile_description = 'Minimal test to check mOTUs function'

// Input data
input = params.pipelines_testdata_base_path + 'taxprofiler/samplesheet.csv'
databases = 'database_motus.csv'
perform_shortread_qc = false
perform_longread_qc = false
perform_shortread_redundancyestimation = false
perform_shortread_complexityfilter = false
perform_shortread_hostremoval = false
perform_longread_hostremoval = false
perform_runmerging = false
hostremoval_reference = params.pipelines_testdata_base_path + 'modules/data/genomics/homo_sapiens/genome/genome.fasta'
run_kaiju = false
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
run_motus = true
run_kmcp = false
run_ganon = false
motus_save_mgc_read_counts = false
motus_remove_ncbi_ids = false
motus_use_relative_abundance = false
motus_save_splitlong_reads = false
run_profile_standardisation = true
input = params.pipelines_testdata_base_path + 'taxprofiler/samplesheet.csv'
databases = 'database_motus.csv'
perform_shortread_qc = false
perform_longread_qc = false
perform_shortread_redundancyestimation = false
perform_shortread_complexityfilter = false
perform_shortread_hostremoval = false
perform_longread_hostremoval = false
perform_runmerging = false
hostremoval_reference = params.pipelines_testdata_base_path + 'modules/data/genomics/homo_sapiens/genome/genome.fasta'
run_kaiju = false
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
run_motus = true
run_kmcp = false
run_ganon = false
run_sylph = false
run_melon = false
run_metacache = false
motus_save_mgc_read_counts = false
motus_remove_ncbi_ids = false
motus_use_relative_abundance = false
motus_save_splitlong_reads = false
run_profile_standardisation = true
}

process {
1 change: 1 addition & 0 deletions conf/test_nopreprocessing.config
@@ -49,6 +49,7 @@ params {
run_krona = true
run_sylph = true
run_melon = true
run_metacache = true
sylph_taxonomy = params.pipelines_testdata_base_path + 'taxprofiler/data/database/sylph/test_taxonomy.tsv.gz'
}

1 change: 1 addition & 0 deletions conf/test_noprofiling.config
@@ -48,6 +48,7 @@ params {
run_ganon = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
1 change: 1 addition & 0 deletions conf/test_prinseqplusplus.config
@@ -48,6 +48,7 @@ params {
run_kmcp = false
run_sylph = false
run_melon = false
run_metacache = false
}

process {
20 changes: 20 additions & 0 deletions docs/output.md
@@ -40,6 +40,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [ganon](#ganon) - Taxonomic classifier and profile that uses Interleaved Bloom Filters as indices based on k-mers/minimizers.
- [sylph](#sylph) - Taxonomic classifier that performs ultrafast average nucleotide identity (ANI) querying or metagenomic profiling for metagenomic shotgun samples.
- [Melon](#Melon) - Taxonomic classifier that uses ribosomal marker genes to classify and quantify long-read metagenomic data.
- [MetaCache](#metacache) - Taxonomic classifier using minhashing
- [TAXPASTA](#taxpasta) - Tool to standardise taxonomic profiles as well as merge profiles across samples from the same database and classifier/profiler.
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
@@ -655,6 +656,25 @@ For further descriptions of the contents of each file, see the [sylph documentat
The main taxonomic classification files from melon are the `*.tsv` and `*.json` files.
For further descriptions of the contents of each file, see the [melon documentation](https://github.com/xinehc/melon#run-melon).

### MetaCache

[MetaCache](https://github.com/muellan/metacache) is a taxonomic classifier that uses minhashing for the classification of reads.

<details markdown="1">
<summary>Output files</summary>

- `metacache/`
- `<db_name>/`
- `<sample_id>.abundances.txt`: Absolute and relative abundance of each taxon at each rank in .txt format
- `<sample_id>.mapping.txt`: Mapping output in .txt format

</details>

MetaCache's default read mapping output format is:
`read_header | rank:taxon_name`

For a further description, see the [MetaCache documentation](https://github.com/muellan/metacache/blob/d7646eca4c4dc131262b16d2910923fce3f5d4fc/docs/output.md#classification-output).
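
The default mapping format described above can be summarised with standard shell tools. The following is a minimal sketch, not part of the pipeline: it counts reads per assigned `rank:taxon_name`, assuming the `read_header | rank:taxon_name` layout shown above and assuming that comment lines start with `#`. The function name `count_taxa` is hypothetical.

```bash
# Count reads per assigned taxon in a MetaCache mapping file.
# Assumptions: "read_header | rank:taxon_name" layout, comment lines start with "#".
count_taxa() {
    awk -F'|' '
        /^#/ || NF < 2 { next }              # skip comments and malformed lines
        {
            taxon = $NF                      # last field holds rank:taxon_name
            gsub(/^[ \t]+|[ \t]+$/, "", taxon)  # trim surrounding whitespace
            counts[taxon]++
        }
        END { for (t in counts) printf "%d\t%s\n", counts[t], t }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Calling `count_taxa path/to/mapping.txt` prints one tab-separated `count` and `rank:taxon_name` pair per line, most frequent first. Taking the last `|`-separated field keeps the sketch working when read headers themselves contain `|`.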

### Krona

[Krona](https://github.com/marbl/Krona) allows the exploration of (metagenomic) hierarchical data with interactive zooming, multi-layered pie charts.
7 changes: 7 additions & 0 deletions docs/usage.md
@@ -147,6 +147,7 @@ ganon,db1,,/<path>/<to>/ganon/test-db-ganon.tar.gz
kmcp,db1,;-I 20,/<path>/<to>/kmcp/test-db-kmcp.tar.gz
sylph,db1,-m 80,/<path>/<to>/sylph/test-db-sylph.tar.gz
melon,db1,,/<path>/<to>/melon/test-db-melon.tar.gz
metacache,db1,,/<path>/<to>/metacache/test-db-metacache.tar.gz
```

```csv
@@ -163,6 +164,7 @@ ganon,db1,,short,/<path>/<to>/ganon/test-db-ganon.tar.gz
kmcp,db1,;-I 20,short,/<path>/<to>/kmcp/test-db-kmcp.tar.gz
sylph,db1,-m 80,long,/<path>/<to>/sylph/test-db-sylph.tar.gz
melon,db1,,long,/<path>/<to>/melon/test-db-melon.tar.gz
metacache,db1,,long,/<path>/<to>/metacache/test-db-metacache.tar.gz
```

:::warning
@@ -215,6 +217,7 @@ The (uncompressed) database paths (`db_path`) for each tool are expected to cont
- [**KMCP**:](usage/tutorials.md#kmcp-custom-database) output of `kmcp index`. Note: `kmcp index` uses the output of an upstream `kmcp compute` step.
- [**sylph**:](usage/tutorials.md#sylph-custom-database) output of `sylph sketch` command.
- [**Melon**:](usage/tutorials.md#melon-custom-database) output of `diamond makedb` and `minimap2`.
- [**MetaCache**:](usage/tutorials.md#metacache-custom-database) output of `metacache build` command.

## Running the pipeline

@@ -506,6 +509,10 @@ Currently, no specific tips or suggestions.
Melon is only suitable for long-read metagenomic profiling.
Therefore, nf-core/taxprofiler does not currently run Melon on data specified as being sequenced with `Illumina` or any other short-read platform in the input samplesheet.

##### MetaCache

Currently, no specific tips or suggestions.

#### Post Processing

##### Visualisation
19 changes: 19 additions & 0 deletions docs/usage/tutorials.md
@@ -740,3 +740,22 @@ You can then add the path to `<YOUR_DB_NAME>/` to your nf-core/taxprofiler datab
</details>

More information on the Melon database can be found [here](https://github.com/xinehc/melon#database-setup).

#### MetaCache custom database

To build a custom MetaCache database, you need to download the NCBI taxonomy. The FASTA files can either be combined into a single file or placed together in one directory.

```bash
download-ncbi-taxonomy ncbi_taxonomy
metacache build metacache all_genomes.fasta -taxonomy ncbi_taxonomy
```

<details markdown="1">
<summary>Expected files in database directory</summary>

- `metacache`
- `database/<custom_name>.meta`
- `database/<custom_name>.cache0`
</details>

More information on custom MetaCache database construction can be found [here](https://github.com/muellan/metacache/blob/d7646eca4c4dc131262b16d2910923fce3f5d4fc/docs/building.md).
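
Before adding the database path to the database sheet, it can help to confirm that the build produced the files listed above. The following is a minimal sketch under that assumption: it only checks that at least one `*.meta` and one `*.cache0` file exist somewhere below the given directory. The function name `check_metacache_db` is hypothetical and not part of MetaCache or nf-core/taxprofiler.

```bash
# Check that a directory contains the files expected from `metacache build`.
# Assumption: a usable database has at least one *.meta and one *.cache0 file.
check_metacache_db() {
    db_dir=$1
    for ext in meta cache0; do
        # find exits 0 even without matches, so test whether it printed anything
        if [ -z "$(find "$db_dir" -type f -name "*.${ext}" | head -n 1)" ]; then
            echo "missing *.${ext} file in ${db_dir}" >&2
            return 1
        fi
    done
    echo "ok: ${db_dir}"
}
```

Running `check_metacache_db <YOUR_DB_DIR>` prints `ok: <YOUR_DB_DIR>` when both file types are found and returns a non-zero status otherwise. It does not verify that the files are valid or belong to the same database.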
5 changes: 5 additions & 0 deletions modules.json
@@ -165,6 +165,11 @@
"git_sha": "83fe5c85a83aae68a6eb0561e04cbba1e153ac5a",
"installed_by": ["modules"]
},
"metacache/query": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
"installed_by": ["modules"]
},
"metaphlan/mergemetaphlantables": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
7 changes: 7 additions & 0 deletions modules/nf-core/metacache/query/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

60 changes: 60 additions & 0 deletions modules/nf-core/metacache/query/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
