Merged
Changes from 78 commits
Commits
81 commits
6353679
Add run_dbcan screening
HaidYi Jul 1, 2025
15f2ef5
fix missing gffs
HaidYi Jul 1, 2025
d5df4a1
split dbcan results by meta.id
HaidYi Jul 1, 2025
f049e2f
rm constraints of annotation tool
HaidYi Jul 1, 2025
8289bdb
add test config for rundbcan
HaidYi Jul 1, 2025
d8af5e9
add test profile for rundbcan in ci
HaidYi Jul 1, 2025
0a5e505
add dbcan in the refs
HaidYi Jul 2, 2025
01a573a
Suggestions from code review
Jul 10, 2025
5c5ec66
rm duplicate outputs
HaidYi Jul 15, 2025
9fd005c
add manual dbCAN database download
HaidYi Jul 15, 2025
ea4b852
rename DBCAN to CAZYME
HaidYi Jul 15, 2025
62623a5
add gff column in samplesheet
HaidYi Jul 15, 2025
0cad8f9
change run_dbcan_screening to run_cazyme_screening
HaidYi Jul 15, 2025
b76e3a2
add missing identifier
HaidYi Jul 17, 2025
0f5863a
add missing identifier
HaidYi Jul 17, 2025
f2d79d5
add missing conda
HaidYi Jul 17, 2025
625ced4
fix typo
HaidYi Jul 17, 2025
58273f1
re-organize the outdir structure of cazyme screening
HaidYi Jul 17, 2025
a638f32
add citation
HaidYi Jul 26, 2025
a5d692b
add cazyme_skip_dbcan param
HaidYi Jul 26, 2025
da9d4a4
fix missing ','
HaidYi Jul 26, 2025
9f3af6c
add gff type parameter for dbcan
HaidYi Aug 25, 2025
15645fb
mv hard-coded gff type to params
HaidYi Aug 25, 2025
448145b
Merge remote-tracking branch 'origin/dev' into rundbcan
jfy133 Aug 27, 2025
8e48936
fix typo
HaidYi Aug 27, 2025
101a159
fix format
HaidYi Aug 27, 2025
9c14e24
fix lint issue
HaidYi Aug 27, 2025
0899e10
Merge branch 'dev' into rundbcan
HaidYi Aug 27, 2025
63c8b04
Fix snapshot
jfy133 Aug 28, 2025
03b1030
Fix RO crate
jfy133 Aug 28, 2025
cce04b2
only list top view
HaidYi Sep 18, 2025
ddd51c1
Update docs/output.md
HaidYi Sep 18, 2025
b31feb6
Update docs/output.md
HaidYi Sep 18, 2025
36c22d3
Update docs/output.md
HaidYi Sep 18, 2025
59385f9
Update docs/output.md
HaidYi Sep 18, 2025
2a6544e
Update docs/output.md
HaidYi Sep 18, 2025
2dbe952
Update docs/output.md
HaidYi Sep 18, 2025
01fb374
add a column: gff_type in samplesheet
HaidYi Sep 23, 2025
13b82ab
rm dbcan_gff_type parameter
HaidYi Sep 23, 2025
3af937f
add option for using local dbcan db
HaidYi Sep 23, 2025
c28f049
filter samples for dbcan cgc/substrate if no gff_type provided in sam…
HaidYi Sep 23, 2025
796b96d
add cazyme to toolCitationText
HaidYi Sep 25, 2025
6de2005
Update docs/output.md
HaidYi Sep 25, 2025
5f9b432
update the profile name
HaidYi Sep 27, 2025
e114734
add cazyme_screening to default test
HaidYi Sep 28, 2025
6ee4dd7
add test_cazyme_pyrodigal test
HaidYi Sep 28, 2025
18ba885
add cazyme_dbcan_db to params
HaidYi Sep 28, 2025
161d37d
fix bug
HaidYi Sep 28, 2025
d505ea6
add gff_type in meta for cazyme screening
HaidYi Sep 28, 2025
69d5133
Merge branch 'dev' into rundbcan
jfy133 Oct 8, 2025
f5ed73e
[automated] Fix code linting
nf-core-bot Oct 8, 2025
c3c09c9
Merge branch 'dev' into rundbcan
jfy133 Nov 5, 2025
0ef0ade
rm `set` and use =
HaidYi Nov 24, 2025
fdb238a
Update subworkflows/local/cazyme.nf
HaidYi Nov 24, 2025
6af2309
Update workflows/funcscan.nf
HaidYi Nov 24, 2025
e4a4784
Update nextflow.config
HaidYi Nov 24, 2025
41e9466
Update nextflow.config
HaidYi Nov 24, 2025
6cb3a41
update the name
HaidYi Nov 24, 2025
99c0c19
add the icon
HaidYi Nov 24, 2025
1132b7e
Merge branch 'dev' into rundbcan
jfy133 Jan 7, 2026
07a7d71
update the rundbcan/database
HaidYi Feb 3, 2026
497d6d1
recompute hash
HaidYi Feb 3, 2026
bab0e18
update rundbcan modules
HaidYi Feb 3, 2026
c0c76dc
update the default test snap
HaidYi Feb 3, 2026
7a921b4
remove one assertion for the upgraded package
HaidYi Feb 3, 2026
d2e8e94
update the test cazyme pyrodigal snap file
HaidYi Feb 3, 2026
4b5a342
Merge branch 'dev' into rundbcan
jfy133 Feb 4, 2026
737db6c
Apply suggestions from code review
jfy133 Feb 20, 2026
3b9bbab
Apply suggestions from code review
jfy133 Feb 20, 2026
b6c0aab
fix comments
HaidYi Mar 8, 2026
3c9c118
update the doc/usage
HaidYi Mar 8, 2026
49ef6d9
update contributor
HaidYi Mar 8, 2026
0618e35
add help_texts
HaidYi Mar 8, 2026
8be0c33
add to contributor
HaidYi Mar 8, 2026
ebe8bc4
update changelog
HaidYi Mar 8, 2026
80f7f1e
add more cazyme tests
HaidYi Mar 8, 2026
b54a6e7
[automated] Fix code linting
nf-core-bot Mar 14, 2026
97c9bb9
Apply suggestion from @jfy133
jfy133 Mar 20, 2026
8103a8a
Apply suggestions from code review
jfy133 Mar 20, 2026
419e389
Restructure tests
jfy133 Mar 20, 2026
35f9c4b
Merge branch 'dev' into rundbcan
jfy133 Mar 20, 2026
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#483](https://github.com/nf-core/funcscan/pull/483) Added support for preannotated input with optional GFF column in samplesheet for dbCAN CAZyme Gene Cluster (CGC) and substrate prediction, with new `--dbcan_skip_cgc` and `--dbcan_skip_substrate` parameters (by @HaidYi)
- [#500](https://github.com/nf-core/funcscan/pull/500) Updated pipeline template to nf-core/tools version 3.4.1 (by @jfy133)
- [#508](https://github.com/nf-core/funcscan/pull/508) Added support for antiSMASH's --clusterhmmer, --fullhmmer, and --tigrfam options (❤️ to @yusukepockyby for requesting, @jfy133)
- [#506](https://github.com/nf-core/funcscan/pull/506) Added support for GECCO convert to generate additional files useful for downstream analysis (by @SkyLexS)
@@ -19,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

| Tool | Previous Version | New Version |
| ------- | ---------------- | ----------- |
| dbCAN | | 5.2.6 |
| MultiQC | 1.27 | 1.32 |
| Bakta | 1.10.4 | 1.11.4 |

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -102,6 +102,10 @@

> Alcock, B. P., Huynh, W., Chalil, R., Smith, K. W., Raphenya, A. R., Wlodarski, M. A., Edalatmand, A., Petkau, A., Syed, S. A., Tsang, K. K., Baker, S. J. C., Dave, M., McCarthy, M. C., Mukiri, K. M., Nasir, J. A., Golbon, B., Imtiaz, H., Jiang, X., Kaur, K., Kwong, M., Liang, Z. C., Niu, K. C., Shan, P., Yang, J. Y. J., Gray, K. L., Hoad, G. R., Jia, B., Bhando, T., Carfrae, L. A., Farha, M. A., French, S., Gordzevich, R., Rachwalski, K., Tu, M. M., Bordeleau, E., Dooley, D., Griffiths, E., Zubyk, H. L., Brown, E. D., Maguire, F., Beiko, R. G., Hsiao, W. W. L., Brinkman F. S. L., Van Domselaar, G., McArthur, A. G. (2023). CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic acids research, 51(D1):D690-D699. [DOI: 10.1093/nar/gkac920](https://doi.org/10.1093/nar/gkac920)

- [dbCAN](https://doi.org/10.1093/nar/gkad328)

> Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin, dbCAN3: automated carbohydrate-active enzyme and substrate annotation, Nucleic Acids Research, Volume 51, Issue W1, 5 July 2023, Pages W115–W121. [DOI:10.1093/nar/gkad328](https://doi.org/10.1093/nar/gkad328)

- [SeqKit](https://bioinf.shenwei.me/seqkit/)

> Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta, e191. [https://doi.org/10.1002/imt2.191](https://doi.org/10.1002/imt2.191)
7 changes: 4 additions & 3 deletions README.md
@@ -40,8 +40,9 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s
5. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
6. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg). [`argNorm`](https://github.com/BigDataBiology/argNorm) is used to map the outputs of `DeepARG`, `AMRFinderPlus`, and `ABRicate` to the [`Antibiotic Resistance Ontology`](https://www.ebi.ac.uk/ols4/ontologies/aro) for consistent ARG classification terms.
7. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
8. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
9. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
8. Screening contigs for carbohydrate-active enzymes (CAZymes), CAZyme gene clusters, and substrates with [`run_dbcan`](https://github.com/bcb-unl/run_dbcan)
9. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
10. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)

![funcscan metro workflow](docs/images/funcscan_metro_workflow.png)

@@ -92,7 +93,7 @@ nf-core/funcscan was originally written by Jasmin Frangenberg, Anan Ibrahim, Lou

We thank the following people for their extensive assistance in the development of this pipeline:

Adam Talbot, Alexandru Mizeranschi, Hugo Tavares, Júlia Mir Pedrol, Martin Klapper, Mehrdad Jaberi, Robert Syme, Rosa Herbst, Vedanth Ramji, @Microbion, Dediu Octavian-Codrin.
Adam Talbot, Alexandru Mizeranschi, Haidong Yi, Hugo Tavares, Júlia Mir Pedrol, Martin Klapper, Mehrdad Jaberi, Robert Syme, Rosa Herbst, Vedanth Ramji, @Microbion, Dediu Octavian-Codrin.

## Contributions and Support

2 changes: 1 addition & 1 deletion assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,fasta,protein,gbk
sample,fasta,protein,gbk,gff
sample_1,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_1.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.faa,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.gbk
sample_2,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_2.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.faa.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.gbk.gz
sample_3,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs.fasta
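As a rough illustration of the new five-column header, the sketch below reads a samplesheet with Python's `csv.DictReader`. The `sample_4` row and all file paths are hypothetical placeholders, not files from the test dataset. How short rows are handled here is Python's behaviour and may differ from the pipeline's own samplesheet validation.

```python
import csv
import io

# Hypothetical samplesheet using the new five-column header.
# The paths below are placeholders, not files from nf-core/test-datasets.
SAMPLESHEET = """\
sample,fasta,protein,gbk,gff
sample_3,contigs_3.fasta
sample_4,contigs_4.fasta,contigs_4.faa,contigs_4.gbk,contigs_4.gff
"""


def read_samplesheet(text):
    """Return one dict per row; columns missing from a short row become None."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_samplesheet(SAMPLESHEET)
for row in rows:
    has_gff = bool(row["gff"])
    print(f"{row['sample']}: gff provided = {has_gff}")
```

Rows that stop after `fasta` still parse, so the optional `gff` column can be left out per sample.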
16 changes: 15 additions & 1 deletion assets/schema_input.json
@@ -33,12 +33,26 @@
"exists": true,
"pattern": "^\\S+\\.(gbk|gbff)(\\.gz)?$",
"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk`, `.gbk.gz` or `.gbff`, or `.gbff.gz`"
},
"gff": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(gff|gff3)(\\.gz)?$",
"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gff`, `.gff.gz`, `.gff3`, or `.gff3.gz`"
},
"gff_type": {
"type": "string",
"enum": ["NCBI_prok", "prodigal", "NCBI_euk", "JGI"],
"errorMessage": "GFF type must be one of: NCBI_prok, prodigal, NCBI_euk, or JGI",
"meta": ["gff_type"]
}
},
"required": ["sample", "fasta"],
"dependentRequired": {
"protein": ["gbk"],
"gbk": ["protein"]
"gbk": ["protein"],
"gff": ["protein"]
}
},
"uniqueItems": true
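To make the new schema rules concrete, here is a minimal Python sketch of the three checks this hunk adds or extends: the `gff` filename pattern, the `gff_type` enum, and the `dependentRequired` links. The pattern, enum values, and dependency map are copied from the hunk; the function name `check_row` and the example rows are hypothetical, and the pipeline itself relies on its schema validator rather than code like this.

```python
import re

# Values copied from the schema hunk above.
GFF_PATTERN = re.compile(r"^\S+\.(gff|gff3)(\.gz)?$")
GFF_TYPES = {"NCBI_prok", "prodigal", "NCBI_euk", "JGI"}
DEPENDENT_REQUIRED = {
    "protein": ["gbk"],
    "gbk": ["protein"],
    "gff": ["protein"],
}


def check_row(row):
    """Return a list of problems for one samplesheet row (dict of column -> value)."""
    problems = []
    gff = row.get("gff")
    if gff and not GFF_PATTERN.match(gff):
        problems.append("gff: file must end in .gff, .gff.gz, .gff3 or .gff3.gz")
    gff_type = row.get("gff_type")
    if gff_type and gff_type not in GFF_TYPES:
        problems.append("gff_type: must be one of " + ", ".join(sorted(GFF_TYPES)))
    # dependentRequired: if the key column is set, every listed column must be set too.
    for column, needed in DEPENDENT_REQUIRED.items():
        if row.get(column):
            for other in needed:
                if not row.get(other):
                    problems.append(f"{column} requires {other}")
    return problems


# Hypothetical rows for illustration only.
good = {"sample": "s1", "fasta": "a.fasta", "protein": "a.faa",
        "gbk": "a.gbk", "gff": "a.gff3.gz", "gff_type": "prodigal"}
bad = {"sample": "s2", "fasta": "b.fasta", "gff": "b.gtf"}
print(check_row(good))
print(check_row(bad))
```

Because `gff` requires `protein` and `protein` in turn requires `gbk`, a row that supplies a GFF file effectively needs all three annotation files.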
35 changes: 35 additions & 0 deletions conf/modules.config
@@ -751,4 +751,39 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_DATABASE {
publishDir = [
path: { "${params.outdir}/databases/dbcan/" },
mode: params.publish_dir_mode,
enabled: params.save_db,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_CAZYMEANNOTATION {
publishDir = [
path: { "${params.outdir}/cazyme/dbcan/cazyme_annotation/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_EASYCGC {
publishDir = [
path: { "${params.outdir}/cazyme/dbcan/cgc/${meta.id}" },
mode: params.publish_dir_mode,
pattern: "*_{cgc.gff,cgc_standard_out.tsv,diamond.out.tc,TF_hmm_results.tsv,STP_hmm_results.tsv}",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_EASYSUBSTRATE {
publishDir = [
path: { "${params.outdir}/cazyme/dbcan/substrate/${meta.id}" },
mode: params.publish_dir_mode,
pattern: "*_{total_cgc_info.tsv,substrate_prediction.tsv,synteny_pdf}",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}
}
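The following Python sketch approximates how the `publishDir` settings above decide where a file ends up. It models the `saveAs` closure (which drops `versions.yml`) and the optional `pattern` entry. The helper names and example filenames are hypothetical, and Python's `fnmatch` only approximates Nextflow's glob matching, so treat this as an illustration rather than the actual behaviour.

```python
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand a single {a,b,c} group into a list of plain glob patterns."""
    start, end = pattern.find("{"), pattern.find("}")
    if start == -1 or end == -1:
        return [pattern]
    prefix, suffix = pattern[:start], pattern[end + 1:]
    return [prefix + alt + suffix for alt in pattern[start + 1:end].split(",")]


def publish_path(outdir, subdir, sample_id, filename, pattern=None):
    """Return the destination path, or None when the file would not be published."""
    # Mirrors the saveAs closure: versions.yml is never published.
    if filename == "versions.yml":
        return None
    # Mirrors the optional `pattern` entry of publishDir.
    if pattern and not any(fnmatchcase(filename, p) for p in expand_braces(pattern)):
        return None
    return f"{outdir}/cazyme/dbcan/{subdir}/{sample_id}/{filename}"


# Pattern copied from the RUNDBCAN_EASYCGC block above.
CGC_PATTERN = "*_{cgc.gff,cgc_standard_out.tsv,diamond.out.tc,TF_hmm_results.tsv,STP_hmm_results.tsv}"

# Hypothetical filenames for illustration only.
print(publish_path("results", "cgc", "sample_1", "sample_1_cgc.gff", CGC_PATTERN))
print(publish_path("results", "cgc", "sample_1", "versions.yml", CGC_PATTERN))
```

Keying each path on `meta.id` gives every sample its own subdirectory under `cazyme/dbcan/`.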
2 changes: 2 additions & 0 deletions conf/test.config
@@ -33,4 +33,6 @@ params {
run_amp_screening = true
amp_run_hmmsearch = true
amp_hmmsearch_models = params.pipelines_testdata_base_path + 'funcscan/hmms/mybacteriocin.hmm'

run_cazyme_screening = true
}
34 changes: 34 additions & 0 deletions conf/test_cazyme_pyrodigal.config
@@ -0,0 +1,34 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/funcscan -profile test_cazyme_pyrodigal,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]
}

params {
config_profile_name = 'CAZyme Pyrodigal test profile'
config_profile_description = 'Minimal test dataset to check CAZyme workflow function'

// Input data
input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_reduced.csv'

annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
run_bgc_screening = false
run_cazyme_screening = true
}
37 changes: 37 additions & 0 deletions conf/test_preannotated_cazyme.config
@@ -0,0 +1,37 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/funcscan -profile test_preannotated_cazyme,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]
}

params {
config_profile_name = 'CAZyme test profile - preannotated input'
config_profile_description = 'Minimal test dataset to check CAZyme workflow function'

// Input data
input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_preannotated.csv'

annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
run_bgc_screening = false
run_cazyme_screening = true

dbcan_skip_cgc = false // CGC annotation enabled as .gff is provided in samplesheet
dbcan_skip_substrate = false // Substrate annotation enabled as .gff is provided in samplesheet
}
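The comments on `dbcan_skip_cgc` and `dbcan_skip_substrate` suggest that CGC and substrate prediction depend on both the skip flags and GFF information being available for a sample. The Python sketch below expresses one plausible reading of that gating. It is an assumption, not the subworkflow's actual logic (which is not shown in this diff): the function name `dbcan_steps`, the step labels, and the rule that a sample needs a `gff_type` in its meta map are all hypothetical.

```python
def dbcan_steps(params, meta):
    """Return the dbCAN steps assumed to run for one sample.

    Assumption: CGC and substrate prediction need a gff_type in the sample's
    meta map and are switched off by their respective skip parameters.
    """
    if not params["run_cazyme_screening"]:
        return []
    steps = ["cazyme_annotation"]
    has_gff_type = bool(meta.get("gff_type"))
    if has_gff_type and not params["dbcan_skip_cgc"]:
        steps.append("cgc")
    if has_gff_type and not params["dbcan_skip_substrate"]:
        steps.append("substrate")
    return steps


# Parameter values taken from the test_preannotated_cazyme profile above.
params = {
    "run_cazyme_screening": True,
    "dbcan_skip_cgc": False,
    "dbcan_skip_substrate": False,
}

# Hypothetical samples: one with GFF information, one without.
print(dbcan_steps(params, {"id": "sample_1", "gff_type": "prodigal"}))
print(dbcan_steps(params, {"id": "sample_3"}))
```

Under these assumptions, a sample without GFF information still gets CAZyme annotation but is left out of the two GFF-dependent steps.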