nf-core · jfy133 · Mar 23, 2026 · Jul 1, 2025
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Added`
 
+- [#483](https://github.com/nf-core/funcscan/pull/483) Added support for preannotated input with optional GFF column in samplesheet for dbCAN CAZyme Gene Cluster (CGC) and substrate prediction, with new `--dbcan_skip_cgc` and `--dbcan_skip_substrate` parameters (by @HaidYi)
 - [#500](https://github.com/nf-core/funcscan/pull/500) Updated pipeline template to nf-core/tools version 3.4.1 (by @jfy133)
 - [#508](https://github.com/nf-core/funcscan/pull/508) Added support for antiSMASH's --clusterhmmer, --fullhmmer, and --tigrfam options (❤️ to @yusukepockyby for requesting, @jfy133)
 - [#506](https://github.com/nf-core/funcscan/pull/506) Added support GECCO convert for generation of additional files useful for downstream analysis (by @SkyLexS)
@@ -19,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 | Tool    | Previous Version | New Version |
 | ------- | ---------------- | ----------- |
+| dbCAN   |                  | 5.2.6       |
 | MultiQC | 1.27             | 1.32        |
 | Bakta   | 1.10.4           | 1.11.4      |
 

@@ -102,6 +102,10 @@
 
   > Alcock, B. P., Huynh, W., Chalil, R., Smith, K. W., Raphenya, A. R., Wlodarski, M. A., Edalatmand, A., Petkau, A., Syed, S. A., Tsang, K. K., Baker, S. J. C., Dave, M., McCarthy, M. C., Mukiri, K. M., Nasir, J. A., Golbon, B., Imtiaz, H., Jiang, X., Kaur, K., Kwong, M., Liang, Z. C., Niu, K. C., Shan, P., Yang, J. Y. J., Gray, K. L., Hoad, G. R., Jia, B., Bhando, T., Carfrae, L. A., Farha, M. A., French, S., Gordzevich, R., Rachwalski, K., Tu, M. M., Bordeleau, E., Dooley, D., Griffiths, E., Zubyk, H. L., Brown, E. D., Maguire, F., Beiko, R. G., Hsiao, W. W. L., Brinkman F. S. L., Van Domselaar, G., McArthur, A. G. (2023). CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic acids research, 51(D1):D690-D699. [DOI: 10.1093/nar/gkac920](https://doi.org/10.1093/nar/gkac920)
 
+- [dbCAN](https://doi.org/10.1093/nar/gkad328)
+
+  > Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin, dbCAN3: automated carbohydrate-active enzyme and substrate annotation, Nucleic Acids Research, Volume 51, Issue W1, 5 July 2023, Pages W115–W121. [DOI:10.1093/nar/gkad328](https://doi.org/10.1093/nar/gkad328)
+
 - [SeqKit](https://bioinf.shenwei.me/seqkit/)
 
   > Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta, e191. [https://doi.org/10.1002/imt2.191](https://doi.org/10.1002/imt2.191)

@@ -40,8 +40,9 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s
 5. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
 6. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg). [`argNorm`](https://github.com/BigDataBiology/argNorm) is used to map the outputs of `DeepARG`, `AMRFinderPlus`, and `ABRicate` to the [`Antibiotic Resistance Ontology`](https://www.ebi.ac.uk/ols4/ontologies/aro) for consistent ARG classification terms.
 7. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
-8. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
-9. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
+8. Screening contigs for carbohydrate-active enzymes (CAZymes), CAZyme gene clusters and substrates with [`run_dbcan`](https://github.com/bcb-unl/run_dbcan)
+9. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
+10. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
 
 ![funcscan metro workflow](docs/images/funcscan_metro_workflow.png)
 
@@ -92,7 +93,7 @@ nf-core/funcscan was originally written by Jasmin Frangenberg, Anan Ibrahim, Lou
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-Adam Talbot, Alexandru Mizeranschi, Hugo Tavares, Júlia Mir Pedrol, Martin Klapper, Mehrdad Jaberi, Robert Syme, Rosa Herbst, Vedanth Ramji, @Microbion, Dediu Octavian-Codrin.
+Adam Talbot, Alexandru Mizeranschi, Haidong Yi, Hugo Tavares, Júlia Mir Pedrol, Martin Klapper, Mehrdad Jaberi, Robert Syme, Rosa Herbst, Vedanth Ramji, @Microbion, Dediu Octavian-Codrin.
 
 ## Contributions and Support
 

@@ -1,4 +1,4 @@
-sample,fasta,protein,gbk
+sample,fasta,protein,gbk,gff
 sample_1,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_1.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.faa,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.gbk
 sample_2,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_2.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.faa.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.gbk.gz
 sample_3,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs.fasta
@@ -33,12 +33,26 @@
                 "exists": true,
                 "pattern": "^\\S+\\.(gbk|gbff)(\\.gz)?$",
                 "errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk`, `.gbk.gz` or `.gbff`, or `.gbff.gz`"
+            },
+            "gff": {
+                "type": "string",
+                "format": "file-path",
+                "exists": true,
+                "pattern": "^\\S+\\.(gff|gff3)(\\.gz)?$",
+                "errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gff`, `.gff.gz`, `.gff3`, or `.gff3.gz`"
+            },
+            "gff_type": {
+                "type": "string",
+                "enum": ["NCBI_prok", "prodigal", "NCBI_euk", "JGI"],
+                "errorMessage": "GFF type must be one of: NCBI_prok, prodigal, NCBI_euk, or JGI",
+                "meta": ["gff_type"]
             }
         },
         "required": ["sample", "fasta"],
         "dependentRequired": {
             "protein": ["gbk"],
-            "gbk": ["protein"]
+            "gbk": ["protein"],
+            "gff": ["protein"]
         }
     },
     "uniqueItems": true
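
The schema rules in this hunk can be checked by hand with a small script. The following is a minimal sketch, not the pipeline's own validation (which is driven by the JSON schema itself): the patterns, enum values, and `dependentRequired` pairs are transcribed from the diff above, while the `validate_row` helper, the sample file names, and the choice to treat empty cells as absent are assumptions made for illustration.

```python
import re

# Rules transcribed from the schema diff; everything else here is hypothetical.
GBK_PATTERN = re.compile(r"^\S+\.(gbk|gbff)(\.gz)?$")
GFF_PATTERN = re.compile(r"^\S+\.(gff|gff3)(\.gz)?$")
GFF_TYPES = {"NCBI_prok", "prodigal", "NCBI_euk", "JGI"}
REQUIRED = ("sample", "fasta")
DEPENDENT_REQUIRED = {"protein": ["gbk"], "gbk": ["protein"], "gff": ["protein"]}


def validate_row(row):
    """Return a list of problems found in one samplesheet row (a dict)."""
    # Assumption: an empty cell counts as an absent field.
    present = {key for key, value in row.items() if value}
    errors = []

    for field in REQUIRED:
        if field not in present:
            errors.append(f"missing required field: {field}")

    # dependentRequired: if the key is present, every listed field must be too.
    for field, needs in DEPENDENT_REQUIRED.items():
        if field in present:
            for needed in needs:
                if needed not in present:
                    errors.append(f"{field} requires {needed}")

    if "gbk" in present and not GBK_PATTERN.fullmatch(row["gbk"]):
        errors.append("gbk has an unexpected file extension")
    if "gff" in present and not GFF_PATTERN.fullmatch(row["gff"]):
        errors.append("gff has an unexpected file extension")
    if "gff_type" in present and row["gff_type"] not in GFF_TYPES:
        errors.append("gff_type is not one of the allowed values")

    return errors


if __name__ == "__main__":
    # Hypothetical rows: the second one supplies a GFF without a protein file.
    rows = [
        {"sample": "s1", "fasta": "s1.fasta", "protein": "s1.faa",
         "gbk": "s1.gbk", "gff": "s1.gff3.gz", "gff_type": "prodigal"},
        {"sample": "s2", "fasta": "s2.fasta", "gff": "s2.gff"},
    ]
    for row in rows:
        print(row["sample"], validate_row(row))
```

Note that, as written in the schema, `protein` still requires `gbk`, so a row that adds only `protein` and `gff` would fail the existing `protein` rule rather than the new `gff` rule.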

@@ -751,4 +751,39 @@ process {
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
         ]
     }
+
+    withName: RUNDBCAN_DATABASE {
+        publishDir = [
+            path: { "${params.outdir}/databases/dbcan/" },
+            mode: params.publish_dir_mode,
+            enabled: params.save_db,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
+    withName: RUNDBCAN_CAZYMEANNOTATION {
+        publishDir = [
+            path: { "${params.outdir}/cazyme/dbcan/cazyme_annotation/${meta.id}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
+    withName: RUNDBCAN_EASYCGC {
+        publishDir = [
+            path: { "${params.outdir}/cazyme/dbcan/cgc/${meta.id}" },
+            mode: params.publish_dir_mode,
+            pattern: "*_{cgc.gff,cgc_standard_out.tsv,diamond.out.tc,TF_hmm_results.tsv,STP_hmm_results.tsv}",
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
+    withName: RUNDBCAN_EASYSUBSTRATE {
+        publishDir = [
+            path: { "${params.outdir}/cazyme/dbcan/substrate/${meta.id}" },
+            mode: params.publish_dir_mode,
+            pattern: "*_{total_cgc_info.tsv,substrate_prediction.tsv,synteny_pdf}",
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
 }
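
To see which files a brace-style `pattern` such as the one in `RUNDBCAN_EASYCGC` would select, the glob can be approximated in Python. This is a rough sketch under stated assumptions: Nextflow performs its own glob matching, so `expand_braces` and `is_published` are hypothetical helpers that only imitate the `{a,b}` alternatives, and the file names are invented examples rather than confirmed run_dbcan outputs.

```python
import fnmatch
import re

# Pattern copied from the RUNDBCAN_EASYCGC block in the diff above.
CGC_PATTERN = (
    "*_{cgc.gff,cgc_standard_out.tsv,diamond.out.tc,"
    "TF_hmm_results.tsv,STP_hmm_results.tsv}"
)


def expand_braces(pattern):
    """Expand each {a,b,c} group into separate plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse in case the tail contains further brace groups.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def is_published(filename, pattern):
    """True if the file name matches any alternative of the brace glob."""
    return any(
        fnmatch.fnmatchcase(filename, plain) for plain in expand_braces(pattern)
    )


if __name__ == "__main__":
    # Hypothetical file names for a sample called "sample_1".
    for name in ["sample_1_cgc.gff", "sample_1_overview.tsv", "versions.yml"]:
        print(name, is_published(name, CGC_PATTERN))
```

Under this approximation `versions.yml` never matches the pattern, so in the blocks that set `pattern` the `saveAs` filter would act only as an extra safeguard.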
@@ -33,4 +33,6 @@ params {
     run_amp_screening          = true
     amp_run_hmmsearch          = true
     amp_hmmsearch_models       = params.pipelines_testdata_base_path + 'funcscan/hmms/mybacteriocin.hmm'
+
+    run_cazyme_screening       = true
 }
@@ -0,0 +1,34 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/funcscan -profile test_dbcan_pyrodigal,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+process {
+    resourceLimits = [
+        cpus: 4,
+        memory: '15.GB',
+        time: '1.h'
+    ]
+}
+
+params {
+    config_profile_name        = 'CAZyme Pyrodigal test profile'
+    config_profile_description = 'Minimal test dataset to check CAZyme workflow function'
+
+    // Input data
+    input                      = params.pipelines_testdata_base_path + 'funcscan/samplesheet_reduced.csv'
+
+    annotation_tool            = 'pyrodigal'
+
+    run_arg_screening          = false
+    run_amp_screening          = false
+    run_bgc_screening          = false
+    run_cazyme_screening       = true
+}
@@ -0,0 +1,37 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/funcscan -profile test_preannotated_cazyme,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+process {
+    resourceLimits = [
+        cpus: 4,
+        memory: '15.GB',
+        time: '1.h'
+    ]
+}
+
+params {
+    config_profile_name        = 'CAZyme test profile - preannotated input'
+    config_profile_description = 'Minimal test dataset to check CAZyme workflow function'
+
+    // Input data
+    input                      = params.pipelines_testdata_base_path + 'funcscan/samplesheet_preannotated.csv'
+
+    annotation_tool            = 'pyrodigal'
+
+    run_arg_screening          = false
+    run_amp_screening          = false
+    run_bgc_screening          = false
+    run_cazyme_screening       = true
+
+    dbcan_skip_cgc             = false   // CGC annotation enabled as .gff is provided in samplesheet
+    dbcan_skip_substrate       = false   // Substrate annotation enabled as .gff is provided in samplesheet
+}