nf-core · dialvarezs · Jun 25, 2026
@@ -16,13 +16,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#1044](https://github.com/nf-core/mag/pull/1044) - Add new `--gtdbtk_place_species` parameter (by @dialvarezs)
 - [#1047](https://github.com/nf-core/mag/issues/1007) - Add `--gtdbtk_single_job` to run GTDB-Tk classification for all bins in a single job (requested by @sarah-shah-bioinf, by @dialvarezs)
 - [#1048](https://github.com/nf-core/mag/pull/1048) - Add optional PyPOLCA polishing for long-read assemblies via `--run_pypolca` (by @Harshita-sriv)
+- [#1059](https://github.com/nf-core/mag/pull/1059) - Add `--longread_filtering_by_shortreads` parameter to enable filtlong's short-read-based long read filtering (by @dialvarezs)
 
 ### `Changed`
 
 - [#1011](https://github.com/nf-core/mag/pull/1011) - Reverted CheckM2 database download workaround from #966 (by @dialvarezs)
 - [#1020](https://github.com/nf-core/mag/pull/1020) - Update CONCOCT subworkflow and modules (by @dialvarezs)
 - [#1030](https://github.com/nf-core/mag/pull/1030) - Updated to nf-core 4.0.2 template (by @dialvarezs)
 - [#1044](https://github.com/nf-core/mag/pull/1044) - Updated GTDB-Tk to v2.7.2 / GTDB r232 (by @dialvarezs)
+- [#1059](https://github.com/nf-core/mag/pull/1059) - Changed the default long read filtering tool from `filtlong` to `chopper` (by @dialvarezs)
 - [#1060](https://github.com/nf-core/mag/pull/1060) - Updated module tags to make them more specific (by @dialvarezs)
 
 ### `Fixed`

@@ -29,8 +29,7 @@ params {
     busco_db                      = params.pipelines_testdata_base_path + 'mag/databases/busco/bacteria_odb10.2024-01-08.tar.gz'
     busco_db_lineage              = 'bacteria_odb10'
 
-    longread_adaptertrimming_tool = 'porechop'
-    longread_filtering_tool       = 'chopper'
+    longread_filtering_tool       = 'filtlong'
 
     skip_spades                   = true
     skip_megahit                  = true

@@ -21,6 +21,9 @@ process {
 
 // needed for result stability
 process {
+    withName: CHOPPER {
+        cpus = 1
+    }
     withName: METAMDBG_ASM {
         cpus = 1
     }

@@ -138,7 +138,7 @@ The pipeline uses porechop_abi or porechop to perform adapter trimming of the lo
 
 ### Long read filtering
 
-The pipeline uses filtlong, chopper, or nanoq for quality filtering of long reads, specified with `--longread_filtering_tool <filtlong|chopper|nanoq>`. Only filtlong is capable of filtering long reads against short reads, and is therefore currently recommended in the hybrid mode. If chopper is selected as long read filtering tool, Lambda Phage removal will be performed with chopper as well, instead of nanolyse.
+The pipeline uses chopper, filtlong, or nanoq for quality filtering of long reads, specified with `--longread_filtering_tool <chopper|filtlong|nanoq>`. The default is chopper, which performs length/quality filtering and (unless `--keep_lambda` is set) Lambda Phage removal in a single step. Filtlong can filter long reads against short reads (opt-in via `--longread_filtering_by_shortreads`, see the note below). The `--longreads_keep_percent` and `--longreads_length_weight` parameters only apply to filtlong.
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -148,14 +148,14 @@ The pipeline uses filtlong, chopper, or nanoq for quality filtering of long read
 - `QC_longreads/Nanoq/`
   - `[sample]_[run]_nanoq_filtered.fastq.gz`: The length and quality filtered reads in FASTQ from Nanoq
 - `QC_longreads/Chopper/`
-  - `[sample]_[run]_nanoq_chopper.fastq.gz`: The length and quality filtered, optionally phage lambda removed reads in FASTQ from Chopper
+  - `[sample]_[run]_chopper.fastq.gz`: The length and quality filtered, optionally phage lambda removed reads in FASTQ from Chopper
 
 </details>
 
 Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtered_longreads` (respectively) are provided to the run command.
 
 No direct host read removal is performed for long reads.
-However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded. Note that this only applies when using filtlong as long read filtering tool.
+When filtlong is run with `--longread_filtering_by_shortreads` and short reads are supplied, it derives read quality from k-mer matches to the already filtered short reads instead of the Phred scores. Reads not overlapping those short reads might then be discarded, which can cause data loss if short read coverage is poor or the long- and short-read metagenomes differ.
 The lower the parameter `--longreads_length_weight`, the higher the impact of the read qualities for filtering.
 For further documentation see the [filtlong online documentation](https://github.com/rrwick/Filtlong).
 

@@ -442,6 +442,12 @@ When using mapping modes of `group` or `all`, different BAM files may be used fo
 This may result in a different set or none of contigs being evaluated in pyDamage compared to the final bin.
 :::
 
+## A note on long read filtering
+
+Long reads are quality/length filtered with the tool set by `--longread_filtering_tool`. The default is `chopper`, which also removes Lambda Phage reads (unless `--keep_lambda` is set). Alternatives are `filtlong` and `nanoq`, which run in combination with NanoLyse to remove Lambda Phage reads.
+
+`filtlong` can filter long reads against the short reads of the same sample. This is disabled by default and enabled with `--longread_filtering_by_shortreads`. When enabled, filtlong derives long read quality from k-mer matches to the (already filtered) short reads instead of the Phred scores. This can improve filtering when short read coverage is good and both libraries come from the same metagenome, but it can also discard long reads not overlapping the short reads, causing data loss when short read coverage is poor or the long- and short-read metagenomes differ. The `--longreads_keep_percent` and `--longreads_length_weight` parameters only apply to filtlong.
+
 ## A note on coverage estimation
 
 In order to run the binning tools included in the pipeline, MAG must first align reads back to the assemblies, and estimate the coverage of each contig.

@@ -33,7 +33,8 @@ params {
 
     // long read preprocessing options
     longread_adaptertrimming_tool        = "porechop_abi"
-    longread_filtering_tool              = "filtlong"
+    longread_filtering_tool              = "chopper"
+    longread_filtering_by_shortreads     = false
     // phix_reference                    = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Enterobacteria_phage_phiX174_sensu_lato/all_assembly_versions/GCA_002596845.1_ASM259684v1/GCA_002596845.1_ASM259684v1_genomic.fna.gz"
     phix_reference                       = null
     save_phixremoved_reads               = false
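With `chopper` now the default and short-read-based filtering off, a run that relied on the previous default (filtlong scoring long reads against the same sample's short reads) has to opt back in explicitly. A minimal sketch, assuming the parameters are supplied through Nextflow's `-params-file` option (the file name `params.json` is hypothetical), that sets only the two parameters introduced or changed in this PR:

```json
{
    "longread_filtering_tool": "filtlong",
    "longread_filtering_by_shortreads": true
}
```

The equivalent command-line flags are `--longread_filtering_tool filtlong --longread_filtering_by_shortreads`. Short-read-based filtering only takes effect for samples that also have short reads; long reads without matching short reads are filtered on their own Phred scores. With filtlong selected, Lambda Phage removal goes through NanoLyse rather than chopper.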

@@ -445,12 +445,12 @@
                 "longreads_keep_percent": {
                     "type": "integer",
                     "default": 90,
-                    "description": "Keep this percent of bases."
+                    "description": "Keep this percent of bases. Only used by filtlong."
                 },
                 "longreads_length_weight": {
                     "type": "integer",
                     "default": 10,
-                    "description": "The higher the more important is read length when choosing the best reads.",
+                    "description": "The higher the value, the more weight read length gets when choosing the best reads. Only used by filtlong.",
                     "help_text": "The default value focuses on length instead of quality to improve assembly size.\nIn order to assign equal weights to read lengths and read qualities set this parameter to 1.\nThis might be useful, for example, to benefit indirectly from the removal of short host reads (causing lower qualities for reads not overlapping filtered short reads)."
                 },
                 "keep_lambda": {
@@ -485,8 +485,13 @@
                 "longread_filtering_tool": {
                     "type": "string",
                     "description": "Specify which long read filtering tool to use.",
-                    "enum": ["filtlong", "nanoq", "chopper"],
-                    "default": "filtlong"
+                    "enum": ["chopper", "filtlong", "nanoq"],
+                    "default": "chopper"
+                },
+                "longread_filtering_by_shortreads": {
+                    "type": "boolean",
+                    "description": "Filter long reads against short reads when using filtlong.",
+                    "help_text": "Only applies when `--longread_filtering_tool filtlong` and short reads are supplied. When enabled, filtlong derives long read quality from k-mer matches to the short reads instead of the Phred scores. This can discard long reads not overlapping the short reads, leading to data loss if short read coverage is poor or the metagenomes differ. Disabled by default."
                 }
             }
         },

@@ -75,15 +75,21 @@ workflow LONGREAD_PREPROCESSING {
         }
         if (!params.skip_longread_filtering && !val_skip_qc) {
             if (params.longread_filtering_tool == 'filtlong') {
-                // join long and short reads by sample name
-                ch_short_reads_tmp = ch_short_reads.map { meta, sr -> [meta.id, sr] }
-
-                ch_short_and_long_reads = ch_long_reads
-                    .map { meta, lr -> [meta.id, meta, lr] }
-                    .join(ch_short_reads_tmp, by: 0, remainder: true)
-                    .filter { row -> row[1] != null }  // filter out samples with no long reads
-                    .map { _id, meta_lr, lr, sr -> [meta_lr, sr ? sr : [], lr] }
-                // should not occur for single-end, since SPAdes (hybrid) does not support single-end
+                if (params.longread_filtering_by_shortreads) {
+                    // join long and short reads by sample name
+                    ch_short_reads_tmp = ch_short_reads.map { meta, sr -> [meta.id, sr] }
+
+                    ch_short_and_long_reads = ch_long_reads
+                        .map { meta, lr -> [meta.id, meta, lr] }
+                        .join(ch_short_reads_tmp, by: 0, remainder: true)
+                        .filter { row -> row[1] != null }  // filter out samples with no long reads
+                        .map { _id, meta_lr, lr, sr -> [meta_lr, sr ? sr : [], lr] }
+                    // should not occur for single-end, since SPAdes (hybrid) does not support single-end
+                }
+                else {
+                    // filter long reads on their own Phred scores, ignoring short reads
+                    ch_short_and_long_reads = ch_long_reads.map { meta, lr -> [meta, [], lr] }
+                }
 
                 FILTLONG(
                     ch_short_and_long_reads

@@ -1,7 +1,7 @@
 {
     "-profile hybrid": {
         "content": [
-            59,
+            60,
             {
                 "ADJUST_MAXBIN2_EXT": {
                     "coreutils": 9.5
@@ -36,9 +36,6 @@
                 "BUSCO_UNTAR": {
                     "untar": 1.34
                 },
-                "CHOPPER": {
-                    "chopper": "0.8.0"
-                },
                 "CONCAT_BUSCO_TSV": {
                     "qsv": "5.1.0"
                 },
@@ -57,6 +54,9 @@
                 "FASTQC_TRIMMED": {
                     "fastqc": "0.12.1"
                 },
+                "FILTLONG": {
+                    "filtlong": "0.2.1"
+                },
                 "GUNZIP_BINS": {
                     "gunzip": 1.13
                 },
@@ -89,6 +89,9 @@
                 "MINIMAP2_HOST_INDEX": {
                     "minimap2": "2.29-r1283"
                 },
+                "NANOLYSE": {
+                    "nanolyse": "1.2.0"
+                },
                 "NANOPLOT_FILTERED": {
                     "nanoplot": "1.46.1"
                 },
@@ -98,8 +101,8 @@
                 "POOL_LONG_READS": {
                     "cat": 9.5
                 },
-                "PORECHOP_PORECHOP": {
-                    "porechop": "0.2.4"
+                "PORECHOP_ABI": {
+                    "porechop_abi": "0.5.0"
                 },
                 "PRODIGAL": {
                     "pigz": 2.6,
@@ -141,10 +144,10 @@
                 }
             }
         ],
-        "timestamp": "2026-04-01T14:55:50.11826155",
+        "timestamp": "2026-06-25T13:22:30.581877242",
         "meta": {
             "nf-test": "0.9.5",
-            "nextflow": "25.10.4"
+            "nextflow": "26.04.4"
         }
     },
     "multiqc": {
@@ -337,7 +340,7 @@
                 "multiqc/multiqc_report.html"
             ],
             [
-                "bowtie2_pe_plot.yaml:md5,331d49d9aa0a0ce1a3a8b407b8a49801",
+                "bowtie2_pe_plot.yaml:md5,522bada95de93b46cbc9a03177bd12d4",
                 "busco_plot_bacteria_odb10.yaml:md5,da10d6a882b6f7a5e6eca94f415aabff",
                 "fastp-insert-size-plot.yaml:md5,256fda12474da96da3bae1662c3dda5b",
                 "fastp-seq-content-gc-plot_Read_1_After_filtering.yaml:md5,e98b74aa3104dba1a964f042b20f88b2",
@@ -372,26 +375,26 @@
                 "fastqc_sequence_length_distribution_plot.yaml:md5,054f8574b204516f6e6739691a117612",
                 "multiqc_bowtie2.yaml:md5,d6a4f1f2e6952801eda62c414c1685cb",
                 "multiqc_bowtie2_bowtie2-1.yaml:md5,d6a4f1f2e6952801eda62c414c1685cb",
-                "multiqc_bowtie2_bowtie2-2.yaml:md5,b5a185109453a80131d7da2cf4d3f2ec",
+                "multiqc_bowtie2_bowtie2-2.yaml:md5,c4ba5e1efc5c4a535abf8a07ac74cfe0",
                 "multiqc_busco.yaml:md5,df428a34eda0c1a05aac5cd84a040e08",
                 "multiqc_citations.yaml:md5,49f79471c1620b06da08b1acb97a1e56",
                 "multiqc_fastp.yaml:md5,29b2500f793b0c444ee7c6535d406b09",
                 "multiqc_fastqc.yaml:md5,0d11a6d30d0d1a430e58d44b57ac6e87",
                 "multiqc_fastqc_fastqc-1.yaml:md5,0b8c8b4edb4f3e70409f30bcdda54397",
-                "multiqc_general_stats.yaml:md5,edb4f404e32d372bd387a90b72575087",
-                "multiqc_prokka.yaml:md5,b994dcc420bd2f85d4e0f91549e08b54",
-                "multiqc_quast.yaml:md5,e5940a0ce6af05da51fe94bc6d7d3901",
-                "multiqc_quast_quast-1.yaml:md5,b20f16ccd1d9395ecac0ceee2cbe78de",
+                "multiqc_general_stats.yaml:md5,55fb08efe41b9034631876de1a0a1408",
+                "multiqc_prokka.yaml:md5,acf8d5a16ca002cb4947275156d4d664",
+                "multiqc_quast.yaml:md5,38504dc5c2b2a4dca642eaebd509c3c3",
+                "multiqc_quast_quast-1.yaml:md5,ca453bb9cde3ad3d7055ab4682eb5f05",
                 "porechop.yaml:md5,828d96e2822bb4b4108ab706c00c4450",
-                "prokka_plot.yaml:md5,c095a60a9936134ec73e6667a5635bbe",
-                "quast_num_contigs.yaml:md5,ddc54b8fb069f419547237c52e3058f2",
-                "quast_table.yaml:md5,e5720e81c420f07480481ac3c75bb822"
+                "prokka_plot.yaml:md5,00e1273aa90bce2c443da13fed6fb954",
+                "quast_num_contigs.yaml:md5,c91a25b6638a2805f1fb36bbf008e6df",
+                "quast_table.yaml:md5,5893c2a49a9093b3d7555a56560a0842"
             ]
         ],
-        "timestamp": "2026-05-04T10:28:55.649537639",
+        "timestamp": "2026-06-25T13:22:30.618963624",
         "meta": {
             "nf-test": "0.9.5",
-            "nextflow": "26.04.0"
+            "nextflow": "26.04.4"
         }
     }
 }