2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -16,13 +16,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#1044](https://github.com/nf-core/mag/pull/1044) - Add new `--gtdbtk_place_species` parameter (by @dialvarezs)
- [#1047](https://github.com/nf-core/mag/issues/1007) - Add `--gtdbtk_single_job` to run GTDB-Tk classification for all bins in a single job (requested by @sarah-shah-bioinf, by @dialvarezs)
- [#1048](https://github.com/nf-core/mag/pull/1048) - Add optional PyPOLCA polishing for long-read assemblies via `--run_pypolca` (by @Harshita-sriv)
- [#1059](https://github.com/nf-core/mag/pull/1059) - Add `--longread_filtering_by_shortreads` parameter to enable filtlong's short-read-based long read filtering (by @dialvarezs)

### `Changed`

- [#1011](https://github.com/nf-core/mag/pull/1011) - Reverted CheckM2 database download workaround from #966 (by @dialvarezs)
- [#1020](https://github.com/nf-core/mag/pull/1020) - Update CONCOCT subworkflow and modules (by @dialvarezs)
- [#1030](https://github.com/nf-core/mag/pull/1030) - Updated to nf-core 4.0.2 template (by @dialvarezs)
- [#1044](https://github.com/nf-core/mag/pull/1044) - Updated GTDB-Tk to v2.7.2 / GTDB r232 (by @dialvarezs)
- [#1059](https://github.com/nf-core/mag/pull/1059) - Changed the default long read filtering tool from `filtlong` to `chopper` (by @dialvarezs)
- [#1060](https://github.com/nf-core/mag/pull/1060) - Updated module tags to make them more specific (by @dialvarezs)

### `Fixed`
3 changes: 1 addition & 2 deletions conf/test_hybrid.config
@@ -29,8 +29,7 @@ params {
busco_db = params.pipelines_testdata_base_path + 'mag/databases/busco/bacteria_odb10.2024-01-08.tar.gz'
busco_db_lineage = 'bacteria_odb10'

longread_adaptertrimming_tool = 'porechop'
longread_filtering_tool = 'chopper'
longread_filtering_tool = 'filtlong'

skip_spades = true
skip_megahit = true
3 changes: 3 additions & 0 deletions conf/test_longreadonly.config
@@ -21,6 +21,9 @@ process {

// needed for result stability
process {
withName: CHOPPER {
cpus = 1
}
withName: METAMDBG_ASM {
cpus = 1
}
6 changes: 3 additions & 3 deletions docs/output.md
@@ -138,7 +138,7 @@ The pipeline uses porechop_abi or porechop to perform adapter trimming of the lo

### Long read filtering

The pipeline uses filtlong, chopper, or nanoq for quality filtering of long reads, specified with `--longread_filtering_tool <filtlong|chopper|nanoq>`. Only filtlong is capable of filtering long reads against short reads, and is therefore currently recommended in the hybrid mode. If chopper is selected as long read filtering tool, Lambda Phage removal will be performed with chopper as well, instead of nanolyse.
The pipeline uses chopper, filtlong, or nanoq for quality filtering of long reads, specified with `--longread_filtering_tool <chopper|filtlong|nanoq>`. The default is chopper, which performs length/quality filtering and (unless `--keep_lambda` is set) Lambda Phage removal in a single step. Filtlong can filter long reads against short reads (opt-in via `--longread_filtering_by_shortreads`, see the note below). The `--longreads_keep_percent` and `--longreads_length_weight` parameters only apply to filtlong.

<details markdown="1">
<summary>Output files</summary>
@@ -148,14 +148,14 @@ The pipeline uses filtlong, chopper, or nanoq for quality filtering of long read
- `QC_longreads/Nanoq/`
- `[sample]_[run]_nanoq_filtered.fastq.gz`: The length and quality filtered reads in FASTQ from Nanoq
- `QC_longreads/Chopper/`
- `[sample]_[run]_nanoq_chopper.fastq.gz`: The length and quality filtered, optionally phage lambda removed reads in FASTQ from Chopper
- `[sample]_[run]_chopper.fastq.gz`: The length and quality filtered, optionally phage lambda removed reads in FASTQ from Chopper

</details>

Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtered_longreads` (respectively) are provided to the run command.

No direct host read removal is performed for long reads.
However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded. Note that this only applies when using filtlong as long read filtering tool.
When filtlong is run with `--longread_filtering_by_shortreads` and short reads are supplied, it derives read quality from k-mer matches to the already filtered short reads instead of the Phred scores. Reads not overlapping those short reads might then be discarded, which can cause data loss if short read coverage is poor or the long- and short-read metagenomes differ.
The lower the parameter `--longreads_length_weight`, the higher the impact of the read qualities for filtering.
For further documentation see the [filtlong online documentation](https://github.com/rrwick/Filtlong).

6 changes: 6 additions & 0 deletions docs/usage.md
@@ -442,6 +442,12 @@ When using mapping modes of `group` or `all`, different BAM files may be used fo
This may result in a different set or none of contigs being evaluated in pyDamage compared to the final bin.
:::

## A note on long read filtering

Long reads are quality/length filtered with the tool set by `--longread_filtering_tool`. The default is `chopper`, which also removes Lambda Phage reads (unless `--keep_lambda` is set). Alternatives are `filtlong` and `nanoq`, which run in combination with NanoLyse to remove Lambda Phage reads.

`filtlong` can filter long reads against the short reads of the same sample. This is disabled by default and enabled with `--longread_filtering_by_shortreads`. When enabled, filtlong derives long read quality from k-mer matches to the (already filtered) short reads instead of the Phred scores. This can improve filtering when short read coverage is good and both libraries come from the same metagenome, but it can also discard long reads not overlapping the short reads, causing data loss when short read coverage is poor or the long- and short-read metagenomes differ. The `--longreads_keep_percent` and `--longreads_length_weight` parameters only apply to filtlong.
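
As a minimal sketch of the opt-in behaviour described above, the following custom config restores short-read-based filtering with filtlong. The file name `custom.config` is hypothetical; the parameter names come from this change, and the values shown for `longreads_keep_percent` and `longreads_length_weight` are the defaults listed in `nextflow_schema.json`.

```groovy
// custom.config (hypothetical file name)
params {
    // switch from the new default (chopper) back to filtlong
    longread_filtering_tool          = 'filtlong'
    // opt in to k-mer based quality scoring against the filtered short reads
    longread_filtering_by_shortreads = true
    // filtlong-only parameters, shown here with their schema defaults
    longreads_keep_percent           = 90
    longreads_length_weight          = 10
}
```

Such a file could be passed with Nextflow's `-c` option alongside the usual run arguments. Short reads must also be supplied for the short-read-based filtering to take effect.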

## A note on coverage estimation

In order to run the binning tools included in the pipeline, MAG must first align reads back to the assemblies, and estimate the coverage of each contig.
3 changes: 2 additions & 1 deletion nextflow.config
@@ -33,7 +33,8 @@ params {

// long read preprocessing options
longread_adaptertrimming_tool = "porechop_abi"
longread_filtering_tool = "filtlong"
longread_filtering_tool = "chopper"
longread_filtering_by_shortreads = false
// phix_reference = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Enterobacteria_phage_phiX174_sensu_lato/all_assembly_versions/GCA_002596845.1_ASM259684v1/GCA_002596845.1_ASM259684v1_genomic.fna.gz"
phix_reference = null
save_phixremoved_reads = false
13 changes: 9 additions & 4 deletions nextflow_schema.json
@@ -445,12 +445,12 @@
"longreads_keep_percent": {
"type": "integer",
"default": 90,
"description": "Keep this percent of bases."
"description": "Keep this percent of bases. Only used by filtlong."
},
"longreads_length_weight": {
"type": "integer",
"default": 10,
"description": "The higher the more important is read length when choosing the best reads.",
"description": "The higher the more important is read length when choosing the best reads. Only used by filtlong.",
"help_text": "The default value focuses on length instead of quality to improve assembly size.\nIn order to assign equal weights to read lengths and read qualities set this parameter to 1.\nThis might be useful, for example, to benefit indirectly from the removal of short host reads (causing lower qualities for reads not overlapping filtered short reads)."
},
"keep_lambda": {
@@ -485,8 +485,13 @@
"longread_filtering_tool": {
"type": "string",
"description": "Specify which long read filtering tool to use.",
"enum": ["filtlong", "nanoq", "chopper"],
"default": "filtlong"
"enum": ["chopper", "filtlong", "nanoq"],
"default": "chopper"
},
"longread_filtering_by_shortreads": {
"type": "boolean",
"description": "Filter long reads against short reads when using filtlong.",
"help_text": "Only applies when `--longread_filtering_tool filtlong` and short reads are supplied. When enabled, filtlong derives long read quality from k-mer matches to the short reads instead of the Phred scores. This can discard long reads not overlapping the short reads, leading to data loss if short read coverage is poor or the metagenomes differ. Disabled by default."
}
}
},
24 changes: 15 additions & 9 deletions subworkflows/local/preprocessing_longread/main.nf
@@ -75,15 +75,21 @@ workflow LONGREAD_PREPROCESSING {
}
if (!params.skip_longread_filtering && !val_skip_qc) {
if (params.longread_filtering_tool == 'filtlong') {
// join long and short reads by sample name
ch_short_reads_tmp = ch_short_reads.map { meta, sr -> [meta.id, sr] }

ch_short_and_long_reads = ch_long_reads
.map { meta, lr -> [meta.id, meta, lr] }
.join(ch_short_reads_tmp, by: 0, remainder: true)
.filter { row -> row[1] != null } // filter out samples with no long reads
.map { _id, meta_lr, lr, sr -> [meta_lr, sr ? sr : [], lr] }
// should not occur for single-end, since SPAdes (hybrid) does not support single-end
if (params.longread_filtering_by_shortreads) {
// join long and short reads by sample name
ch_short_reads_tmp = ch_short_reads.map { meta, sr -> [meta.id, sr] }

ch_short_and_long_reads = ch_long_reads
.map { meta, lr -> [meta.id, meta, lr] }
.join(ch_short_reads_tmp, by: 0, remainder: true)
.filter { row -> row[1] != null } // filter out samples with no long reads
.map { _id, meta_lr, lr, sr -> [meta_lr, sr ? sr : [], lr] }
// should not occur for single-end, since SPAdes (hybrid) does not support single-end
}
else {
// filter long reads on their own Phred scores, ignoring short reads
ch_short_and_long_reads = ch_long_reads.map { meta, lr -> [meta, [], lr] }
}

FILTLONG(
ch_short_and_long_reads
41 changes: 22 additions & 19 deletions tests/test_hybrid.nf.test.snap
@@ -1,7 +1,7 @@
{
"-profile hybrid": {
"content": [
59,
60,
{
"ADJUST_MAXBIN2_EXT": {
"coreutils": 9.5
@@ -36,9 +36,6 @@
"BUSCO_UNTAR": {
"untar": 1.34
},
"CHOPPER": {
"chopper": "0.8.0"
},
"CONCAT_BUSCO_TSV": {
"qsv": "5.1.0"
},
@@ -57,6 +54,9 @@
"FASTQC_TRIMMED": {
"fastqc": "0.12.1"
},
"FILTLONG": {
"filtlong": "0.2.1"
},
"GUNZIP_BINS": {
"gunzip": 1.13
},
@@ -89,6 +89,9 @@
"MINIMAP2_HOST_INDEX": {
"minimap2": "2.29-r1283"
},
"NANOLYSE": {
"nanolyse": "1.2.0"
},
"NANOPLOT_FILTERED": {
"nanoplot": "1.46.1"
},
@@ -98,8 +101,8 @@
"POOL_LONG_READS": {
"cat": 9.5
},
"PORECHOP_PORECHOP": {
"porechop": "0.2.4"
"PORECHOP_ABI": {
"porechop_abi": "0.5.0"
},
"PRODIGAL": {
"pigz": 2.6,
@@ -141,10 +144,10 @@
}
}
],
"timestamp": "2026-04-01T14:55:50.11826155",
"timestamp": "2026-06-25T13:22:30.581877242",
"meta": {
"nf-test": "0.9.5",
"nextflow": "25.10.4"
"nextflow": "26.04.4"
}
},
"multiqc": {
@@ -337,7 +340,7 @@
"multiqc/multiqc_report.html"
],
[
"bowtie2_pe_plot.yaml:md5,331d49d9aa0a0ce1a3a8b407b8a49801",
"bowtie2_pe_plot.yaml:md5,522bada95de93b46cbc9a03177bd12d4",
"busco_plot_bacteria_odb10.yaml:md5,da10d6a882b6f7a5e6eca94f415aabff",
"fastp-insert-size-plot.yaml:md5,256fda12474da96da3bae1662c3dda5b",
"fastp-seq-content-gc-plot_Read_1_After_filtering.yaml:md5,e98b74aa3104dba1a964f042b20f88b2",
@@ -372,26 +375,26 @@
"fastqc_sequence_length_distribution_plot.yaml:md5,054f8574b204516f6e6739691a117612",
"multiqc_bowtie2.yaml:md5,d6a4f1f2e6952801eda62c414c1685cb",
"multiqc_bowtie2_bowtie2-1.yaml:md5,d6a4f1f2e6952801eda62c414c1685cb",
"multiqc_bowtie2_bowtie2-2.yaml:md5,b5a185109453a80131d7da2cf4d3f2ec",
"multiqc_bowtie2_bowtie2-2.yaml:md5,c4ba5e1efc5c4a535abf8a07ac74cfe0",
"multiqc_busco.yaml:md5,df428a34eda0c1a05aac5cd84a040e08",
"multiqc_citations.yaml:md5,49f79471c1620b06da08b1acb97a1e56",
"multiqc_fastp.yaml:md5,29b2500f793b0c444ee7c6535d406b09",
"multiqc_fastqc.yaml:md5,0d11a6d30d0d1a430e58d44b57ac6e87",
"multiqc_fastqc_fastqc-1.yaml:md5,0b8c8b4edb4f3e70409f30bcdda54397",
"multiqc_general_stats.yaml:md5,edb4f404e32d372bd387a90b72575087",
"multiqc_prokka.yaml:md5,b994dcc420bd2f85d4e0f91549e08b54",
"multiqc_quast.yaml:md5,e5940a0ce6af05da51fe94bc6d7d3901",
"multiqc_quast_quast-1.yaml:md5,b20f16ccd1d9395ecac0ceee2cbe78de",
"multiqc_general_stats.yaml:md5,55fb08efe41b9034631876de1a0a1408",
"multiqc_prokka.yaml:md5,acf8d5a16ca002cb4947275156d4d664",
"multiqc_quast.yaml:md5,38504dc5c2b2a4dca642eaebd509c3c3",
"multiqc_quast_quast-1.yaml:md5,ca453bb9cde3ad3d7055ab4682eb5f05",
"porechop.yaml:md5,828d96e2822bb4b4108ab706c00c4450",
"prokka_plot.yaml:md5,c095a60a9936134ec73e6667a5635bbe",
"quast_num_contigs.yaml:md5,ddc54b8fb069f419547237c52e3058f2",
"quast_table.yaml:md5,e5720e81c420f07480481ac3c75bb822"
"prokka_plot.yaml:md5,00e1273aa90bce2c443da13fed6fb954",
"quast_num_contigs.yaml:md5,c91a25b6638a2805f1fb36bbf008e6df",
"quast_table.yaml:md5,5893c2a49a9093b3d7555a56560a0842"
]
],
"timestamp": "2026-05-04T10:28:55.649537639",
"timestamp": "2026-06-25T13:22:30.618963624",
"meta": {
"nf-test": "0.9.5",
"nextflow": "26.04.0"
"nextflow": "26.04.4"
}
}
}