1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#229](https://github.com/nf-core/bacass/pull/229) Add homopolish for nanopore-only assembly.
- [#195](https://github.com/nf-core/bacass/pull/195) Update nf-core/bacass to the new nf-core 3.2.0 `TEMPLATE`.

### `Fixed`
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ This pipeline is primarily for bacterial assembly of next-generation sequencing

For users that only have Nanopore data, the pipeline quality trims these using [PoreChop](https://github.com/rrwick/Porechop) and assesses basic sequencing QC utilizing [NanoPlot](https://github.com/wdecoster/NanoPlot) and [PycoQC](https://github.com/a-slide/pycoQC). Contamination of the assembly is checked using [Kraken2](https://ccb.jhu.edu/software/kraken2/) and [Kmerfinder](https://bitbucket.org/genomicepidemiology/kmerfinder/src/master/) to verify sample purity.

The pipeline can then perform long read assembly utilizing [Unicycler](https://github.com/rrwick/Unicycler), [Miniasm](https://github.com/lh3/miniasm) in combination with [Racon](https://github.com/isovic/racon), [Canu](https://github.com/marbl/canu) or [Flye](https://github.com/fenderglass/Flye) by using the [Dragonflye](https://github.com/rpetit3/dragonflye)(\*) pipeline. Long reads assembly can be polished using [Medaka](https://github.com/nanoporetech/medaka) or [NanoPolish](https://github.com/jts/nanopolish) with Fast5 files.
The pipeline can then perform long read assembly utilizing [Unicycler](https://github.com/rrwick/Unicycler), [Miniasm](https://github.com/lh3/miniasm) in combination with [Racon](https://github.com/isovic/racon), [Canu](https://github.com/marbl/canu) or [Flye](https://github.com/fenderglass/Flye) by using the [Dragonflye](https://github.com/rpetit3/dragonflye)(\*) pipeline. Long reads assembly can be polished using [Medaka](https://github.com/nanoporetech/medaka), **Medaka** followed by [Homopolish](https://github.com/ythuang0522/homopolish) or [NanoPolish](https://github.com/jts/nanopolish) with Fast5 files.

> [!NOTE]
> Dragonflye is a comprehensive pipeline designed for genome assembly of Oxford Nanopore Reads. It facilitates the utilization of Flye (default), Miniasm, and Raven assemblers, along with Racon (default) and Medaka polishers. For more information, visit the [Dragonflye GitHub](https://github.com/rpetit3/dragonflye) repository.
18 changes: 18 additions & 0 deletions conf/modules.config
@@ -158,6 +158,24 @@ process {
]
}

withName: 'HOMOPOLISH_SKETCH_PREPARATION' {
ext.args = ''
publishDir = [
path: { "${params.outdir}/Homopolish_sketch" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'HOMOPOLISH' {
ext.args = ''
publishDir = [
path: { "${params.outdir}/Homopolish" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'KRAKEN2' {
ext.args = ''
publishDir = [
8 changes: 8 additions & 0 deletions modules/local/homopolish/environment.yml
@@ -0,0 +1,8 @@
name: homopolish
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bioconda::homopolish=0.4.1
- conda-forge::more-itertools=9.1.0
35 changes: 35 additions & 0 deletions modules/local/homopolish/main.nf
@@ -0,0 +1,35 @@
process HOMOPOLISH {
Contributor

We can follow nf-core convention here: modules/local//
Could you place the main.nf file (and its related files) within the homopolish/homopolish/ folder? The module would also need to be renamed to HOMOPOLISH_HOMOPOLISH.

tag "$meta.id"
label 'process_high'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/homopolish:0.4.1--pyhdfd78af_1' :
'biocontainers/homopolish:0.4.1--pyhdfd78af_0' }"

input:
tuple val(meta), path(medaka_genome)
tuple val(meta_gunzip), path(bacteria_sketch)

Collaborator

Suggested change
val homopolish_model

output:
tuple val(meta), path('*_genome_homopolished.fasta') , emit: assembly
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def prefix = task.ext.prefix ?: "${meta.id}"
Contributor

We can follow the nf-core structure here to get both the prefix and any extra args:

def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"

Additionally, you'll need to add the $args variable to the Homopolish bash command.

Author

I think I copied that from another module along with something else, but I don't think I'm using it.
I don't know the nf-core structure for that. Do you think we need it?

Collaborator

If args isn't used, there is no need for the def args line imho.
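
For reference, the nf-core `args` convention discussed in this thread could look roughly like the sketch below. It is only an illustration under assumptions: the Homopolish flags and the process inputs are the ones already in this PR, and the one new thing is appending `$args`, which would be filled from `ext.args` in `conf/modules.config` (currently `''`). If no extra options are ever needed, dropping the `def args` line as suggested above is equally valid.

```nextflow
// Sketch only: HOMOPOLISH with the nf-core args/prefix pattern.
// Inputs and flags mirror the PR; appending $args is the illustrated change.
process HOMOPOLISH {
    tag "$meta.id"
    label 'process_high'

    input:
    tuple val(meta), path(medaka_genome)
    tuple val(meta_gunzip), path(bacteria_sketch)

    output:
    tuple val(meta), path('*_genome_homopolished.fasta'), emit: assembly

    script:
    // task.ext.args is set per process in conf/modules.config (ext.args = '')
    def args   = task.ext.args   ?: ''
    // prefix is declared by convention; this command does not use it
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    homopolish polish \\
        -a $medaka_genome \\
        -s $bacteria_sketch \\
        -m $params.homopolish_model \\
        -o . \\
        $args
    """
}
```

With this in place, extra command-line options could be supplied through `ext.args` in the `withName: 'HOMOPOLISH'` block without touching the module.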

"""
homopolish polish \
-a $medaka_genome \
-s $bacteria_sketch \
-m $params.homopolish_model \
Contributor

It's okay, but to make the script easier to read, we can use params.homopolish_model as an input channel for this process.

Comment on lines +24 to +27
Collaborator

Suggested change
homopolish polish \
-a $medaka_genome \
-s $bacteria_sketch \
-m $params.homopolish_model \
homopolish polish \\
-a $medaka_genome \\
-s $bacteria_sketch \\
-m $homopolish_model \\

Use double backslashes to keep the formatting in the .command.sh in the work folder.
Also, -m $params.homopolish_model \ is bad practice; it should be solved with a val input.
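
Putting the two points of this suggestion together, the process might end up looking like the following sketch. It assumes the extra `val homopolish_model` input proposed earlier in the review and leaves out the container, conda, and versions parts for brevity; it is not the final module.

```nextflow
// Sketch only: model passed as a val input, backslashes escaped.
process HOMOPOLISH {
    tag "$meta.id"
    label 'process_high'

    input:
    tuple val(meta), path(medaka_genome)
    tuple val(meta_gunzip), path(bacteria_sketch)
    val homopolish_model   // e.g. 'R9.4.pkl', supplied by the workflow

    output:
    tuple val(meta), path('*_genome_homopolished.fasta'), emit: assembly

    script:
    // In a Groovy triple-double-quoted string, "\\" is written to
    // .command.sh as a single "\", so the command keeps its line breaks.
    // A single "\" would be consumed by Groovy and join the lines.
    """
    homopolish polish \\
        -a $medaka_genome \\
        -s $bacteria_sketch \\
        -m $homopolish_model \\
        -o .
    """
}
```

The workflow would then call it as `HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip, params.homopolish_model )`, so the process no longer reads `params` directly.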

-o .
cat <<-END_VERSIONS > versions.yml
Collaborator

Suggested change
cat <<-END_VERSIONS > versions.yml
cat <<-END_VERSIONS > versions.yml

I'd like an empty line here for clarity.

"${task.process}":
homopolish: \$( homopolish --version 2>&1 | sed 's/Homopolish VERSION: *//g' )
END_VERSIONS
"""
}
7 changes: 7 additions & 0 deletions modules/local/homopolish/sketch_preparation/environment.yml
@@ -0,0 +1,7 @@
name: homopolish_sketch_preparation
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::sed=4.7
25 changes: 25 additions & 0 deletions modules/local/homopolish/sketch_preparation/main.nf
@@ -0,0 +1,25 @@
process HOMOPOLISH_SKETCH_PREPARATION {
label 'process_low'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/curl:7.80.0' :
'biocontainers/curl:7.80.0' }"

input:
val(meta)
path(url)

output:
tuple val(meta), path("bacteria.msh.gz"), emit: sketch
path "versions.yml" , emit: versions

script:
"""
curl $params.homopolish_bacteria_sketch_url
cat <<-END_VERSIONS > versions.yml
"${task.process}":
Homopolish_Sketch Bacteria: $params.homopolish_bacteria_last
END_VERSIONS
"""
}
6 changes: 5 additions & 1 deletion nextflow.config
@@ -33,7 +33,11 @@ params {
dragonflye_args = ''

// Assembly polishing
polish_method = 'medaka' // Allowed: ['medaka', 'nanopolish']
polish_method = 'medaka' // Allowed: ['medaka', 'nanopolish', 'medaka_homopolish']
homopolish_bacteria_sketch_url = 'https://bioinfo.cs.ccu.edu.tw/bioinfo/downloads/Homopolish_Sketch/bacteria.msh.gz'
Contributor

It would be great to allow users to define a local sketch database via the CLI, for example with --homopolish_sketchdb_path.

Contributor

On second thought, we can replace homopolish_bacteria_sketch_url with homopolish_sketchdb_path. The Nextflow engine can distinguish between a local path and a URL—if a URL is provided, it will fetch it automatically.
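
A rough sketch of what this could look like in `workflows/bacass.nf`, assuming the parameter is renamed to `homopolish_sketchdb_path` as proposed (the name and the `id` value in the meta map are hypothetical). Nextflow's `file()` accepts both local paths and HTTP(S) URLs and stages remote files itself, so no download process should be needed.

```nextflow
// Sketch only: fragment of workflows/bacass.nf.
// params.homopolish_sketchdb_path is the hypothetical replacement for
// params.homopolish_bacteria_sketch_url; it may be a local path or a URL.
include { GUNZIP as GUNZIP_HOMOPOLISH } from '../modules/nf-core/gunzip'

workflow SKETCH_FROM_PARAM {
    main:
    // file() resolves local paths and remote URLs alike; remote files
    // are downloaded and staged by Nextflow when a task needs them.
    ch_sketch = Channel.value(
        [ [ id: 'bacteria_sketch' ], file(params.homopolish_sketchdb_path, checkIfExists: true) ]
    )

    // Decompress the sketch (bacteria.msh.gz) before handing it to Homopolish
    GUNZIP_HOMOPOLISH ( ch_sketch )

    emit:
    sketch = GUNZIP_HOMOPOLISH.out.gunzip
}
```

Under this approach, `homopolish_reload_sketch` and the existence check on the output directory would probably become unnecessary, since Nextflow's own caching and `-resume` cover the staged file.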

homopolish_bacteria_last = '2024-08-16' // From: https://bioinfo.cs.ccu.edu.tw/bioinfo/download.html
homopolish_model = 'R9.4.pkl' // Allowed: ['R9.4.pkl', 'R10.3.pkl', 'pb.pkl']
homopolish_reload_sketch = false

// Annotation
annotation_tool = 'prokka' // Allowed: ['prokka', 'bakta','dfast']
31 changes: 30 additions & 1 deletion nextflow_schema.json
@@ -153,7 +153,36 @@
"fa_icon": "fas fa-hotdog",
"description": "Which assembly polishing method to use.",
"help_text": "Can be used to define which polishing method is used by default for long reads.",
"enum": ["medaka", "nanopolish"]
"enum": ["medaka", "nanopolish", "medaka_homopolish"]
},
"homopolish_model": {
"type": "string",
"default": "R9.4.pkl",
"fa_icon": "fas fa-hotdog",
"description": "Which homopolish polishing model to use.",
"help_text": "Used to define which homopolish polishing model is used for long reads after medaka.",
"enum": ["R9.4.pkl", "R10.3.pkl", "pb.pkl"]
},
"homopolish_reload_sketch": {
"type": "boolean",
"default": false,
"fa_icon": "fas fa-hotdog",
"description": "Reload homopolish bacteria sketch.",
"help_text": "Used to define if homopolish bacteria sketch has to be reloaded from its download webpage."
},
"homopolish_bacteria_sketch_url": {
"type": "string",
"default": "https://bioinfo.cs.ccu.edu.tw/bioinfo/downloads/Homopolish_Sketch/bacteria.msh.gz",
"fa_icon": "fas fa-hotdog",
"description": "Homopolish Bacteria Sketch download URL.",
"help_text": "Can be used to define the URL for downloading the bacteria Homopolish Sketch."
},
"homopolish_bacteria_last": {
"type": "string",
"default": "2024-08-16",
"fa_icon": "fas fa-hotdog",
"description": "Homopolish Bacteria Sketch version date.",
"help_text": "Defines the Homopolish Sketch version date for version reporting."
}
}
},
2 changes: 2 additions & 0 deletions subworkflows/local/utils_nfcore_bacass_pipeline/main.nf
@@ -221,6 +221,7 @@ def toolCitationText() {
"Canu (Sergey Koren et al. 2017)",
"Medaka (Heng Li)",
"Nanopolish (Loman 2015)",
"Homopolish (Huang, Y.-T., Liu, P.-Y., and Shih, P.-W. 2021)",
"Quast (Alexey Gurevich et al. 2013)",
"Prokka (Torsten Seemann 2014)",
"Bakta (Oliver Schwengers 2021)",
@@ -249,6 +250,7 @@ def toolBibliographyText() {
"<li>Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017 May;27(5):722-736. doi: 10.1101/gr.215087.116.</li>",
"<li>Medaka: Sequence correction provided by ONT Research. https://github.com/nanoporetech/medaka,</li>",
"<li>Loman, N., Quick, J. & Simpson, J. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 12, 733–735 (2015). https://doi.org/10.1038/nmeth.3444</li>",
"<li>Huang, Y.-T., Liu, P.-Y., and Shih, P.-W. Homopolish: a method for the removal of systematic errors in nanopore sequencing by homologous polishing, Genome Biology, 2021. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02282-6</li>",
"<li>Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013 Apr 15;29(8):1072-5. doi: 10.1093/bioinformatics/btt086. Epub 2013 Feb 19.</li>",
"<li>Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. doi: 10.1093/bioinformatics/btu153.</li>",
"<li>Schwengers O, Jelonek L, Dieckmann MA, Beyvers S, Blom J, Goesmann A. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb Genom. 2021 Nov;7(11):000685. doi: 10.1099/mgen.0.000685.</li>",
44 changes: 37 additions & 7 deletions workflows/bacass.nf
@@ -8,12 +8,14 @@
//
// MODULE: Local to the pipeline
//
include { PYCOQC } from '../modules/local/pycoqc'
include { NANOPOLISH } from '../modules/local/nanopolish'
include { MEDAKA } from '../modules/local/medaka'
include { KRAKEN2_DB_PREPARATION } from '../modules/local/kraken2/db_preparation'
include { DFAST } from '../modules/local/dfast'
include { CUSTOM_MULTIQC } from '../modules/local/custom/multiqc'
include { PYCOQC } from '../modules/local/pycoqc'
include { NANOPOLISH } from '../modules/local/nanopolish'
include { MEDAKA } from '../modules/local/medaka'
include { KRAKEN2_DB_PREPARATION } from '../modules/local/kraken2/db_preparation'
include { DFAST } from '../modules/local/dfast'
include { CUSTOM_MULTIQC } from '../modules/local/custom/multiqc'
include { HOMOPOLISH_SKETCH_PREPARATION } from '../modules/local/homopolish/sketch_preparation'
include { HOMOPOLISH } from '../modules/local/homopolish'

//
// MODULE: Installed directly from nf-core/modules
@@ -37,6 +39,7 @@ include { KRAKEN2_KRAKEN2 as KRAKEN2_LONG } from '../modules/nf-core/krake
include { QUAST } from '../modules/nf-core/quast'
include { QUAST as QUAST_BYREFSEQID } from '../modules/nf-core/quast'
include { GUNZIP } from '../modules/nf-core/gunzip'
include { GUNZIP as GUNZIP_HOMOPOLISH } from '../modules/nf-core/gunzip'
include { PROKKA } from '../modules/nf-core/prokka'

//
@@ -304,13 +307,40 @@ workflow BACASS {
.join( ch_assembly )
.map { meta, sr, lr, fasta -> tuple(meta, lr, fasta) }
.set { ch_polish_long } // channel: [ val(meta), path(lr), path(fasta) ]
if (params.polish_method == 'medaka'){
if (params.polish_method in ['medaka', 'medaka_homopolish'] ){
//
// MODULE: Medaka, polishes assembly - should take either miniasm, canu, or unicycler consensus sequence
//
MEDAKA ( ch_polish_long )
ch_assembly = MEDAKA.out.assembly
ch_versions = ch_versions.mix(MEDAKA.out.versions)
// If homopolish after medaka
if (params.polish_method == 'medaka_homopolish') {
// Check if sketch file already exists
sketch_path = "$baseDir/$params.outdir/Homopolish_sketch/bacteria.msh.gz"
sketch_file = new File(sketch_path)
// If sketch exists and not forced to reload, unzip sketch from outdir
if (sketch_file.exists() && !params.homopolish_reload_sketch) {
ch_sketch = tuple(
ch_assembly.collect{it[1]}, // meta from assembly channel
sketch_path
)
GUNZIP_HOMOPOLISH( ch_sketch )
} else {
// MODULE: Download bacteria sketch
HOMOPOLISH_SKETCH_PREPARATION(
Contributor

I think this process can be removed, since the Nextflow engine can download and stage a file automatically when a URL is provided via params.

ch_assembly.collect{it[1]}, // meta
params.homopolish_bacteria_sketch_url
)
ch_versions = ch_versions.mix(HOMOPOLISH_SKETCH_PREPARATION.out.versions)
// Unzip bacteria sketch
GUNZIP_HOMOPOLISH ( HOMOPOLISH_SKETCH_PREPARATION.out.sketch )
}
// MODULE: Homopolish, polishes MEDAKA assembly
HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip )
Collaborator

Suggested change
HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip )
HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip, params.homopolish_model )

ch_assembly = HOMOPOLISH.out.assembly
ch_versions = ch_versions.mix(HOMOPOLISH.out.versions)
}
} else if (params.polish_method == 'nanopolish') {
//
// MODULE: Nanopolish, polishes assembly using FAST5 files