Conversation

@Gilbaja Gilbaja commented Mar 26, 2025

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/bacass branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Issue #66

Adds homopolish as an extra polishing step that runs after medaka polishing.
Before homopolish can run, a bacteria sketch (3.3 GB) has to be downloaded and unzipped.
To handle this, an additional module was created, and the workflow checks whether the sketch has already been downloaded to the output folder. A new parameter forces the download.

Test cmd

nextflow run . -profile test_long,conda --polish_method medaka_homopolish --outdir results

Pending tasks

  • Adding tests
  • Check with singularity

@Gilbaja Gilbaja changed the title from "Dev" to "Add homopolish for nanopore-only assembly" Mar 26, 2025
@Gilbaja Gilbaja marked this pull request as ready for review March 26, 2025 22:30
@Daniel-VM Daniel-VM self-requested a review March 27, 2025 12:55
Contributor

@Daniel-VM Daniel-VM left a comment

Good job. Please check the review comments.

task.ext.when == null || task.ext.when

script:
def prefix = task.ext.prefix ?: "${meta.id}"

Contributor

We can follow the nf-core structure here to get both the prefix and any optional args:

def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"

Additionally, you'll need to add the $args variable to the homopolish command in the script block.
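
For reference, a minimal sketch of that convention applied to this module. It assumes the inputs and the homopolish command already shown in this PR; the output declarations and the versions.yml step are left out, and whether `prefix` is worth keeping depends on whether the script ends up using it.

```nextflow
process HOMOPOLISH {
    input:
    tuple val(meta), path(medaka_genome)
    tuple val(meta_gunzip), path(bacteria_sketch)

    when:
    task.ext.when == null || task.ext.when

    script:
    // Optional extra command-line options, set per process through ext.args
    // (typically in conf/modules.config); empty string when nothing is configured
    def args   = task.ext.args ?: ''
    // Sample-specific prefix; defined here per convention, not used by the command below
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    homopolish polish \\
        -a $medaka_genome \\
        -s $bacteria_sketch \\
        -m $params.homopolish_model \\
        -o . \\
        $args
    """
}
```

With this in place, users can pass additional homopolish options by setting `ext.args` for the process in their config, without touching the module itself.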

Author

I think I copied that from another module along with something else, but I don't think I'm using it.
I don't know the nf-core structure for that. Do you think we need it?

Collaborator

If args isn't used, there is no need for the def args line, IMHO.

@@ -0,0 +1,35 @@
process HOMOPOLISH {

Contributor

We can follow the nf-core convention here: modules/local//
Could you place the main.nf file (and its related files) in a homopolish/homopolish/ folder? The module would also need to be renamed to HOMOPOLISH_HOMOPOLISH.

homopolish polish \
-a $medaka_genome \
-s $bacteria_sketch \
-m $params.homopolish_model \

Contributor

It's okay, but to make the script easier to read, we can use params.homopolish_model as an input channel for this process.

  // Assembly polishing
- polish_method = 'medaka' // Allowed: ['medaka', 'nanopolish']
+ polish_method = 'medaka' // Allowed: ['medaka', 'nanopolish', 'medaka_homopolish']
+ homopolish_bacteria_sketch_url = 'https://bioinfo.cs.ccu.edu.tw/bioinfo/downloads/Homopolish_Sketch/bacteria.msh.gz'

Contributor

It would be great to let users define a local sketch database via the CLI, for example with --homopolish_sketchdb_path.

Contributor

On second thought, we can replace homopolish_bacteria_sketch_url with homopolish_sketchdb_path. The Nextflow engine can distinguish between a local path and a URL—if a URL is provided, it will fetch it automatically.
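
A minimal sketch of that idea, assuming the parameter is renamed to `homopolish_sketchdb_path` as proposed (the name does not exist in the pipeline yet) and that the default still points at the gzipped sketch. The meta map attached to the file is a placeholder for illustration only.

```nextflow
// Hypothetical parameter: accepts either a local path or a remote URL
params.homopolish_sketchdb_path = 'https://bioinfo.cs.ccu.edu.tw/bioinfo/downloads/Homopolish_Sketch/bacteria.msh.gz'

workflow {
    // file() resolves local paths and remote URLs alike; a remote file is
    // downloaded and staged by Nextflow when a process consumes it
    sketch = file(params.homopolish_sketchdb_path, checkIfExists: true)

    // Pair the file with a small meta map (placeholder id) so it matches
    // the tuple shape used by the modules in this PR
    ch_sketch = Channel.value( [ [ id: sketch.baseName ], sketch ] )
    ch_sketch.view()
}
```

The same channel can then feed the gunzip step directly, so a user who already has the sketch on disk only needs to point the parameter at the local file.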

GUNZIP_HOMOPOLISH( ch_sketch )
} else {
// MODULE: Download bacteria sketch
HOMOPOLISH_SKETCH_PREPARATION(

Contributor

I think this process can be removed, since the Nextflow engine can download and stage a file automatically when a URL is provided via params.

Collaborator

@d4straub d4straub left a comment

I agree that a little more work could be done, as @Daniel-VM suggested.
Are you up for that, @Gilbaja?

Comment on lines +24 to +27
homopolish polish \
-a $medaka_genome \
-s $bacteria_sketch \
-m $params.homopolish_model \

Collaborator

Suggested change
- homopolish polish \
- -a $medaka_genome \
- -s $bacteria_sketch \
- -m $params.homopolish_model \
+ homopolish polish \\
+ -a $medaka_genome \\
+ -s $bacteria_sketch \\
+ -m $homopolish_model \\

Double backslashes keep the line formatting in the .command.sh in the work folder.
Also, -m $params.homopolish_model \ is bad practice; it should be solved with a val input.
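
A rough sketch of both points together, using only the input names already present in this PR; output declarations are omitted. The script block is a Groovy string, where a single backslash before a newline is consumed as a line continuation, so the command would land in .command.sh as one long line. Writing `\\` produces a literal backslash instead.

```nextflow
process HOMOPOLISH {
    input:
    tuple val(meta), path(medaka_genome)
    tuple val(meta_gunzip), path(bacteria_sketch)
    // Model is handed in by the caller rather than read from params inside the process
    val homopolish_model

    script:
    // Each '\\' below is rendered as a single '\' in .command.sh,
    // preserving the multi-line layout of the command
    """
    homopolish polish \\
        -a $medaka_genome \\
        -s $bacteria_sketch \\
        -m $homopolish_model \\
        -o .
    """
}
```

Keeping `params` out of the process body means the module's behaviour depends only on its declared inputs, which makes it easier to reuse and to test in isolation.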

input:
tuple val(meta), path(medaka_genome)
tuple val(meta_gunzip), path(bacteria_sketch)

Collaborator

Suggested change
val homopolish_model

-s $bacteria_sketch \
-m $params.homopolish_model \
-o .
cat <<-END_VERSIONS > versions.yml

Collaborator

Suggested change
- cat <<-END_VERSIONS > versions.yml
+
+ cat <<-END_VERSIONS > versions.yml

I like an empty line here for clarity.

GUNZIP_HOMOPOLISH ( HOMOPOLISH_SKETCH_PREPARATION.out.sketch )
}
// MODULE: Homopolish, polishes MEDAKA assembly
HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip )

Collaborator

Suggested change
- HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip )
+ HOMOPOLISH ( ch_assembly, GUNZIP_HOMOPOLISH.out.gunzip, params.homopolish_model )

3 participants