
General refactor and format #142

Open · lmReef wants to merge 13 commits into dev

Conversation

@lmReef (Collaborator) commented Apr 22, 2025

Overview

  • slight refactor of some config for improved readability
    • the main refactor to be aware of is 451da46
  • ran the Nextflow formatter over everything

Everything still runs.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/dualrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@lmReef self-assigned this Apr 22, 2025
@lmReef marked this pull request as ready for review April 23, 2025 02:52

github-actions bot commented Apr 23, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 23be7a6

✅ 249 tests passed
❗ 32 tests had warnings

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in ro-crate-metadata.json: the "description" field still contains the unedited template README (logo markup, CI/Zenodo badges, and the Introduction/Usage/Credits/Citations boilerplate, including its embedded TODO notes and the placeholder Zenodo doi 10.5281/zenodo.XXXXXX).
  • pipeline_todos - TODO string in README.md: Write a 1-2 sentence summary of what data the pipeline is for and what it does
  • pipeline_todos - TODO string in README.md: Add full-sized test dataset and amend the paragraph below if applicable
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline 1. Read QC (FastQC)2. Present QC for raw reads (MultiQC)
  • pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
  • pipeline_todos - TODO string in README.md: update the following command to include all required parameters for a minimal example
  • pipeline_todos - TODO string in README.md: Update the example "typical command" below used to run the pipeline
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_lint - Schema 'description' should be 'Dual RNA-seq pipeline'
    Found: 'DualRNA-seq pipeline for the analysis of bacterial and host transcriptomes'
  • schema_description - Ungrouped param in schema: fastqc
  • schema_description - Ungrouped param in schema: cutadapt

Run details

  • nf-core/tools version 3.2.0
  • Run at 2025-04-29 04:29:52

cpus = { 1 * task.attempt }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
cpus = { 1 * task.attempt }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you've done here, and I'm fine with it. There is a general approach that people seem to like, and that the template generates, where the equals signs are vertically aligned.
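
For reference, a minimal sketch of the vertically aligned style described above, using the resource values from the hunk earlier in this thread (illustrative only, not the exact file contents):

    process {
        // equals signs padded so they line up vertically
        cpus   = { 1    * task.attempt }
        memory = { 6.GB * task.attempt }
        time   = { 4.h  * task.attempt }
    }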

@@ -14,12 +14,12 @@ process {
     resourceLimits = [
         cpus: 2,
         memory: '6.GB',
-        time: '1.h'
+        time: '1.h',
Collaborator

I thought the last item doesn't require a comma?
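
For context (my understanding, not a spec reference): a trailing comma in a Groovy/Nextflow map or list literal is optional, so the added comma is legal but not required, and the formatter seems to prefer it so that appending a later entry only touches one line. A minimal sketch using the values from the hunk above:

    resourceLimits = [
        cpus: 2,
        memory: '6.GB',
        time: '1.h',   // trailing comma accepted, not required
    ]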

@@ -5,25 +5,25 @@
 Defines input files and everything required to run a fast and simple pipeline test.

 Use as follows:
-    nextflow run nf-core/dualrnaseq -profile test_hackathon,<docker/singularity> --outdir <OUTDIR>
+    nextflow run nf-core/dualrnaseq -profile test_2_fasta_pathogen,<docker/singularity> --outdir <OUTDIR>
Collaborator

I agree the naming should be different. This naming doesn't quite make sense, as the FASTA contains both host and pathogen reads.
Perhaps test_small, since the files are small and were designed specifically for fast test run-times.
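
A rough sketch of what the rename could look like, with test_small as a placeholder name and illustrative contents (config_profile_name/config_profile_description are the usual nf-core test-profile params; the values here are made up):

    // conf/test_small.config (hypothetical file name)
    params {
        config_profile_name        = 'Small test profile'
        config_profile_description = 'Minimal host and pathogen files for fast test run-times'
    }

and the usage line from the hunk above would then become:

    nextflow run nf-core/dualrnaseq -profile test_small,<docker/singularity> --outdir <OUTDIR>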

Comment on lines -25 to -37
tuple val(meta), path('*d.out.bam') , emit: bam
tuple val(meta), path('*Log.final.out') , emit: log_final
tuple val(meta), path('*Log.out') , emit: log_out
tuple val(meta), path('*d.out.bam'), emit: bam
tuple val(meta), path('*Log.final.out'), emit: log_final
tuple val(meta), path('*Log.out'), emit: log_out
tuple val(meta), path('*Log.progress.out'), emit: log_progress
path "versions.yml" , emit: versions
path "versions.yml", emit: versions

tuple val(meta), path('*sortedByCoord.out.bam') , optional:true, emit: bam_sorted
tuple val(meta), path('*sortedByCoord.out.bam'), optional: true, emit: bam_sorted
// tuple val(meta), path('*toTranscriptome.out.bam'), optional:true, emit: bam_transcript
tuple val(meta), path('*Aligned.unsort.out.bam') , optional:true, emit: bam_unsorted
tuple val(meta), path('*fastq.gz') , optional:true, emit: fastq
tuple val(meta), path('*.tab') , optional:true, emit: tab
tuple val(meta), path('*.out.junction') , optional:true, emit: junction
tuple val(meta), path('*.out.sam') , optional:true, emit: sam
Collaborator

You can see here why they vertically align the outputs: when there are lots of them, it does become easier to read.
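
For illustration, the same emit declarations from the hunk above with the columns padded back into alignment (spacing approximate):

    tuple val(meta), path('*d.out.bam')        , emit: bam
    tuple val(meta), path('*Log.final.out')    , emit: log_final
    tuple val(meta), path('*Log.out')          , emit: log_out
    tuple val(meta), path('*Log.progress.out') , emit: log_progress
    path "versions.yml"                        , emit: versions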

@@ -115,14 +112,18 @@ params {
// --------------
libtype = 'A'
// gene_feature_gff_to_create_transcriptome_host = ["exon"] // old
gene_feature_gff_to_create_transcriptome_host = "exon" // new
gene_feature_gff_to_create_transcriptome_host = "exon"
// new
gene_attribute_gff_to_create_transcriptome_host = "transcript_id"
// gene_feature_gff_to_create_transcriptome_pathogen = ["gene", "sRNA", "tRNA", "rRNA"] //old
Collaborator

In case you are wondering why there are a bunch of these old lines that I've commented out and left in: I've kept them until all three tools successfully run and the output looks reliable. That way, if something isn't quite correct somewhere, they are a quick reference point.

quantMode = 'TranscriptomeSAM' // Create a transcriptome Bam (to use with Salmon) - also generates a genome bam (use with STAR genome alignment)
quantTranscriptomeBan = 'Singleend' // Regarding transcriptome alignments - allowing for insertions, deletions and soft-clips (Singleend option). To prohibit this behaviour, specify IndelSoftclipSingleend
limitBAMsortRAM = 0 // Option to limit RAM when sorting BAM file. If 0, will be set to the genome index size, which can be quite large when running on a desktop or laptop
outSAMunmapped = 'Within'
Collaborator

I think you need to shift all these comments to above the param?
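
i.e. a sketch of the suggested layout, using the limitBAMsortRAM entry from the hunk above:

    // Option to limit RAM when sorting BAM file. If 0, will be set to the genome
    // index size, which can be quite large when running on a desktop or laptop
    limitBAMsortRAM = 0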

 },
 "pathogen_organism": {
     "type": "string",
     "description": "Change to custom name if desired, ie Salmonella_SL1344",
-    "fa_icon": "far fa-file"
+    "fa_icon": "far fa-file",
+    "default": "pathogen"
Collaborator

Same as above, unless I'm missing something obvious.

Collaborator Author

These two were caught by the linter (nf-core pipelines schema lint) as not having defaults in the schema, but they are actually set with default values in the config already.
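
For reference, a minimal sketch of the config side of this (only pathogen_organism appears in the schema hunk above; the surrounding params block is illustrative):

    params {
        // default already defined here, which is why only the schema entry was missing
        pathogen_organism = "pathogen"
    }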

@reganhayward (Collaborator) left a comment

Forgot to tick the last file

@reganhayward (Collaborator)

> Forgot to tick the last file

I thought I'd have to tick (viewed) on all files before approving - good to know

@lmReef (Collaborator Author) commented Apr 29, 2025

Grand. Not sure why, but all of my replies got marked as reviews.
