Skip to content

Path object updates and other fixes#412

Merged
DLBPointon merged 25 commits intosanger-tol:schema_updatefrom
mahesh-panchal:nbis_patches
Jun 3, 2025
Merged

Path object updates and other fixes#412
DLBPointon merged 25 commits intosanger-tol:schema_updatefrom
mahesh-panchal:nbis_patches

Conversation

@mahesh-panchal
Copy link
Copy Markdown
Contributor

@mahesh-panchal mahesh-panchal commented May 28, 2025

Changes

  • Convert path strings to file objects as soon as possible.
  • Modify YAML_INPUT process to emit data directly instead of split and recombine with channels.
  • Fix read_ch logic in YAML_INPUT which uses channels in if.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@mahesh-panchal
Copy link
Copy Markdown
Contributor Author

@DLBPointon Would you be able to test this PR out please?

@DLBPointon DLBPointon changed the base branch from dev to dp24_specified_reads May 28, 2025 08:31
@DLBPointon
Copy link
Copy Markdown
Contributor

DLBPointon commented May 28, 2025

Morning @mahesh-panchal,

Thanks for the PR, I've changed the base to dp24_specified_reads which is the current release candidate for 1.4.0. This RC is the 3.2 template (at this rate it'll be 3.3) and has some substantial changes through the pipeline.

This removes all the version(null) stuff.

I'll have to look at your re-write of YAML_INPUT, i like it but it's just not a way i'm familiar with.

Only note I have is that because of the resolve on busco output for the full table, the code is duplicated in ancestral and as the full busco dir isn't used at all it can be deleted and the resolve of busco full table could simply be passed into ancestral. Hopefully that makes sense. It's where we've tacked on the ancestral workflow without correctly integrating it in the past.

Testing I can do although I won't be able to test on our cluster, as the Sanger HPC is currently dead until next week at least.

NOTE: The main difference is template, and longread and hic data are now explicitly input to the pipeline, no more supply a folder and let the pipeline do the work. it has meant that the yaml_input has big chunk of duplicated code. But this will change a fair bit of your work in some of the subworkflows.

@DLBPointon DLBPointon linked an issue May 28, 2025 that may be closed by this pull request
@sanger-tolsoft
Copy link
Copy Markdown

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.0.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@mahesh-panchal
Copy link
Copy Markdown
Contributor Author

mahesh-panchal commented May 28, 2025

Thanks for the PR, I've changed the base to dp24_specified_reads which is the current release candidate for 1.4.0. This RC is the 3.2 template (at this rate it'll be 3.3) and has some substantial changes through the pipeline.

Thanks. I've rebased against this branch.

I'll have to look at your re-write of YAML_INPUT, i like it but it's just not a way i'm familiar with.

Sorry about that, but I felt it would be better for maintainability. It's hard to see what's coming out of the channels generally, and there are tests baked into various subsections. For example on one multiMap it tested for workflow == "FULL", but then on a multimap that emitted parts of that, it had workflow == "FULL" && workflow == "JBROWSE" ( or something like that ), but the second part would never be true because the input only satisfies the first condition.

It's basically just made one multiMap, and each channel groups and emits the components that are combined with different channels. There are some parts at the top to define some reused variables. Conditions are formatted like ch_name: ( test ? result : [] ) and then a filter is used on the output channel to remove the [].

Only note I have is that because of the resolve on busco output for the full table, the code is duplicated in ancestral and as the full busco dir isn't used at all it can be deleted and the resolve of busco full table could simply be passed into ancestral. Hopefully that makes sense. It's where we've tacked on the ancestral workflow without correctly integrating it in the past.

I'm not sure I understand, but let me take a closer look at the code.

Testing I can do although I won't be able to test on our cluster, as the Sanger HPC is currently dead until next week at least.

NOTE: The main difference is template, and longread and hic data are now explicitly input to the pipeline, no more supply a folder and let the pipeline do the work. it has meant that the yaml_input has big chunk of duplicated code. But this will change a fair bit of your work in some of the subworkflows.

I'll investigate. Thank you for letting me know. I'll get to work on fixing the conflicts now.

@DLBPointon
Copy link
Copy Markdown
Contributor

Sorry about that, but I felt it would be better for maintainability. It's hard to see what's coming out of the channels generally, and there are tests baked into various subsections. For example on one multiMap it tested for workflow == "FULL", but then on a multimap that emitted parts of that, it had workflow == "FULL" && workflow == "JBROWSE" ( or something like that ), but the second part would never be true because the input only satisfies the first condition.

It's basically just made one multiMap, and each channel groups and emits the components that are combined with different channels. There are some parts at the top to define some reused variables. Conditions are formatted like ch_name: ( test ? result : [] ) and then a filter is used on the output channel to remove the [].

That's not me knocking what you've done, it absolutly has become more difficult to maintain so any simplification is welcome!

I've added some notes. I'm going to pull into local and run the tests that I can.

@mahesh-panchal
Copy link
Copy Markdown
Contributor Author

I need to get it working again. I'll let you know when it's running through again.

@github-actions
Copy link
Copy Markdown

github-actions bot commented May 28, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit dc949ca

+| ✅ 207 tests passed       |+
#| ❔  33 tests were ignored |#
!| ❗   8 tests had warnings |!
Details

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 1.4.0
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in ro-crate-metadata.json: "description": "# sanger-tol/treeval\n\nGitHub Actions CI Status\nGitHub Actions Linting StatusCite with Zenodo\nnf-test\n\nNextflow\nrun with conda\nrun with docker\nrun with singularity\nLaunch on Seqera Platform\n\n## Introduction\n\nsanger-tol/treeval is a bioinformatics pipeline that ...\n\n TODO nf-core:\n Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the\n major pipeline sections and the types of output it produces. You're giving an overview to someone new\n to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction\n\n\n Include a figure that guides the user through the major workflow steps. Many nf-core\n workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. \n Fill in short bullet-pointed list of the default steps in the pipeline \n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.\n\n Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.\n Explain what rows and columns represent. For instance (please edit as appropriate):\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\nsamplesheet.csv:\n\ncsv\nsample,fastq_1,fastq_2\nCONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz\n\n\nEach row represents a fastq file (single-end) or a pair of fastq files (paired end).\n\n\n\nNow, you can run the pipeline using:\n\n update the following command to include all required parameters for a minimal example \n\nbash\nnextflow run sanger-tol/treeval \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.\n\n## Credits\n\nsanger-tol/treeval was originally written by Damon-Lee Pointon (@DLBPointon), Yumi Sims (@yumisims) and William Eagles (@weaglesBio).\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n If applicable, make list of people who have also contributed \n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the contributing guidelines.\n\n## Citations\n\n Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. \n If you use sanger-tol/treeval for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX \n\n Add bibliography of tools and data used in your pipeline \n\nAn extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.\n\nThis pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.\n\n> The nf-core framework for community-curated bioinformatics pipelines.\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.\n",
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: ## Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: assets/multiqc_config.yml
  • files_exist - File is ignored: assets/nf-core-treeval_logo_light.png
  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config
  • files_exist - File is ignored: conf/test_full.config
  • files_exist - File is ignored: docs/images/nf-core-treeval_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-treeval_logo_dark.png
  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • nextflow_config - Config variable ignored: validation.help.beforeText
  • nextflow_config - Config variable ignored: validation.help.afterText
  • nextflow_config - Config variable ignored: validation.summary.beforeText
  • nextflow_config - Config variable ignored: validation.summary.afterText
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File ignored due to lint config: assets/nf-core-treeval_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-treeval_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-treeval_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/treeval/treeval/.github/workflows/awstest.yml
  • multiqc_config - multiqc_config

✅ Tests passed:

Run details

  • nf-core/tools version 3.2.0
  • Run at 2025-06-02 13:04:18

@mahesh-panchal
Copy link
Copy Markdown
Contributor Author

OK. This now runs to the end successfully with the test dataset.

@mahesh-panchal mahesh-panchal marked this pull request as ready for review June 2, 2025 13:09
@DLBPointon
Copy link
Copy Markdown
Contributor

DLBPointon commented Jun 3, 2025

This looks great to me Mahesh, thanks for the work.
I'm getting @weaglesBio to review as well in case there is anything i've missed.

Edit: Just updating target branch to schema as specified_reads got merged.

@DLBPointon DLBPointon self-requested a review June 3, 2025 14:01
@DLBPointon DLBPointon changed the base branch from dp24_specified_reads to schema_update June 3, 2025 14:02
@DLBPointon DLBPointon merged commit 72d1ced into sanger-tol:schema_update Jun 3, 2025
7 checks passed
@mahesh-panchal mahesh-panchal deleted the nbis_patches branch June 3, 2025 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Path parity with CurationPretext

4 participants