
Skip nf-schema path validation for annotation_cache and igenomes_base #2184

Open
pinin4fjords wants to merge 3 commits into dev from fix/annotation-cache-validation

Conversation

@pinin4fjords
Member

@pinin4fjords pinin4fjords commented May 13, 2026

Summary

snpeff_cache, vep_cache and igenomes_base are declared in the schema with format: directory-path and default to S3 buckets. nf-schema 2.5+ resolves these defaults against the storage backend and fails the launch if the path is unreachable - no credentials, no network, or an SSO/IAM session that can't see the bucket. Any user without read access to s3://annotation-cache/ (e.g. on-prem clusters, restricted cloud accounts) currently can't launch the pipeline at all, even when they have no intention of running snpEff or VEP.

Details

The full discussion of where this fix belongs is in nextflow-io/nf-schema#204. The plugin maintainer's settled position is:

  1. The validator should fail when a configured path is unreachable - that's the point.
  2. The right fix for default-only S3 paths that nobody explicitly opted into is to:
    • drop format: "directory-path" so the existence probe never runs, and
    • add the param to validation.defaultIgnoreParams so nf-schema skips it entirely.

That guidance has been applied in nf-core/rnaseq for igenomes_base (PRs #1696 and #1739). This PR applies the same treatment to sarek's three S3-defaulted directory params.

PR #2083 already removed exists: true from snpeff_cache/vep_cache, but format: directory-path alone is enough to trigger the existence check, so launches still fail for users without access to the default bucket.

Fixes #2079

Changes

  • nextflow.config: add igenomes_base, snpeff_cache, vep_cache to validation.defaultIgnoreParams.
  • nextflow_schema.json: drop "format": "directory-path" from the same three params. Derived/resolved paths are still validated individually downstream.
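The nextflow.config half of the change can be sketched roughly as follows. This is not the literal diff from this PR. It assumes the nf-core template's existing `defaultIgnoreParams = ["genomes"]` entry and appends the three params to it. Sarek's real list may contain other entries, and those must be kept. The companion edit to nextflow_schema.json, which removes `"format": "directory-path"` from the same params, is a separate change and is not shown here.

```groovy
// nextflow.config (sketch only): the "genomes" entry is assumed to come
// from the nf-core template; any other existing entries in sarek's list
// must be preserved alongside the three params added by this PR.
validation {
    defaultIgnoreParams = [
        "genomes",
        // S3-defaulted directory params that nf-schema should skip entirely
        "igenomes_base",
        "snpeff_cache",
        "vep_cache"
    ]
}
```

`defaultIgnoreParams` is intended for pipeline developers. A user who needs to skip additional params would normally set `validation.ignoreParams` in their own configuration instead; nf-schema applies that list on top of this one.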

Test plan

  • Launch from an environment without read access to s3://annotation-cache/ - validation should now pass.
  • Launch with valid --snpeff_cache/--vep_cache overrides - annotation should still work.
  • Launch with --tools snpeff,vep and confirm download-cache flows still error appropriately if the path is bad at the point of use (not at schema validation).

…_base

nf-schema 2.5+ resolves params declared with `format: directory-path`
to verify the path exists on the storage backend. For `snpeff_cache`,
`vep_cache` and `igenomes_base` the defaults point at S3 buckets that
are not always readable from the launch environment (e.g. SLURM
clusters without credentials for s3://annotation-cache/, SSO-scoped
AWS sessions). Validation fails with 403 / "Key cannot be empty"
before the pipeline can decide whether those caches are even needed.

PR #2083 dropped `exists: true` from the two cache params but the
`format: directory-path` declaration still triggers the existence
check. Follow the nf-schema maintainer's recommendation (see
nextflow-io/nf-schema#204): remove the format declaration and add
the params to `defaultIgnoreParams` so the validator skips them.

Refs #2079

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@github-actions

github-actions Bot commented May 13, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 0dc8a32

✅ 224 tests passed
❔ 13 tests were ignored
❗ 7 tests had warnings
Details

❗ Test warnings:

  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_lint - Input mimetype is missing or empty
  • schema_description - No description provided in schema for parameter: markduplicates_pixel_distance
  • schema_description - No description provided in schema for parameter: gatk_pcr_indel_model


Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-05-26 09:52:23


3 participants