Skip to content

fix(lint): skip correct_meta_outputs for dynamic path() values#4226

Closed
pinin4fjords wants to merge 3 commits into
devfrom
pinin4fjords/lint-dynamic-output-paths
Closed

fix(lint): skip correct_meta_outputs for dynamic path() values#4226
pinin4fjords wants to merge 3 commits into
devfrom
pinin4fjords/lint-dynamic-output-paths

Conversation

@pinin4fjords
Copy link
Copy Markdown
Member

@pinin4fjords pinin4fjords commented Apr 27, 2026

RFC: should correct_meta_outputs allow dynamic path() arguments?

Today correct_meta_outputs does literal string comparison between path(...) arguments in main.nf and the keys in meta.yml. This means the only legal output declaration is a single static glob (or partial-GString) that meta.yml can mirror exactly.

That works for almost every module, but it forces output globs to be expressible as one static pattern. For modules where the right glob is conditional on input meta — e.g.:

output:
tuple val(meta), path("${reads_glob}"), emit: reads

script:
def reads_glob = meta.single_end ? "${prefix}_trimmed.fq.gz"
                                 : "${prefix}_{1,2}_val_{1,2}.fq.gz"

— there's no static glob that works without either matching files the channel shouldn't carry, or silently dropping legitimate ones (bash globs have no negative lookbehind). The triggering example is nf-core/modules#11317 (trimgalore, distinguishing PE intermediate *_<N>_trimmed.fq.gz from SE final *_trimmed.fq.gz).

Proposal

When the entire path() argument is just a variable reference — path(reads_glob) or path("${reads_glob}") — skip that channel from the meta.yml comparison rather than failing it. Recognised via:

_DYNAMIC_PATH_RE = re.compile(r"^(?:\$\{[a-zA-Z_]\w*\}|[a-zA-Z_]\w*)$")

Anchored, so partial GStrings (${prefix}.bam, ${prefix}_fastqc.html, ${prefix}*.fq.gz) and static globs are unaffected and keep validating literally. val(...) and eval(...) are unaffected regardless of their argument shape.

What's lost

For channels declared this way, lint can no longer verify meta.yml documents the right pattern — meta.yml could be wrong and we wouldn't catch it.

Why not "resolve and check the set"

I considered parsing the script: block to expand reads_glob to its possible values and comparing each against meta.yml. Two problems:

  1. nf-core/tools has no Groovy AST parser; all existing main.nf parsing is ad-hoc regex.
  2. meta.yml's schema is one pattern slot per tuple position, with no way to declare "this slot is one of {A, B}". Even with full resolution, the comparison has nowhere to land.

So a proper fix would also need a meta.yml schema RFC (alternatives per channel position, or per-mode output blocks). That's bigger than this PR.

Question for review

  • Is "skip dynamic-only path() from correct_meta_outputs" an acceptable position pending a schema-level fix?
  • Or would the project rather refuse this pattern entirely and force module authors to find a static-glob workaround (subdir + mv, broader patterns + filtering at the consumer, etc.)?

If the answer is the second, this PR should be closed and nf-core/modules#11317 replaced with nf-core/modules#11315 (subdir+mv).

Test plan

  • Unit test verifies only path(...) entries with bare-identifier or pure-${var} arguments are flagged; val(...)/eval(...) are unaffected.
  • Existing tests/modules/lint/test_meta_yml.py passes (10/10; the 1 skipped is a pre-existing dev failure).
  • Verified against nf-core/modules#11317: trimgalore passes correct_meta_outputs with this patch.
  • mypy clean under repo pre-commit.

🤖 Generated with Claude Code

`correct_meta_outputs` does static string comparison between main.nf
output paths and meta.yml entries. Modules that compute output globs
in the script: block via a `def reads_glob = ...` then reference
`path(reads_glob)` or `path("\${reads_glob}")` in the output: block
were flagged as mismatches because the lint extracts the literal
variable name (or GString) rather than resolving it.

Detect output channels whose `path(...)` argument is a bare Groovy
identifier or pure `\${var}` GString and drop those channels from both
sides of the comparison. Static globs and partial GStrings (`*.bam`,
`\${prefix}_fastqc.html`) are unaffected.

Use case: nf-core/modules/trimgalore where the `reads` glob has to
be conditional on `meta.single_end` to distinguish PE intermediate
files from SE final outputs (negative lookbehind required, not
expressible as a static glob).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 70.83333% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.34%. Comparing base (82141d5) to head (624f8d9).
⚠️ Report is 5 commits behind head on dev.

Files with missing lines Patch % Lines
nf_core/modules/lint/meta_yml.py 70.83% 7 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pinin4fjords and others added 2 commits April 27, 2026 15:43
- Rename DYNAMIC_PATH_RE to _DYNAMIC_PATH_RE (private)
- Split _channels_with_dynamic_paths into a thin set comprehension
  plus _has_dynamic_path so the early-return correctly skips remaining
  elements once a channel is flagged
- Use next(iter(entry)) + entry[key] (avoids items() view allocation)
- Hoist test-only import to module top
- Drop redundant assertion message and what-only docstring

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`NFCoreComponent.outputs` is declared as `list[str]` (the subworkflow
shape) so mypy rejects passing it where a `dict` is expected, even
though for modules it's always a dict. Widen the parameter to
`dict | list` and short-circuit on list (subworkflows have no
`_keyword` info).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prototaxites
Copy link
Copy Markdown
Contributor

Or would the project rather refuse this pattern entirely and force module authors to find a static-glob workaround (subdir + mv, broader patterns + filtering at the consumer, etc.)?

An alternative would be to emit both globs with separate outputs and then the module user can mix the outputs together into a new channel as and when needed?

Advantages: retains parsability, doesn't require moving or renaming files
Disadvantages: potentially separates the same output into two separate channels

@mahesh-panchal
Copy link
Copy Markdown
Member

An alternative would be to emit both globs with separate outputs and then the module user can mix the outputs together into a new channel as and when needed?

This is what I would have done. There would also be a defensive file test in the code to make sure one or the other is present if it's either-or

ls ${prefix}_trimmed.fq.gz ${prefix}_{1,2}_val_{1,2}.fq.gz >/dev/null 2>&1 || { echo "ERROR: No matching FASTQ files found." >&2; exit 1; }

@pinin4fjords
Copy link
Copy Markdown
Member Author

OK, I'm not going to do that for this specific case, it would require a bunch of rewiring in subworkflows and Felix's Rust Trimgalore rewrite (next week) negates the need.

But fair enough on the general pattern.

@pinin4fjords
Copy link
Copy Markdown
Member Author

Closing - the trimgalore-specific motivation has dropped (the producing module change is no longer being pursued). If a future module case reintroduces the need for dynamic path() arguments to pass correct_meta_outputs, this can be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants