
Output filename inconsistency: empty blocking produces __, mixed ./_ separators before method name #719

Description

@pinin4fjords

Output filename inconsistency: empty blocking → __, mixed . / _ separators

Background

After a clean run of the default test profile, the differential-results directory contains:

$ ls /tmp/out-test/tables/differential/rnaseq_deseq2_gsea/
treatment_mCherry_hND6__SRP254919.deseq2.results.tsv
treatment_mCherry_hND6__SRP254919.deseq2.results_filtered.tsv
treatment_mCherry_hND6__SRP254919_deseq2.annotated.tsv
treatment_mCherry_hND6_sample_number_SRP254919.deseq2.results.tsv
treatment_mCherry_hND6_sample_number_SRP254919.deseq2.results_filtered.tsv
treatment_mCherry_hND6_sample_number_SRP254919_deseq2.annotated.tsv

Two cosmetic-but-script-unfriendly issues:

Issue 1: Double underscore from empty blocking variable

The contrast with no blocking variable produces filenames like:

treatment_mCherry_hND6__SRP254919.deseq2.results.tsv
                    ^^

The double underscore arises from concatenating the contrast id, which carries a trailing _ placeholder where the blocking variable would go, with the study name. The placeholder is visible in conf/testdata/rnaseq.config's test contrasts file:

- id: treatment_mCherry_hND6_              # trailing _ for empty blocking
  comparison: ["treatment", "mCherry", "hND6"]
- id: treatment_mCherry_hND6_sample_number # blocking = sample_number
  comparison: ["treatment", "mCherry", "hND6"]

So ${id}_${study_name} produces treatment_mCherry_hND6__SRP254919.
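
The concatenation can be reproduced outside the pipeline. A minimal shell sketch, assuming the filename stem is built exactly as ${id}_${study_name}; build_stem is a hypothetical helper, not pipeline code:

```shell
# Hypothetical helper: joins a contrast id and a study name with "_",
# mirroring the ${id}_${study_name} pattern described above.
build_stem() {
    printf '%s_%s\n' "$1" "$2"
}

# Contrast id with the trailing "_" placeholder (no blocking variable)
build_stem "treatment_mCherry_hND6_" "SRP254919"
# prints: treatment_mCherry_hND6__SRP254919

# Contrast id with blocking = sample_number
build_stem "treatment_mCherry_hND6_sample_number" "SRP254919"
# prints: treatment_mCherry_hND6_sample_number_SRP254919
```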

Issue 2: Inconsistent separator before the method name

Within the same directory, the same contrast produces files using two different separators between the contrast id and the method name:

File                                                            Separator
treatment_mCherry_hND6__SRP254919.deseq2.results.tsv            dot
treatment_mCherry_hND6__SRP254919.deseq2.results_filtered.tsv   dot
treatment_mCherry_hND6__SRP254919_deseq2.annotated.tsv          underscore

conf/modules.config:261:

withName: CSVTK_JOIN {
    cache = 'deep'
    ext.prefix = {
        def method = meta.params.differential_method
        def prefix = "${meta.id}_${method}.annotated"   // underscore here
        return prefix
    }
}

By contrast, the differential-results files follow <id>.<method>.results.tsv, with a dot before the method, as set by the DESEQ2/LIMMA/DREAM module conventions.

This breaks simple shell glob patterns: *.deseq2.* misses the annotated files, and *_deseq2.* misses the results files. It also makes the naming look inconsistent in any tool that lists or downloads the output directory.
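
The glob mismatch is easy to reproduce in a scratch directory. A small sketch using the three filenames from the table above; make_demo_dir and count_matches are hypothetical helpers introduced only for illustration:

```shell
# Hypothetical helper: creates a temporary directory holding empty files
# with the three names from the table above, then prints its path.
make_demo_dir() {
    dir=$(mktemp -d) || return 1
    stem="treatment_mCherry_hND6__SRP254919"
    touch "${dir}/${stem}.deseq2.results.tsv" \
          "${dir}/${stem}.deseq2.results_filtered.tsv" \
          "${dir}/${stem}_deseq2.annotated.tsv"
    printf '%s\n' "$dir"
}

# Hypothetical helper: counts files in directory $1 whose name matches glob $2.
count_matches() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

demo=$(make_demo_dir)
count_matches "$demo" '*.deseq2.*'   # prints 2: results and results_filtered only
count_matches "$demo" '*_deseq2.*'   # prints 1: annotated only
rm -rf "$demo"
```

Neither pattern returns all three files, so a script has to know about both separators to collect one contrast's outputs.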

Tasks

1. Decide on the canonical separator between contrast id and method

Probably . (matches the existing <id>.<method>.results.tsv convention from DESEQ2/LIMMA/DREAM modules). Apply the same to CSVTK_JOIN's prefix.

 withName: CSVTK_JOIN {
     cache = 'deep'
     ext.prefix = {
         def method = meta.params.differential_method
-        def prefix = "${meta.id}_${method}.annotated"
+        def prefix = "${meta.id}.${method}.annotated"
         return prefix
     }
 }

After this fix, all differential-table files in the same directory follow <contrast_id>.<method>.<kind>.tsv.

2. Collapse runs of _ introduced by empty blocking

Either:

  • Trim the trailing _ during contrast id construction in workflows/differentialabundance.nf:417 (contrast.id = contrast.values().join('_')), for example with .replaceAll(/_+$/, ''). This is the cleanest fix.
  • Or post-process the id where it is used in a filename: collapse runs of _ into a single _ at the ext.prefix site.

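The first option also fixes user-supplied id fields in YAML/CSV contrast files that happen to end with _, provided the trim is applied to every contrast id and not only inside the if (!contrast.id) fallback.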

-if (!contrast.id){
-    contrast.id = contrast.values().join('_')
-}
+if (!contrast.id){
+    contrast.id = contrast.values().findAll { it != null && it != '' }.join('_')
+}

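The .findAll { ... } filters out empty/null fields (the blocking placeholder) before joining, which removes the trailing underscore at its source for auto-generated ids. It runs only when no id is supplied, so ids written explicitly with a trailing _ (as in the test contrasts file above) would still need the trim.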
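
For checking expected ids by hand, the strategies above can be mimicked in shell. This is only an analogue of the Groovy logic, not pipeline code; join_nonempty, trim_trailing_underscores, and collapse_underscores are hypothetical helper names:

```shell
# Analogue of values().findAll { non-empty }.join('_'):
# joins all non-empty arguments with "_".
join_nonempty() {
    out=""
    for field in "$@"; do
        [ -n "$field" ] || continue          # skip empty fields (e.g. no blocking)
        if [ -z "$out" ]; then
            out="$field"
        else
            out="${out}_${field}"
        fi
    done
    printf '%s\n' "$out"
}

# Analogue of .replaceAll(/_+$/, ''): strips trailing underscores.
trim_trailing_underscores() {
    printf '%s\n' "$1" | sed 's/_*$//'
}

# Analogue of the post-processing option: squeezes runs of "_" into one.
collapse_underscores() {
    printf '%s\n' "$1" | tr -s '_'
}

join_nonempty treatment mCherry hND6 ""                    # treatment_mCherry_hND6
join_nonempty treatment mCherry hND6 sample_number         # treatment_mCherry_hND6_sample_number
trim_trailing_underscores "treatment_mCherry_hND6_"        # treatment_mCherry_hND6
collapse_underscores "treatment_mCherry_hND6__SRP254919"   # treatment_mCherry_hND6_SRP254919
```

Filtering and trimming act on the id itself, while collapsing acts on the finished filename stem, so the latter would also alter any deliberate __ inside a user-supplied id.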

3. Regenerate affected snapshots

Both fixes will rename output files. Existing nf-test snapshots that reference the renamed files need regenerating. Run the relevant test profiles and review the diff:

nf-test test --profile test,docker --update-snapshot
nf-test test --profile test_affy_limma_gsea,docker --update-snapshot
# etc.

Only the file paths in stable_name should change; content snapshots should remain identical.

Verification

After fixes, list any differential output directory and confirm:

  • No __ anywhere in filenames (assuming no user-supplied contrast id has a deliberate trailing _).
  • Same separator (.) between <contrast_id> and <method> for both *.results.tsv and *.annotated.tsv.

ls tables/differential/<paramset_name>/ | grep -E '__|_(deseq2|limma|dream)\.'
# expect: no matches
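
To use the same check as a pass/fail gate, the pattern can be wrapped so that any match produces a non-zero exit status. A sketch, assuming filenames arrive one per line on stdin; check_names is a hypothetical helper, and the two filenames below are assumed post-fix names used only for illustration:

```shell
# Hypothetical helper: reads filenames from stdin, prints any offenders,
# and returns 1 if at least one name contains "__" or "_<method>.".
check_names() {
    if grep -E '__|_(deseq2|limma|dream)\.'; then
        return 1
    fi
    return 0
}

# Assumed post-fix filenames (illustrative only)
printf '%s\n' \
    "treatment_mCherry_hND6_SRP254919.deseq2.results.tsv" \
    "treatment_mCherry_hND6_SRP254919.deseq2.annotated.tsv" |
    check_names && echo "naming OK"
# prints: naming OK
```

Against a real run this would be invoked as ls tables/differential/<paramset_name>/ | check_names.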

Acceptance criteria

  • No double-underscore in differential output filenames for contrasts without a blocking variable.
  • .annotated.tsv files use the same separator before the method name as .results.tsv files.
  • Affected nf-test snapshots regenerated; content snapshots unchanged.
  • CHANGELOG entry under "Changed".

Notes for the implementer

  • Cosmetic but worth doing because users grep / glob over these output directories. Once the separator is consistent, downstream scripting becomes predictable.
  • The contrast-id construction change touches a hot path; verify the test snapshots' stable_name fields update cleanly and don't reorder.
  • Don't bundle other workflow changes here — this is one focused renaming PR.

Drafted with AI assistance during runtime testing.

Labels: AI-generated (Issue or PR drafted with AI assistance), enhancement (New feature or request)
