Output filename inconsistency: empty blocking → __, mixed . / _ separators
Background
After a clean run of the default test profile, the differential-results directory contains:
$ ls /tmp/out-test/tables/differential/rnaseq_deseq2_gsea/
treatment_mCherry_hND6__SRP254919.deseq2.results.tsv
treatment_mCherry_hND6__SRP254919.deseq2.results_filtered.tsv
treatment_mCherry_hND6__SRP254919_deseq2.annotated.tsv
treatment_mCherry_hND6_sample_number_SRP254919.deseq2.results.tsv
treatment_mCherry_hND6_sample_number_SRP254919.deseq2.results_filtered.tsv
treatment_mCherry_hND6_sample_number_SRP254919_deseq2.annotated.tsv
Two cosmetic-but-script-unfriendly issues:
Issue 1: Double underscore from empty blocking variable
The contrast with no blocking variable produces filenames like:
treatment_mCherry_hND6__SRP254919.deseq2.results.tsv
                      ^^
The double underscore comes from concatenating the contrast id (which has a trailing _ placeholder where the blocking variable would go) with the study name. This is visible in the test contrasts file used by conf/testdata/rnaseq.config:
- id: treatment_mCherry_hND6_                 # trailing _ for empty blocking
  comparison: ["treatment", "mCherry", "hND6"]
- id: treatment_mCherry_hND6_sample_number    # blocking = sample_number
  comparison: ["treatment", "mCherry", "hND6"]
So ${id}_${study_name} produces treatment_mCherry_hND6__SRP254919.
Issue 2: Inconsistent separator before the method name
Within the same directory, the same contrast produces files using two different separators between the contrast id and the method name:
| File | Separator |
| --- | --- |
| treatment_mCherry_hND6__SRP254919.deseq2.results.tsv | dot |
| treatment_mCherry_hND6__SRP254919.deseq2.results_filtered.tsv | dot |
| treatment_mCherry_hND6__SRP254919_deseq2.annotated.tsv | underscore |
conf/modules.config:261:
withName: CSVTK_JOIN {
    cache = 'deep'
    ext.prefix = {
        def method = meta.params.differential_method
        def prefix = "${meta.id}_${method}.annotated" // underscore here
        return prefix
    }
}
By contrast, the differential-results files use <id>.<method>.results.tsv (dot before the method, following the DESEQ2/LIMMA/DREAM module conventions).
This breaks simple shell glob patterns like *.deseq2.* (misses annotated files) or *_deseq2.* (misses results files), and makes the file naming "feel" inconsistent in any tool that lists or downloads the output directory.
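The glob mismatch can be reproduced without running the pipeline. The sketch below is illustrative only: it creates empty files with the three names from the listing in a scratch directory and counts how many each pattern matches. The helper `count_matches` is a hypothetical name introduced here, not part of the workflow.

```shell
# count_matches DIR PATTERN: print how many filenames in DIR match the glob PATTERN.
count_matches() {
  local dir="$1" pattern="$2" n=0 f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    # An unquoted variable in a case pattern is interpreted as a glob.
    case "${f##*/}" in
      $pattern) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# Scratch directory holding the three names produced for one contrast (empty files).
demo_dir="$(mktemp -d)"
touch \
  "$demo_dir/treatment_mCherry_hND6__SRP254919.deseq2.results.tsv" \
  "$demo_dir/treatment_mCherry_hND6__SRP254919.deseq2.results_filtered.tsv" \
  "$demo_dir/treatment_mCherry_hND6__SRP254919_deseq2.annotated.tsv"

count_matches "$demo_dir" '*.deseq2.*'   # prints 2: the annotated file is missed
count_matches "$demo_dir" '*_deseq2.*'   # prints 1: both results files are missed

rm -rf "$demo_dir"
```

Neither pattern picks up all three files, so a script has to know about both separators to collect every table for a contrast.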
Tasks
1. Decide on the canonical separator between contrast id and method
Probably . (matches the existing <id>.<method>.results.tsv convention from DESEQ2/LIMMA/DREAM modules). Apply the same to CSVTK_JOIN's prefix.
 withName: CSVTK_JOIN {
     cache = 'deep'
     ext.prefix = {
         def method = meta.params.differential_method
-        def prefix = "${meta.id}_${method}.annotated"
+        def prefix = "${meta.id}.${method}.annotated"
         return prefix
     }
 }
After this fix, all differential-table files in the same directory follow <contrast_id>.<method>.<kind>.tsv.
2. Collapse runs of _ introduced by empty blocking
Either:
- Trim the trailing _ from contrast id construction in workflows/differentialabundance.nf:417 (contrast.id = contrast.values().join('_')), e.g. with .replaceAll(/_+$/, ''). This is the cleanest fix.
- Or post-process the id where it is used in a filename: collapse __ → _ at the ext.prefix site.
The first option also fixes user-supplied id fields in YAML/CSV contrast files that happen to end with _, provided the trim is applied to every id and not only inside the if (!contrast.id) guard (the test contrasts above set id explicitly). A variant that filters empty fields before joining:
-if (!contrast.id){
-    contrast.id = contrast.values().join('_')
-}
+if (!contrast.id){
+    contrast.id = contrast.values().findAll { it != null && it != '' }.join('_')
+}
The .findAll { ... } filters out empty/null fields (the blocking placeholder), which removes the trailing-underscore pathology at its source.
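To see what the two id-level strategies do to the strings involved, here is a small shell sketch. It only mimics the string handling; the real change would be made in the workflow's Groovy code, and the function names `trim_trailing_underscores` and `join_nonempty` are hypothetical.

```shell
# Strategy 1: strip any run of trailing underscores from an existing id.
trim_trailing_underscores() {
  printf '%s\n' "$1" | sed -E 's/_+$//'
}

# Strategy 2: join only the non-empty fields with "_" (the .findAll idea).
join_nonempty() {
  local out="" field
  for field in "$@"; do
    [ -n "$field" ] || continue
    if [ -z "$out" ]; then
      out="$field"
    else
      out="${out}_${field}"
    fi
  done
  printf '%s\n' "$out"
}

trim_trailing_underscores "treatment_mCherry_hND6_"   # treatment_mCherry_hND6
join_nonempty treatment mCherry hND6 ""               # treatment_mCherry_hND6
join_nonempty treatment mCherry hND6 sample_number    # treatment_mCherry_hND6_sample_number
```

Both approaches give the same id when blocking is empty. Trimming also works on an id that was supplied ready-made, whereas filtering needs access to the individual fields.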
3. (Optional) snapshot regen
Both fixes will rename output files. Existing nf-test snapshots that reference the renamed files need regenerating. Run the relevant test profiles and review the diff:
nf-test test --profile test,docker --update-snapshot
nf-test test --profile test_affy_limma_gsea,docker --update-snapshot
# etc.
Only the file paths in stable_name should change; content snapshots should remain identical.
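When reviewing the snapshot diff, it can help to compute the name each old file is expected to get. The sketch below assumes both fixes land as described (runs of _ collapse to one, and _<method>.annotated becomes .<method>.annotated). `expected_new_name` is a hypothetical helper, and the method list is taken from the verification grep in this issue.

```shell
# expected_new_name OLD: print the filename expected after both fixes (assumed rules).
expected_new_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's/__+/_/g' \
    -e 's/_(deseq2|limma|dream)\.annotated/.\1.annotated/'
}

expected_new_name "treatment_mCherry_hND6__SRP254919_deseq2.annotated.tsv"
# treatment_mCherry_hND6_SRP254919.deseq2.annotated.tsv

expected_new_name "treatment_mCherry_hND6_sample_number_SRP254919.deseq2.results.tsv"
# unchanged: treatment_mCherry_hND6_sample_number_SRP254919.deseq2.results.tsv
```

A renamed path in the snapshot that does not match this mapping is worth a closer look before accepting the update.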
Verification
After fixes, list any differential output directory and confirm:
- No __ anywhere in filenames (assuming no user-supplied contrast id has a deliberate trailing _).
- Same separator (.) between <contrast_id> and <method> for both *.results.tsv and *.annotated.tsv.
ls tables/differential/<paramset_name>/ | grep -E '__|_(deseq2|limma|dream)\.'
# expect: no matches
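The same check can be wrapped so that it fails loudly, for example in a CI step. This is a minimal sketch using the grep pattern above; `check_differential_names` is a hypothetical name and the directory argument is whatever output directory is being inspected.

```shell
# check_differential_names DIR: list filenames in DIR that still contain "__"
# or "_<method>."; return 1 if any are found, 0 otherwise.
check_differential_names() {
  local dir="$1" bad
  bad="$(ls "$dir" | grep -E '__|_(deseq2|limma|dream)\.' || true)"
  if [ -n "$bad" ]; then
    printf 'Offending filenames:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}

# Usage: check_differential_names tables/differential/<paramset_name>
```

The function prints the offending names to stderr, so the exit status alone can gate a pipeline while the log still shows what needs renaming.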
Acceptance criteria
- .annotated.tsv files use the same separator before the method name as .results.tsv files.
Notes for the implementer
- Cosmetic but worth doing because users grep / glob over these output directories. Once the separator is consistent, downstream scripting becomes predictable.
- The contrast-id construction change touches a hot path; verify the test snapshots' stable_name fields update cleanly and don't reorder.
- Don't bundle other workflow changes here — this is one focused renaming PR.
Drafted with AI assistance during runtime testing.