Merged
Changes from 2 commits
1 change: 0 additions & 1 deletion modules/bigbio/thermorawfileparser/environment.yml
@@ -1,4 +1,3 @@
name: thermorawfileparser
channels:
- conda-forge
- bioconda
60 changes: 28 additions & 32 deletions modules/bigbio/thermorawfileparser/main.nf
@@ -1,5 +1,5 @@
process THERMORAWFILEPARSER {
tag "$meta.mzml_id"
tag "$meta.id"
label 'process_low'
label 'process_single'
label 'error_retry'
@@ -9,54 +9,50 @@ process THERMORAWFILEPARSER {
'https://depot.galaxyproject.org/singularity/thermorawfileparser:1.4.5--h05cac1d_1' :
'biocontainers/thermorawfileparser:1.4.5--h05cac1d_1' }"

stageInMode {
if (task.attempt == 1) {
if (task.executor == "awsbatch") {
'symlink'
} else {
'link'
}
} else if (task.attempt == 2) {
if (task.executor == "awsbatch") {
'copy'
} else {
'symlink'
}
} else {
'copy'
}
}
input:
tuple val(meta), path(rawfile)
tuple val(meta), path(raw)

output:
tuple val(meta), path("*.{mzML,mgf,parquet}"), emit: convert_files
path "versions.yml", emit: versions
tuple val(meta), path("*.{mzML,mzML.gz,mgf,mgf.gz,parquet,parquet.gz}"), emit: spectra
tuple val("${task.process}"), val('thermorawfileparser'), eval("ThermoRawFileParser.sh --version"), emit: versions_thermorawfileparser, topic: versions
path "*.log", emit: log

when:
task.ext.when == null || task.ext.when
Comment on lines +20 to +21

script:
def args = task.ext.args ?: ''
// Default to indexed mzML format (-f=2) if not specified in args
def formatArg = args.contains('-f=') ? '' : '-f=2'
def prefix = task.ext.prefix ?: "${meta.id}"
def suffix = args.contains("--format 0") || args.contains("-f 0") ? "mgf" :
args.contains("--format 1") || args.contains("-f 1") ? "mzML" :
args.contains("--format 2") || args.contains("-f 2") ? "mzML" :
args.contains("--format 3") || args.contains("-f 3") ? "parquet" :
"mzML"
suffix = args.contains("--gzip")? "${suffix}.gz" : "${suffix}"

⚠️ Potential issue | 🟠 Major

Dynamic prefix/suffix handling is currently disconnected from produced outputs.

At lines 27-33, `prefix` and `suffix` are computed but never used by the actual command (line 36) or the output declaration (line 16). This makes the new logic effectively dead and can break gzip scenarios (`*.gz` is not matched by the current output glob). Also, the format detection here doesn't parse the documented `-f=<n>` forms from modules/bigbio/thermorawfileparser/meta.yml:8-16, so suffix resolution can drift from the tool's real behavior.

Suggested parser hardening for format/gzip detection
-    def suffix = args.contains("--format 0") || args.contains("-f 0") ? "mgf" :
-                args.contains("--format 1") || args.contains("-f 1") ? "mzML" :
-                args.contains("--format 2") || args.contains("-f 2") ? "mzML" :
-                args.contains("--format 3") || args.contains("-f 3") ? "parquet" :
-                "mzML"
-    suffix = args.contains("--gzip")? "${suffix}.gz" : "${suffix}"
+    def suffix = (args =~ /(?:^|\s)(?:--format|-f)(?:=|\s+)0(?:\s|$)/).find() ? "mgf" :
+                (args =~ /(?:^|\s)(?:--format|-f)(?:=|\s+)1(?:\s|$)/).find() ? "mzML" :
+                (args =~ /(?:^|\s)(?:--format|-f)(?:=|\s+)2(?:\s|$)/).find() ? "mzML" :
+                (args =~ /(?:^|\s)(?:--format|-f)(?:=|\s+)3(?:\s|$)/).find() ? "parquet" :
+                "mzML"
+    def gzip = (args =~ /(?:^|\s)--gzip(?:\s|$)/).find()
+    suffix = gzip ? "${suffix}.gz" : suffix

You should also wire prefix/suffix into the produced filename contract (or remove them) and make output patterns include gzipped variants when gzip is enabled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/bigbio/thermorawfileparser/main.nf` around lines 27 - 33, The
computed prefix and suffix (variables prefix, suffix, using task.ext.prefix and
args) are never applied to the produced filenames or command, and suffix
detection misses -f=<n> forms and gzipped outputs; update the process to (1)
accept both "-f N"/"-f=N" and "--format N"/"--format=N" when resolving suffix in
the suffix logic, (2) use prefix and suffix when constructing the output
filename in the command invocation (the same symbol names prefix and suffix so
wiring is obvious), and (3) update the output declaration/glob to reference the
generated filename pattern and include the optional ".gz" variant (or use the
computed suffix variable) so gzip scenarios are matched by outputs.
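
The regex-based detection suggested above can be exercised outside Nextflow. The following is a minimal Python sketch, not the module's code: it ports the suggested patterns to Python's `re` module, and the names `resolve_suffix`, `FORMAT_SUFFIX`, `FORMAT_RE`, and `GZIP_RE` are hypothetical. It assumes the format codes listed in meta.yml (0 for MGF, 1 and 2 for mzML, 3 for Parquet) and, unlike the suggestion, takes the first format flag it finds rather than testing the codes in a fixed order.

```python
import re

# Assumed mapping, taken from the format codes described in meta.yml:
# 0 = MGF, 1 = mzML, 2 = indexed mzML, 3 = Parquet.
FORMAT_SUFFIX = {"0": "mgf", "1": "mzML", "2": "mzML", "3": "parquet"}

# Matches "-f 0", "-f=0", "--format 0" and "--format=0" as whole tokens.
FORMAT_RE = re.compile(r"(?:^|\s)(?:--format|-f)(?:=|\s+)(\d+)(?=\s|$)")

# Matches "--gzip" only as a standalone token (not e.g. "--gzipped").
GZIP_RE = re.compile(r"(?:^|\s)--gzip(?=\s|$)")


def resolve_suffix(args: str) -> str:
    """Return the expected output extension for an args string.

    Falls back to "mzML" when no format flag is present or the code is
    not in FORMAT_SUFFIX, mirroring the default branch in the module.
    """
    match = FORMAT_RE.search(args)
    suffix = FORMAT_SUFFIX.get(match.group(1), "mzML") if match else "mzML"
    return f"{suffix}.gz" if GZIP_RE.search(args) else suffix
```

For instance, `resolve_suffix("-f=0 --gzip")` resolves to `mgf.gz`, whereas a substring check such as `args.contains("-f 0")` would miss the `-f=0` form and fall back to `mzML`.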


"""
ThermoRawFileParser.sh -i='${rawfile}' ${formatArg} ${args} -o=./ 2>&1 | tee '${rawfile.baseName}_conversion.log'

cat <<-END_VERSIONS > versions.yml
"${task.process}":
ThermoRawFileParser: \$(ThermoRawFileParser.sh --version)
END_VERSIONS
ThermoRawFileParser.sh \\
-i='${raw}' \\
${formatArg} \\
${args} \\
-o=./ 2>&1 | tee '${prefix}_conversion.log'
"""

stub:
def prefix = task.ext.prefix ?: "${meta.mzml_id}"
def args = task.ext.args ?: ''
// Determine output format from args, default to mzML
// Format 0 = MGF, formats 1-2 = mzML, format 3 = Parquet, format 4 = None
def outputExt = (args =~ /-f=0\b/).find() ? 'mgf' : 'mzML'
def formatArg = args.contains('-f=') ? '' : '-f=2'
def prefix = task.ext.prefix ?: "${meta.id}"
def suffix = args.contains("--format 0") || args.contains("-f 0") ? "mgf" :
args.contains("--format 1") || args.contains("-f 1") ? "mzML" :
args.contains("--format 2") || args.contains("-f 2") ? "mzML" :
args.contains("--format 3") || args.contains("-f 3") ? "parquet" :
"mzML"
suffix = args.contains("--gzip")? "${suffix}.gz" : "${suffix}"

"""
touch '${prefix}.${outputExt}'
touch '${prefix}.${suffix}'
touch '${prefix}_conversion.log'

cat <<-END_VERSIONS > versions.yml
96 changes: 64 additions & 32 deletions modules/bigbio/thermorawfileparser/meta.yml
@@ -1,10 +1,12 @@
name: thermorawfileparser
description: Convert RAW file to mzML or MGF files
description: Convert RAW files to mzML or MGF format
keywords:
- raw
- mzML
- MGF
- OpenMS
- mzml
- mgf
- parquet
- parser
- proteomics
tools:
- thermorawfileparser:
description: |
@@ -14,36 +16,66 @@ tools:
- `-L` or `--msLevel=VALUE` to select MS levels (e.g., `-L=1,2` or `--msLevel=1-3`)
homepage: https://github.com/compomics/ThermoRawFileParser
documentation: https://github.com/compomics/ThermoRawFileParser
tool_dev_url: https://github.com/compomics/ThermoRawFileParser
doi: "10.1021/acs.jproteome.9b00328"
licence:
- "Apache Software"
identifier: biotools:ThermoRawFileParser
input:
- meta:
type: map
description: |
Groovy Map containing sample information
- rawfile:
type: file
description: |
Thermo RAW file
pattern: "*.{raw,RAW}"
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1', single_end:false ]`
- raw:
type: file
description: Thermo RAW file
pattern: "*.{raw,RAW}"
ontologies: []
output:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'sample1', mzml_id:'UPS1_50amol_R3' ]
- convert_files:
type: file
description: |
Converted files in mzML or MGF format depending on the format parameter (-f).
Format options: 0 for MGF, 1 for mzML, 2 for indexed mzML (default), 3 for Parquet, 4 for None.
pattern: "*.{mzML,mgf,parquet}"
- log:
type: file
description: log file
pattern: "*.log"
- versions:
type: file
description: File containing software version
pattern: "versions.yml"
spectra:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1', single_end:false ]`
- "*.{mzML,mzML.gz,mgf,mgf.gz,parquet,parquet.gz}":
type: file
description: Mass spectra in open format
pattern: "*.{mzML,mzML.gz,mgf,mgf.gz,parquet,parquet.gz}"
ontologies: []
versions_thermorawfileparser:
- - ${task.process}:
type: string
description: The process the versions were collected from
- thermorawfileparser:
type: string
description: The name of the tool
- ThermoRawFileParser.sh --version:
type: eval
description: The expression to obtain the version of the tool
log:
- "*.log":
type: file
description: Log file from the conversion process
pattern: "*.log"
ontologies: []
topics:
versions:
- - ${task.process}:
type: string
description: The process the versions were collected from
- thermorawfileparser:
type: string
description: The name of the tool
- ThermoRawFileParser.sh --version:
type: eval
description: The expression to obtain the version of the tool
authors:
- "@jonasscheid"
- "@daichengxin"
- "@ypriverol"
maintainers:
- "@jonasscheid"
- "@daichengxin"
- "@ypriverol"
48 changes: 24 additions & 24 deletions modules/bigbio/thermorawfileparser/tests/main.nf.test.snap
Original file line number Diff line number Diff line change
@@ -1,26 +1,26 @@
{
"versions": {
"content": [
[
"versions.yml:md5,dc9625538c025d615109ef8cac3a86ab"
]
],
"meta": {
"nf-test": "0.9.3",
"nextflow": "25.04.8"
"versions_stub": {
"content": [
[
"versions.yml:md5,dc9625538c025d615109ef8cac3a86ab"
]
],
"timestamp": "2026-03-13T14:32:25.161481",
"meta": {
"nf-test": "0.9.4",
"nextflow": "25.10.4"
}
},
"timestamp": "2025-12-11T06:27:00.000000"
},
"versions_stub": {
"content": [
[
"versions.yml:md5,dc9625538c025d615109ef8cac3a86ab"
]
],
"meta": {
"nf-test": "0.9.3",
"nextflow": "25.04.8"
},
"timestamp": "2025-12-11T06:27:00.000000"
}
}
"versions": {
"content": [
[
"versions.yml:md5,dc9625538c025d615109ef8cac3a86ab"
]
],
"timestamp": "2026-03-13T14:31:40.55121",
"meta": {
"nf-test": "0.9.4",
"nextflow": "25.10.4"
}
}
}