
feat: add wiffconverter container with convert CLI#18

Merged
ypriverol merged 4 commits into dev from feat/wiffconverter-container
Apr 26, 2026

Conversation

@ypriverol

@ypriverol ypriverol commented Apr 25, 2026

Summary

  • New wiffconverter-0.10/ container wraps sciex/wiffconverter:0.10 and adds a convert CLI on PATH for SCIEX .wiff → indexed mzML in one step.
  • Always emits indexedmzML (--index is hard-wired); on failure prints a framed banner with the last 40 lines of the converter log and retains <output>.log for diagnosis.
  • CLI style matches the DIA-NN / Relink sister containers: convert --input X --output Y --mode centroid|profile.
  • README updated with usage, the always-`--index` guarantee, and failure-banner behavior.

Why a wrapper

  • OneOmics.WiffConverter.exe does not always propagate failure via its exit code, so the wrapper validates that the output ends with </indexedmzML> and contains an <indexList> before reporting success.
  • Downstream tools (DIA-NN, OpenMS) require a valid <indexList> for random access, so non-indexed output is always a bug here.
  • Friendly errors so most issues (missing .wiff.scan, locked output, unsupported acquisition) are diagnosable from the console without re-running.
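
The output check described in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the wrapper's actual code: the function name `validate_indexed_mzml` and the 256-byte tail window are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the real wrapper code): succeed only if the file
# looks like a complete indexedmzML document.
validate_indexed_mzml() {
    local out="$1"
    # A missing or empty file is a failure even if the converter exited 0.
    [ -s "$out" ] || { echo "output missing or empty: $out" >&2; return 1; }
    # The closing tag should sit within the last bytes of the file; the
    # 256-byte window is an arbitrary choice that tolerates trailing whitespace.
    tail -c 256 "$out" | grep -q '</indexedmzML>' \
        || { echo "output does not end with </indexedmzML>" >&2; return 1; }
    # Downstream random access needs the index itself, not just the wrapper tag.
    grep -qE '<indexList[[:space:]>]' "$out" \
        || { echo "output is missing <indexList>" >&2; return 1; }
}
```

A wrapper would call a check like this after the converter returns and treat a non-zero result as a failure, whatever exit code the converter itself reported.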

Validation

Tested against PXD073289 (OA_5.wiff + 936 MB .wiff.scan):

  • Success path: 1.4 GB output, 192,076 spectra, ends </indexedmzML>, has <indexList>, success banner printed, log auto-deleted. Result deterministic across two independent runs.
  • Pre-flight error (missing input / no companion .wiff.scan): one-line message, exit 64/66.
  • Mono failure (corrupt .wiff): framed banner + last 40 lines of OpenMcdf.CFFileFormatException traceback, log retained at <output>.log, exit 2.
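
For readers unfamiliar with the 64/66 exit codes, here is a hedged sketch of what such pre-flight checks could look like. The PR does not say which code applies to which condition; the mapping below (64 for a usage error, 66 for a missing input file) follows the `sysexits.h` convention and is an assumption, as are the function name and message wording.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check. Exit-code mapping is assumed from sysexits.h:
#   64 = EX_USAGE (bad command line), 66 = EX_NOINPUT (input cannot be opened).
preflight() {
    local input="$1"
    [ -n "$input" ] || { echo "convert: --input is required" >&2; return 64; }
    [ -f "$input" ] || { echo "convert: input not found: $input" >&2; return 66; }
    # SCIEX acquisitions come as a pair: X.wiff plus the companion X.wiff.scan.
    [ -f "${input}.scan" ] \
        || { echo "convert: companion file not found: ${input}.scan" >&2; return 66; }
}
```

In a real script these would be `exit` calls rather than `return`; the function form is used here only so the checks can be exercised in isolation.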

CI follow-up (separate commit needed)

This branch deliberately omits the build-wiffconverter job in .github/workflows/quantms-containers.yml — the token used to push this branch lacks workflow scope. The diff is ready locally and will be applied in a follow-up commit so a maintainer with workflow permissions can land it.

Test plan

  • CI green (Dockerfile builds with docker build -t ghcr.io/bigbio/wiffconverter:0.10 wiffconverter-0.10/)
  • Smoke test on a SCIEX .wiff → confirm convert: success ... indexed=yes and </indexedmzML> tail
  • Smoke test failure path → confirm banner + retained log
  • Apply the workflow CI job in a follow-up commit (needs workflow scope)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added WiffConverter docs, usage for Docker/Singularity, container metadata, and updated license terms for SCIEX redistributable.
  • New Features

    • Ship a containerized WiffConverter with a user-facing convert command that produces indexed .mzML (default centroid mode), optional logging, and concise success output.
  • Bug Fixes / Reliability

    • Improved failure reporting with a clear banner and tail of recent log lines for easier troubleshooting.
  • Chores

    • Adjusted CI/CD ordering to build/push WiffConverter before syncing other containers.

Introduces a new container at `wiffconverter-0.10/` that wraps
`sciex/wiffconverter:0.10` and exposes a single `convert` CLI on PATH for
SCIEX `.wiff` (+ companion `.wiff.scan`) → indexed mzML conversion in one
step. Designed for use by quantmsdiann to ingest AbSciex data natively
without a separate indexing pass.

Why a wrapper, not the upstream entrypoint:

- `OneOmics.WiffConverter.exe` does not always propagate failure via its
  exit code, so the wrapper validates the output ends `</indexedmzML>`
  and contains an `<indexList>` element.
- The output must be `indexedmzML` for downstream tools (DIA-NN, OpenMS),
  so the wrapper always passes `--index` — there is no opt-out flag.
- A `convert --input X --output Y --mode centroid|profile` interface
  matches the DIA-NN / Relink containers' style (binary on PATH,
  flag-driven) and replaces the original positional `wiff-to-mzml`
  prototype.
- On failure the wrapper prints a framed banner with input/output/mode
  and the last 40 lines of the underlying converter log, and retains the
  log file at `<output>.log` so the operator can diagnose without
  re-running. On success the log is auto-deleted unless `--log <path>`
  was passed.
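
The failure-banner behavior in the last bullet can be illustrated with a short sketch. The frame characters, field layout, and function name are invented for this example; only the banner contents (input, output, mode, last 40 log lines, retained log path) and exit code 2 come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical failure banner; the frame and wording are illustrative only.
print_failure_banner() {
    local reason="$1" input="$2" output="$3" mode="$4" log="$5"
    {
        echo "================ convert: FAILED ================"
        echo "  reason : $reason"
        echo "  input  : $input"
        echo "  output : $output"
        echo "  mode   : $mode"
        echo "  log    : $log (retained)"
        echo "------------ last 40 lines of log ---------------"
        # tail prints the whole file when it has fewer than 40 lines.
        if [ -f "$log" ]; then tail -n 40 "$log"; else echo "(log file not found)"; fi
        echo "================================================="
    } >&2
    return 2   # the PR reports exit code 2 for conversion failures
}
```

Everything goes to stderr so that a caller capturing stdout does not mix the banner into regular output.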

Validated end-to-end on PXD073289 / OA_5 (936 MB `.wiff.scan`):
192,076 spectra, 1.4 GB indexed mzML output, deterministic across runs.
Failure path validated against a corrupt `.wiff` (`OpenMcdf` signature
exception) — banner + log retention behave as designed.

README documents the `convert` CLI, the always-`--index` guarantee, and
the failure-banner behavior. The CI workflow change required to add a
`build-wiffconverter` job is left for a separate commit (token used here
lacks `workflow` scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 25, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 387cfff3-57b2-47ee-95b2-8933f5d1195a

📥 Commits

Reviewing files that changed from the base of the PR and between 74d7ac8 and ad51b03.

📒 Files selected for processing (1)
  • .github/workflows/quantms-containers.yml
📝 Walkthrough

Walkthrough

Adds a new WiffConverter container (v0.10): README docs, a Dockerfile extending sciex/wiffconverter:0.10 with labels and a convert CLI, and a Bash convert entrypoint that runs OneOmics.WiffConverter via Mono with input validation, indexed mzML output, and enhanced failure diagnostics.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: README.md | Add WiffConverter overview, usage (Docker/Singularity examples), license note for SCIEX redistributable terms, container metadata in technical specs, and CI/CD step reordering to build/push WiffConverter before syncing OpenMS containers. |
| Container Configuration: wiffconverter-0.10/Dockerfile | New Dockerfile extending sciex/wiffconverter:0.10; adds descriptive labels, build-time checks for mono and OneOmics.WiffConverter.exe, installs convert CLI to /usr/local/bin, clears ENTRYPOINT and sets /bin/bash CMD with /data/ workdir. |
| CLI Entrypoint: wiffconverter-0.10/convert | New executable Bash script parsing --input, --output, --mode (default centroid), --log; checks input and companion .wiff.scan, invokes mono OneOmics.WiffConverter.exe --index --overwrite -<mode>, tees logs, validates output (presence, </indexedmzML>, <indexList>), prints diagnostics with last 40 log lines on failure, removes auto-log on success, and emits concise success summary. |

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as convert CLI
    participant FS as Filesystem
    participant Mono as Mono Runtime
    participant OneOmics as OneOmics.WiffConverter.exe

    User->>CLI: run convert --input file.wiff --output out.mzML --mode centroid
    CLI->>CLI: parse args & validate presence of file.wiff and file.wiff.scan
    CLI->>FS: verify input files exist
    alt inputs missing
        CLI->>User: print one-line error & exit 64/66
    else inputs present
        CLI->>Mono: mono OneOmics.WiffConverter.exe --index --overwrite -centroid ...
        Mono->>OneOmics: execute conversion
        OneOmics->>FS: write indexed mzML output
        CLI->>FS: read output file
        CLI->>CLI: validate output (non-empty, contains <indexList>, ends with </indexedmzML>)
        alt validation fails
            CLI->>FS: tail last 40 log lines
            CLI->>User: print failure details & exit 2
        else validation succeeds
            CLI->>FS: remove auto log if none provided
            CLI->>User: print success summary (size, indexed=yes)
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Review effort [1-5]: 2

Poem

🐰 I hopped into the build today,
I wrapped OneOmics in Dockerplay,
With Mono's hum and logs in tow,
I index mzML nice and slow,
Hop — convert done! — into the data fray.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add wiffconverter container with convert CLI' directly and accurately describes the main change: introducing a new wiffconverter-0.10 container with a convert CLI wrapper, which is the primary purpose of the changeset across README, Dockerfile, and convert script. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
wiffconverter-0.10/convert (1)

75-81: Consider creating the output directory before invoking mono.

If --output points at a path whose parent directory doesn't exist (e.g. a typo, or a Nextflow staging path that hasn't been created), mono will fail with a generic error and the user has to dig through the framed log to figure out what happened. A one-line mkdir -p is cheap and turns this into a non-issue.

♻️ Suggested addition before the mono invocation
+mkdir -p "$(dirname "$OUTPUT")"
+
 # Make sure mono's stdout/stderr is captured to the log; tee it for live progress.
 status=0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wiffconverter-0.10/convert` around lines 75 - 81, Before invoking mono (the
block running /usr/bin/mono "$WIFF_BIN" ... > >(tee "$LOG") 2>&1), ensure the
directory for the OUTPUT path exists by creating its parent directory (use the
OUTPUT variable to derive the parent) with mkdir -p; add this one-line
check/creation immediately before the mono call so that missing parent
directories are created and mono fails with a clear error only for real runtime
issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 161-167: The README references a stale wrapper name "wiff-to-mzml"
that doesn't exist in the image; update the WiffConverter Container bullets to
reflect the actual installed CLI (/usr/local/bin/convert) and its invocation
name "convert", and ensure this matches the usage example elsewhere in README.md
(the install step that places the wrapper at /usr/local/bin/convert and any code
blocks invoking convert).
- Around line 269-272: The README currently claims "Builds and pushes
WiffConverter Docker and Singularity containers" but the workflow
`.github/workflows/quantms-containers.yml` contains only `build-diann` and
`build-relink` (and `sync-openms` depends on those), so update README.md to
reflect reality by either removing the WiffConverter line or marking the third
item ("Builds and pushes WiffConverter Docker and Singularity containers") as
"(planned)"/"(to be added)" until the `build-wiffconverter` job is added; ensure
the text mentions the actual workflow jobs `build-diann`, `build-relink`, and
`sync-openms` so the docs and workflow are consistent.

In `@wiffconverter-0.10/convert`:
- Around line 75-102: The current process substitution using >(...) with tee is
racy because the tee subshell may still be flushing when we check $status and
call fail(); replace it with a real pipe and capture mono's exit code via
PIPESTATUS (e.g., /usr/bin/mono ... 2>&1 | tee "$LOG"; status=${PIPESTATUS[0]})
so tee runs synchronously and $status reflects mono's exit, or alternatively
capture the PID of the tee subprocess when using the process substitution and
call wait on that PID before checking $status and invoking fail(); adjust the
mono invocation and the status-checking logic (the status variable, mono command
block, and the if [[ $status -ne 0 ]] ... fail() path) accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bd2d1684-a5ac-450f-8b6d-1cd6d7210b79

📥 Commits

Reviewing files that changed from the base of the PR and between 15ee93b and 27d7d22.

📒 Files selected for processing (3)
  • README.md
  • wiffconverter-0.10/Dockerfile
  • wiffconverter-0.10/convert

Comment thread README.md
Comment on lines 269 to +272
 1. Builds and pushes DIA-NN Docker and Singularity containers (all versions)
 2. Builds and pushes Relink Docker and Singularity containers
-3. Syncs OpenMS containers from the official repository to BigBio
+3. Builds and pushes WiffConverter Docker and Singularity containers
+4. Syncs OpenMS containers from the official repository to BigBio

⚠️ Potential issue | 🟡 Minor

CI/CD documentation is ahead of the actual workflow.

Per the workflow file (.github/workflows/quantms-containers.yml), the only build jobs are build-diann and build-relink, and sync-openms depends only on [build-diann, build-relink]. There is no build-wiffconverter job, so step 3 ("Builds and pushes WiffConverter Docker and Singularity containers") is not yet true. The PR description acknowledges this is intentional (workflow scope on the push token), with the workflow change deferred to a follow-up commit.

Until that follow-up lands, consider adjusting this list to avoid documenting behavior that doesn't exist yet — e.g. mark step 3 as "(planned)" or drop it from this PR and add it together with the workflow change.

📝 Suggested wording until the workflow job is added
 1. Builds and pushes DIA-NN Docker and Singularity containers (all versions)
 2. Builds and pushes Relink Docker and Singularity containers
-3. Builds and pushes WiffConverter Docker and Singularity containers
-4. Syncs OpenMS containers from the official repository to BigBio
+3. Syncs OpenMS containers from the official repository to BigBio
+
+_WiffConverter (`wiffconverter-0.10/`) is built manually for now; an automated
+`build-wiffconverter` job will be added in a follow-up PR._

Comment thread wiffconverter-0.10/convert Outdated
@ypriverol

Related to issue bigbio/quantmsdiann#71

claude and others added 2 commits April 26, 2026 06:01
- convert: replace `> >(tee ...)` process substitution with a regular
  pipe + PIPESTATUS so the script waits for tee to flush before
  fail() tails the log or success removes it (no orphan logs, no
  truncated failure banners).
- README: the WiffConverter container ships a `convert` wrapper, not
  `wiff-to-mzml`.
fix: avoid tee race in convert and correct README wrapper name
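
The pipe-plus-`PIPESTATUS` pattern named in this commit can be sketched as below. This is a generic Bash illustration, not the code from `convert`: the `run_logged` helper and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, mirror its combined stdout/stderr to a
# log via tee, and report the command's own exit status (not tee's).
run_logged() {
    local log="$1"; shift
    # A plain pipeline does not complete until tee has exited, so the log is
    # fully written before anything downstream tails or deletes it. With
    # `> >(tee ...)`, Bash does not wait for the process substitution.
    "$@" 2>&1 | tee "$log"
    # PIPESTATUS[0] is the status of the first pipeline stage; it must be read
    # immediately, because the next command overwrites the array.
    return "${PIPESTATUS[0]}"
}
```

For example, `run_logged out.log some_command --flag` would return whatever `some_command` returned while `out.log` holds the complete output. `PIPESTATUS` is Bash-specific, so the sketch assumes the script runs under Bash rather than a POSIX `sh`.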

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
wiffconverter-0.10/convert (2)

69-86: Optional: clean up partial output/log on interrupt.

If the user Ctrl-C's during a long conversion (or the container is killed), a partially-written $OUTPUT plus a $LOG (when auto-named) are left on disk. A small trap before launching mono keeps the workspace tidy and mirrors the success-path log cleanup.

♻️ Suggested trap
 KEEP_LOG=1
 if [[ -z "$LOG" ]]; then
     LOG="${OUTPUT}.log"
     KEEP_LOG=0  # only retain on failure
 fi
+
+# On interrupt/termination, drop the half-written mzML; keep the log for diagnosis.
+cleanup_on_signal() {
+    rm -f "$OUTPUT"
+    exit 130
+}
+trap cleanup_on_signal INT TERM
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wiffconverter-0.10/convert` around lines 69 - 86, Add a trap before launching
mono that removes the partially-written "$OUTPUT" and auto-named "$LOG" if the
script is interrupted or killed: set a trap for INT TERM EXIT that checks
KEEP_LOG==0 and then unlinks "$OUTPUT" and "$LOG" if they exist; ensure the trap
is cleared (trap - INT TERM EXIT) immediately after the mono pipeline completes
successfully (before the existing success-path log cleanup) so successful runs
keep their outputs/logs. Reference the variables LOG, OUTPUT, KEEP_LOG and the
mono/tee pipeline where PIPESTATUS is captured to locate where to add and clear
the trap.

120-123: Relax the <indexList match to tolerate whitespace and attribute-less forms.

grep -q "<indexList " requires a literal space immediately after the tag name. mzML in practice uses <indexList count="…">, but the check would miss perfectly legal variants such as <indexList\t…> or a hypothetical <indexList> (without attributes). A small character-class makes it robust without weakening intent.

♻️ Proposed fix
-if ! grep -q "<indexList " "$OUTPUT"; then
+if ! grep -qE '<indexList[[:space:]>]' "$OUTPUT"; then
     fail "output is missing <indexList> element"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wiffconverter-0.10/convert` around lines 120 - 123, The current sanity check
in the convert script uses grep -q "<indexList " which only matches a literal
space; replace it with a whitespace-tolerant pattern and use extended regex
(e.g. change the check to use grep -qE '<indexList([[:space:]]|>)' or
equivalent) so it matches `<indexList>` and `<indexList` followed by any
whitespace or attributes; update the grep invocation in the block containing the
indexList check accordingly (the line to change is the grep in the convert
script).
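
To see what the relaxed pattern changes in practice, the following sketch wraps the proposed regex in a small function. The function name `has_index_list` is hypothetical; the pattern is the one from the proposed fix.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the proposed pattern: the tag name must be
# followed by whitespace (attributes follow) or by '>' (no attributes).
has_index_list() {
    grep -qE '<indexList[[:space:]>]' "$1"
}
```

Because the character after the tag name must be whitespace or `>`, the sibling element `<indexListOffset>` (which follows the index in an indexedmzML file) does not satisfy the check on its own, so the pattern is more tolerant without matching a different element.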

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc026b3e-e45f-4b40-816c-ff37b4cf932a

📥 Commits

Reviewing files that changed from the base of the PR and between 27d7d22 and 74d7ac8.

📒 Files selected for processing (2)
  • README.md
  • wiffconverter-0.10/convert
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md

Wires wiffconverter-0.10 through the existing pipeline mirroring the
relink job:

- adds `wiffconverter-*/**` to PR path filters
- adds `wiffconverter_0_10` paths-filter rule + `CHG_WC010` env var
- emits `wiffconverter_matrix` / `has_wiffconverter` from detect-changes
- new `build-wiffconverter` job runs after build-relink, gated to the
  bigbio org (matches relink), pushes to ghcr.io/bigbio/wiffconverter
  and the wiffconverter-sif Singularity tag, with `:latest` on release
- adds build-wiffconverter to sync-openms `needs:`
@ypriverol ypriverol changed the base branch from main to dev April 26, 2026 06:12
@ypriverol ypriverol merged commit a5fe547 into dev Apr 26, 2026
10 checks passed