Fix nextflow info -d parsing in LocalWorkflow and preserve local_path path semantics #4302

Open
Copilot wants to merge 3 commits into dev from copilot/fix-nextflow-info-parsing

Copilot AI commented May 26, 2026

LocalWorkflow.get_local_nf_workflow_details() parsed nextflow info -d output from str(bytes), which turns newlines into literal \n and can cause regex over-capture. This path also needed to ensure local_path remains path-typed for downstream path joins.

  • Decode command output before regex parsing

    • In nf_core/pipelines/list.py, decode nfinfo_raw with .decode() and run regex on the decoded string instead of str(bytes).
  • Keep local_path as Path

    • Continue mapping the local path capture to Path(...) (and resolving via _resolve_wf_path) so later path operations remain type-safe.
  • Add regression coverage for this code path

    • In tests/pipelines/test_list.py, added a focused test that mocks nf_core.utils.run_cmd to return byte output and asserts:
      • repository is parsed correctly
      • local_path is parsed to the expected Path value
nfinfo_raw, _ = result
nfinfo_str = nfinfo_raw.decode()
re_patterns = {"repository": r"repository\s*: (.*)", "local_path": r"local path\s*: (.*)"}
for key, pattern in re_patterns.items():
    m = re.search(pattern, nfinfo_str)
    if m:
        value = Path(m.group(1)) if key == "local_path" else m.group(1)
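
A regression test of the kind described above could look roughly like the following. This is only a sketch, not the test added in tests/pipelines/test_list.py: `fake_utils` stands in for `nf_core.utils`, `get_details` is a hypothetical, simplified copy of the parsing logic (it skips `_resolve_wf_path`), and the sample output bytes are invented for illustration.

```python
import re
from pathlib import Path
from types import SimpleNamespace
from unittest import mock

# Stand-in for nf_core.utils so the sketch runs without nf-core installed (assumption).
fake_utils = SimpleNamespace(run_cmd=lambda executable, cmd: None)


def get_details(full_name):
    """Hypothetical, simplified version of the parsing in get_local_nf_workflow_details()."""
    details = {}
    result = fake_utils.run_cmd("nextflow", f"info -d {full_name}")
    if result is not None:
        nfinfo_raw, _ = result
        nfinfo_str = nfinfo_raw.decode()  # real newlines, not a bytes repr
        re_patterns = {"repository": r"repository\s*: (.*)", "local_path": r"local path\s*: (.*)"}
        for key, pattern in re_patterns.items():
            m = re.search(pattern, nfinfo_str)
            if m:
                details[key] = Path(m.group(1)) if key == "local_path" else m.group(1)
    return details


def test_parses_byte_output():
    # Invented sample output; the real `nextflow info -d` text may differ.
    stdout = b"repository  : https://github.com/nf-core/rnaseq\nlocal path  : /tmp/assets/nf-core/rnaseq\n"
    with mock.patch.object(fake_utils, "run_cmd", return_value=(stdout, b"")) as run_cmd:
        details = get_details("nf-core/rnaseq")
    run_cmd.assert_called_once_with("nextflow", "info -d nf-core/rnaseq")
    assert details["repository"] == "https://github.com/nf-core/rnaseq"
    assert details["local_path"] == Path("/tmp/assets/nf-core/rnaseq")


test_parses_byte_output()
```

Returning bytes from the mock is the important part: a mock that returned an already-decoded `str` would not exercise the `.decode()` call at all.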
Original prompt

Bug

In nf_core/pipelines/list.py, LocalWorkflow.get_local_nf_workflow_details() runs nextflow info -d and then parses its output with a regex:

result = nf_core.utils.run_cmd("nextflow", f"info -d {self.full_name}")
if result is not None:
    nfinfo_raw, _ = result
    re_patterns = {"repository": r"repository\s*: (.*)", "local_path": r"local path\s*: (.*)"}
    for key, pattern in re_patterns.items():
        m = re.search(pattern, str(nfinfo_raw))
        if m:
            setattr(self, key, m.group(1))

Problem 1 — str(bytes) mangles newlines

nf_core.utils.run_cmd returns bytes. Calling str(bytes_obj) in Python does not decode the bytes — it produces a repr-like string such as:

b'local path  : /home/user/.nextflow/assets/nf-core/rnaseq\nrepository  : ...'

The \n in that string is a literal backslash + n (two characters), not a real newline. Because . in a regex matches any character except a real newline, (.*) greedily captures everything from local path: to the end of the entire string, including all subsequent lines. The captured value for local_path will be wrong/garbage.

Calling .decode() instead gives a proper Python str with real newline characters, so (.*) stops correctly at end-of-line.
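
The difference can be reproduced in isolation. The sketch below uses an invented two-line sample (the exact format of real `nextflow info -d` output is an assumption here) together with the same `local path` pattern:

```python
import re

# Hypothetical sample of what run_cmd might return; real output may differ.
raw = b"local path  : /home/user/.nextflow/assets/nf-core/rnaseq\nrepository  : https://github.com/nf-core/rnaseq\n"
pattern = r"local path\s*: (.*)"

# str(bytes) yields the repr, so each newline becomes backslash + 'n'
# and (.*) runs on to the end of the whole string.
mangled = re.search(pattern, str(raw)).group(1)

# .decode() yields real newline characters, so (.*) stops at end of line.
clean = re.search(pattern, raw.decode()).group(1)

print(repr(mangled))
print(repr(clean))
```

With `str(raw)` the capture also swallows the `repository` line and the closing quote of the bytes repr; with `raw.decode()` it is just the path.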

Problem 2 — local_path set as str instead of Path

self.local_path is typed as Path | None and is subsequently used in path-join operations (e.g. self.local_path / ".git" / "FETCH_HEAD"). Setting it via setattr(self, key, m.group(1)) stores a plain str, which will cause a TypeError on those / operations.
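
A small sketch of that failure mode, using an invented path value in place of the real regex capture:

```python
from pathlib import Path

# What m.group(1) returns is a plain str (the value here is made up).
captured = "/home/user/.nextflow/assets/nf-core/rnaseq"

try:
    captured / ".git" / "FETCH_HEAD"  # str does not support the / operator
    str_join_failed = False
except TypeError:
    str_join_failed = True

# Wrapping the capture in Path makes the same join work.
fetch_head = Path(captured) / ".git" / "FETCH_HEAD"
print(str_join_failed, fetch_head)
```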

Fix

Replace the relevant block (roughly lines 370–380 of nf_core/pipelines/list.py) with:

result = nf_core.utils.run_cmd("nextflow", f"info -d {self.full_name}")
if result is not None:
    nfinfo_raw, _ = result
    nfinfo_str = nfinfo_raw.decode()
    re_patterns = {"repository": r"repository\s*: (.*)", "local_path": r"local path\s*: (.*)"}
    for key, pattern in re_patterns.items():
        m = re.search(pattern, nfinfo_str)
        if m:
            setattr(self, key, Path(m.group(1)) if key == "local_path" else m.group(1))

Changes:

  1. Call .decode() on the raw bytes to get a proper Python string before applying the regex.
  2. Wrap the captured value in Path(...) when the key is local_path, keeping the type consistent with the rest of the class.

Please also add or update any relevant tests if they exist for this code path.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: ```
# Use nextflow info to get more details about the workflow
else:
    result = nf_core.utils.run_cmd("nextflow", f"info -d {self.full_name}")
    if result is not None:
        nfinfo_raw, _ = result
        nfinfo_str = nfinfo_raw.decode()
        re_patterns = {"repository": r"repository\s*: (.*)", "local_path": r"local path\s*: (.*)"}
        for key, pattern in re_patterns.items():
            m = re.search(pattern, nfinfo_str)
            if m:
                setattr(self, key, Path(m.group(1)) if key == "local_path" else m.group(1))
```

I think this has a bug!

list.py: str(nfinfo_raw) → nfinfo_raw.decode(). Python's str(bytes) represents newlines as literal \n (two chars), causing (.*) to capture everything past "local path: " including all subsequent lines. .decode() gives a real string with actual newlines that the regex stops at. Also now sets local_path as Path(...) so the subsequent / operations on it don't TypeError.

I guess one could patch it as shown in my pasted content.
What do you think? If you agree, please open a PR to dev


Assistant: The bug analysis is spot on:

1. **`str(bytes)`** in Python produces something like `b'local path: /foo\nrepository: ...'` — the regex `(.*)` then captures everything after `local path: ` because `\n` is represented as a literal backslash-n (two chars), not a real newline that `.` stops at.
2. **`.decode()`** gives a proper Python string with real newlines, so `(.*)` correctly stops at end-of-line.
3. **`Path(m.group(1))`** for `local_path` is the right type since `self.local_path` is typed as `Path | None` and later used in path operations like `/`.

Let me open that PR to `dev`.

Assistant: We're waiting for your response to the confirmation dialog to proceed with opening the pull request to `nf-core/tools`.


This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Fix regex parsing for nextflow info output" to "Fix nextflow info -d parsing in LocalWorkflow and preserve local_path path semantics" May 26, 2026
Copilot AI requested a review from jpfeuffer May 26, 2026 19:57
jpfeuffer marked this pull request as ready for review May 26, 2026 20:01

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.42%. Comparing base (f2f69f4) to head (a685596).

