feat: Add mmseqs2 main workflows #3981

fgvieira · 2025-04-04T13:38:24Z

QC

I confirm that I have followed the documentation for contributing to snakemake-wrappers.

While the contributions guidelines are more extensive, please particularly ensure that:

test.py was updated to call any added or updated example rules in a Snakefile
input: and output: file paths in the rules can be chosen arbitrarily
wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
the meta.yaml contains a link to the documentation of the respective tool or command under url:
conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

New Features
- Added MMseqs2 wrappers and workflow rules covering search, clustering, linclust, taxonomy, RBH and DB creation; workflow metadata and parameters.
Tests
- Added end-to-end tests, numerous expected outputs, and database creation fixtures.
Documentation
- Added SILVA export README and workflow metadata for discoverability.
Chores
- Added pinned and YAML Conda environment specs for reproducible installs.

coderabbitai · 2025-04-04T13:38:33Z

📝 Walkthrough

Walkthrough

Adds MMseqs2 Snakemake wrappers, workflow and DB metadata, Conda environment specs (YAML and linux-64 pin files), test Snakefiles, a test runner, and numerous static test fixtures/expected outputs for DB creation, search, clustering, linclust, taxonomy, and RBH.

Changes

Cohort / File(s)	Summary
Conda environments `bio/mmseqs2/db/environment.yaml`, `bio/mmseqs2/db/environment.linux-64.pin.txt`, `bio/mmseqs2/workflows/environment.yaml`, `bio/mmseqs2/workflows/environment.linux-64.pin.txt`	New Conda environment YAMLs and explicit linux-64 pin files listing exact package URLs (conda-forge, bioconda), pinning mmseqs2, snakemake-wrapper-utils and runtime libraries.
Metadata / Manifests `bio/mmseqs2/db/meta.yaml`, `bio/mmseqs2/workflows/meta.yaml`	New metadata files declaring name, url, description, authors, I/O schema and params (module, extra).
Wrappers (runtime wrappers) `bio/mmseqs2/db/wrapper.py`, `bio/mmseqs2/workflows/wrapper.py`	New Snakemake wrapper modules that normalize inputs/outputs, assemble command-line args (module, extra, threads, tmpdir), special-case DB modules, and execute mmseqs2 via shell; workflow wrapper includes module-level metadata.
DB tests — rules & input `bio/mmseqs2/db/test/Snakefile`, `bio/mmseqs2/db/test/seqs/a.fasta`	New test Snakefile with rules `mmseqs2_databases` and `mmseqs2_createdb`, plus a small FASTA used as input.
DB tests — expected (createdb/databases) `bio/mmseqs2/db/test/expected/createdb/`, `bio/mmseqs2/db/test/expected/databases/`	Static expected outputs for createdb/databases (index, lookup, source, version/README, _h.* files).
Workflow rules (tests) `bio/mmseqs2/workflows/test/Snakefile`	New workflow test rules covering search, cluster, linclust, taxonomy, and rbh with multiext outputs, logs, params and wrapper references.
Workflow DB fixtures `bio/mmseqs2/workflows/test/db/*`	Static DB fixture files (`a.index`, `a.lookup`, `a.source`, `a_h.index`, `a_mapping`).
Workflow expected outputs `bio/mmseqs2/workflows/test/expected/cluster/`, `bio/mmseqs2/workflows/test/expected/linclust/`, `bio/mmseqs2/workflows/test/expected/search/a.tab`, `bio/mmseqs2/workflows/test/expected/rbh/a.tab`, `bio/mmseqs2/workflows/test/expected/taxonomy/a_report`	Expected FASTA files, representative sequences, alignment/tab outputs and a taxonomy report used by workflow tests.
Top-level tests `test_wrappers.py`	New test `test_mmseqs2(run)` that runs both the workflows and db test suites and compares results with expected fixtures.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Rule as Snakemake Rule
  participant Wrapper as MMseqs2 Wrapper
  participant MM as mmseqs2 CLI
  participant FS as Filesystem

  Rule->>Wrapper: invoke(inputs, params(module, extra), threads, log)
  Note over Wrapper: normalize inputs/outputs\nresolve common prefixes\nconfigure tmpdir/threads/extra
  Wrapper->>MM: mmseqs2 <module> <query> <target?> <output> --threads N <extra> (uses tmpdir)
  MM->>FS: read input files
  MM-->>FS: write outputs (DBs, tabs, FASTA, reports)
  MM-->>Wrapper: exit status
  Wrapper-->>Rule: write log, expose outputs

sequenceDiagram
  autonumber
  participant RuleDB as Snakemake DB Rule
  participant DBWrapper as MMseqs2 DB Wrapper
  participant MM as mmseqs2 CLI
  participant FS as Filesystem

  RuleDB->>DBWrapper: invoke(seqs input, params, threads, log)
  Note over DBWrapper: special-case modules:\n- databases: append thread flags\n- createdb: disable tmpdir
  DBWrapper->>MM: mmseqs2 <module> <in> <out> [--threads N] <extra>
  MM->>FS: read seqs
  MM-->>FS: emit DB artifacts (.index, .lookup, .source, _h.*)
  MM-->>DBWrapper: exit status
  DBWrapper-->>RuleDB: log and outputs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Review focus suggestions:
- bio/mmseqs2/db/wrapper.py and bio/mmseqs2/workflows/wrapper.py (input/output normalization, tmpdir handling, thread flags)
- Snakefiles in bio/mmseqs2/db/test/ and bio/mmseqs2/workflows/test/ (correct multiext outputs and log paths)
- test_wrappers.py (test invocation and expected-vs-actual comparisons)

Suggested reviewers

johanneskoester

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Description Check	❓ Inconclusive	The PR description includes a fully completed QC checklist with all eight items properly checked, confirming the author's adherence to the snakemake-wrappers contributing documentation and verification of key requirements such as test.py updates, arbitrary I/O paths, automatic argument inference, proper temporary file handling, meta.yaml documentation links, and minimal conda environments. However, the description section itself (marked "Add a description of your PR here") is entirely empty with no explanatory text about what mmseqs2 workflows are being added or what the changes accomplish, leaving this critical section of the template unfilled despite the comprehensive QC checklist completion.	To resolve this, please add a descriptive paragraph explaining what mmseqs2 workflows have been added to the repository, which modules are included (e.g., easy-search, easy-cluster, easy-linclust, easy-taxonomy, easy-rbh), and a brief summary of the implementation approach. The QC checklist items are properly verified, but the descriptive text section should be filled out to provide reviewers with context about the changes beyond just the compliance checklist.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title "feat: Add mmseqs2 main workflows" follows conventional commit style with the "feat:" prefix and clearly summarizes the main change in the changeset. The title accurately reflects the primary objective of adding mmseqs2 workflows to the repository, as evidenced by the extensive workflow additions including new wrapper modules, test files, metadata configurations, and five new Snakemake rules for workflow operations (search, cluster, linclust, taxonomy, and rbh). The title is concise and specific enough that a teammate reviewing the history would understand the main contribution.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 949d65d and 1c07285.

📒 Files selected for processing (1)

test_wrappers.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

test_wrappers.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: docs
GitHub Check: testing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (10)

bio/mmseqs2/db/test/seqs/a.fasta (1)

1-8: Confirm intentional duplicate FASTA IDs.

Headers “>1” appear twice. If this is deliberate to exercise lookup behavior, all good; otherwise consider making IDs unique to avoid tool-dependent collisions.

bio/mmseqs2/workflows/environment.linux-64.pin.txt (1)

1-44: Explicit pin looks consistent; verify CI can consume it as-is.

mmseqs2 and wrapper-utils are pinned; good for reproducibility. Please ensure CI uses conda/mamba versions that support @explicit with these URLs. Also note OpenSSL/libstdc++ versions differ from the db pin; that’s acceptable but consider aligning if unnecessary drift causes solver differences.

Note: Based on prior repo conventions I’ve learned, ensure environment.yaml (separate file) uses spaces around “=” in pins if updated.

bio/mmseqs2/db/test/expected/databases/a.version (1)

1-165: Verify licensing for bundled SILVA text; consider trimming the fixture.

This embeds substantial third‑party documentation text. Please confirm redistribution is permitted under its license. If the test doesn’t require full content, prefer a minimal synthetic placeholder or assert only on sentinel lines to avoid bundling large copyrighted text.

bio/mmseqs2/workflows/test/db/a.index (1)

1-4: Optional: de-duplicate identical fixtures.

Consider referencing a single canonical a.index fixture (or symlink) to avoid drift between workflow and db tests.
bio/mmseqs2/db/meta.yaml (1)
3-13: Polish description and I/O/param wording for clarity and consistency.

Capitalize the description; prefer “FASTA” over “FAS”; clarify “module” to reflect the MMseqs2 component being run.
 description: |
-  ultra fast and sensitive sequence search and clustering suite
+  Ultra-fast and sensitive sequence search and clustering suite
 input:
-  - input FAS file
+  - input FASTA file
 output:
-  - output: DB files
+  - output: MMseqs2 DB files
 params:
-  - module: workflow to use
+  - module: MMseqs2 module to run (e.g., createdb)
   - extra: additional program arguments
bio/mmseqs2/workflows/meta.yaml (1)
3-14: Clarify description and I/O wording; name “FASTA” explicitly; refine param help.

Minor copyedits for consistency with repo conventions.
 description: |
-  ultra fast and sensitive sequence search and clustering suite
+  Ultra-fast and sensitive sequence search and clustering suite
 input:
-  - query: input query FAS file(s)
-  - target: input target FAS file(s) or DB
+  - query: input query FASTA file(s)
+  - target: input target FASTA file(s) or MMseqs2 DB
 output:
-  - output: FAS, cluster or DB file(s)
+  - output: FASTA, cluster, or DB file(s)
 params:
-  - module: workflow to use
+  - module: MMseqs2 workflow to run (e.g., cluster, linclust, search)
   - extra: additional program arguments
bio/mmseqs2/workflows/environment.yaml (1)
5-7: Unpin snakemake-wrapper-utils; mmseqs2 17.b804f confirmed on Bioconda

Per repo convention pin only the main tool — keep mmseqs2 pinned, remove the snakemake-wrapper-utils pin.
 dependencies:
   - mmseqs2 =17.b804f
-  - snakemake-wrapper-utils =0.8.0
+  - snakemake-wrapper-utils
test_wrappers.py (1)
153-166: Remove duplicate key in compare_results_with_expected.

The key "out/cluster/a_b_cluster.tsv" appears twice; the second overwrites the first.

Apply this diff:
         compare_results_with_expected={
             "out/search/a.fas": "expected/search/a.fas",
             "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
             "out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta",
             "out/cluster/a_b_all_seqs.fasta": "expected/cluster/a_b_all_seqs.fasta",
-            "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
             "out/linclust/a_b_rep_seq.fasta": "expected/linclust/a_b_rep_seq.fasta",
             "out/linclust/a_b_all_seqs.fasta": "expected/linclust/a_b_all_seqs.fasta",
             "out/linclust/a_b_cluster.tsv": "expected/linclust/a_b_cluster.tsv",
             "out/taxonomy/a_tophit_report": "expected/taxonomy/a_tophit_report",
             "out/taxonomy/a_tophit_aln": "expected/taxonomy/a_tophit_aln",
             "out/taxonomy/a_report": "expected/taxonomy/a_report",
             "out/taxonomy/a_lca.tsv": "expected/taxonomy/a_lca.tsv",
             "out/rbh/a.fas": "expected/rbh/a.fas",
         },
bio/mmseqs2/db/wrapper.py (1)
23-31: Make module checks robust when params.module includes arguments.

params.module can contain arguments (e.g., "databases SILVA"), so equality checks miss "databases". Split before checking; also avoid passing a dummy tmpdir for modules not requiring it.

Apply this diff:
 with tempfile.TemporaryDirectory() as tmpdir:
-    # Modules with threads
-    if snakemake.params.module in ["databases"]:
-        extra = f"--threads {snakemake.threads} {extra}"
-    # Modules with no temp folder
-    if snakemake.params.module in ["createdb"]:
-        tmpdir = ""
-
-    shell("mmseqs {snakemake.params.module} {input} {out} {tmpdir} {extra} {log}")
+    module = snakemake.params.module
+    module_name = module.split()[0]
+    # Modules with threads
+    if module_name == "databases":
+        extra = f"--threads {snakemake.threads} {extra}"
+    # Modules with no temp folder
+    tmp_arg = "" if module_name == "createdb" else tmpdir
+
+    shell("mmseqs {module} {input} {out} {tmp_arg} {extra} {log}")
bio/mmseqs2/workflows/wrapper.py (1)
14-17: Use the local variable for target prefixing.

Minor clarity fix: use the already-resolved target when getting the common prefix.

Apply this diff:
 target = snakemake.input.get("target", "")
 if isinstance(target, list):
-    target = os.path.commonprefix(snakemake.input.target)
+    target = os.path.commonprefix(target)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 168b48b and 187892a.

⛔ Files ignored due to path filters (5)

bio/mmseqs2/workflows/test/expected/cluster/a_b_cluster.tsv is excluded by !**/*.tsv
bio/mmseqs2/workflows/test/expected/linclust/a_b_cluster.tsv is excluded by !**/*.tsv
bio/mmseqs2/workflows/test/expected/taxonomy/a_lca.tsv is excluded by !**/*.tsv
bio/mmseqs2/workflows/test/seqs/nucl.a.fas.gz is excluded by !**/*.gz
bio/mmseqs2/workflows/test/seqs/nucl.b.fas.gz is excluded by !**/*.gz

📒 Files selected for processing (30)

bio/mmseqs2/db/environment.linux-64.pin.txt (1 hunks)
bio/mmseqs2/db/environment.yaml (1 hunks)
bio/mmseqs2/db/meta.yaml (1 hunks)
bio/mmseqs2/db/test/Snakefile (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a.index (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a.lookup (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a.source (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a_h.index (1 hunks)
bio/mmseqs2/db/test/expected/databases/a.source (1 hunks)
bio/mmseqs2/db/test/expected/databases/a.version (1 hunks)
bio/mmseqs2/db/test/seqs/a.fasta (1 hunks)
bio/mmseqs2/db/wrapper.py (1 hunks)
bio/mmseqs2/workflows/environment.linux-64.pin.txt (1 hunks)
bio/mmseqs2/workflows/environment.yaml (1 hunks)
bio/mmseqs2/workflows/meta.yaml (1 hunks)
bio/mmseqs2/workflows/test/Snakefile (1 hunks)
bio/mmseqs2/workflows/test/db/a.index (1 hunks)
bio/mmseqs2/workflows/test/db/a.lookup (1 hunks)
bio/mmseqs2/workflows/test/db/a.source (1 hunks)
bio/mmseqs2/workflows/test/db/a_h.index (1 hunks)
bio/mmseqs2/workflows/test/db/a_mapping (1 hunks)
bio/mmseqs2/workflows/test/expected/cluster/a_b_all_seqs.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/cluster/a_b_rep_seq.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/linclust/a_b_all_seqs.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/linclust/a_b_rep_seq.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/rbh/a.tab (1 hunks)
bio/mmseqs2/workflows/test/expected/search/a.tab (1 hunks)
bio/mmseqs2/workflows/test/expected/taxonomy/a_report (1 hunks)
bio/mmseqs2/workflows/wrapper.py (1 hunks)
test_wrappers.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

test_wrappers.py
bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

**/wrapper.py

⚙️ CodeRabbit configuration file

Do not complain about use of undefined variable called snakemake.

Files:

bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

🧠 Learnings (8)

📚 Learning: 2025-04-17T09:24:51.738Z

Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See full guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file

Applied to files:

bio/mmseqs2/workflows/environment.yaml
bio/mmseqs2/db/environment.yaml

📚 Learning: 2025-04-17T09:24:51.738Z

Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file

Applied to files:

bio/mmseqs2/workflows/environment.yaml

📚 Learning: 2025-02-11T12:24:22.592Z

Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#3648
File: bio/nanosim/simulator/environment.yaml:6-6
Timestamp: 2025-02-11T12:24:22.592Z
Learning: In the nanosim bioconda recipe, dependencies are carefully managed with specific version pins (e.g., scikit-learn ~=0.23.2) to ensure compatibility with pre-trained models. These dependencies don't need to be explicitly added to environment.yaml files when the main package is listed as a dependency, as they are handled through the bioconda recipe system.

Applied to files:

bio/mmseqs2/workflows/environment.yaml
bio/mmseqs2/db/environment.linux-64.pin.txt
bio/mmseqs2/workflows/environment.linux-64.pin.txt

📚 Learning: 2025-06-02T07:56:35.854Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#4159
File: bio/pyfaidx/environment.yaml:6-6
Timestamp: 2025-06-02T07:56:35.854Z
Learning: In the Snakemake-wrapper repository, conda dependency version pinning in environment.yaml files uses spaces around the equals sign (e.g., `- pyfaidx =0.8.1.4`) as the established coding standard, even though conda itself doesn't require the spaces.

Applied to files:

bio/mmseqs2/workflows/environment.linux-64.pin.txt

📚 Learning: 2024-11-26T08:31:00.099Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.

Applied to files:

test_wrappers.py

📚 Learning: 2024-08-21T08:30:42.757Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.

Applied to files:

bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

📚 Learning: 2024-11-21T10:23:03.427Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.

Applied to files:

bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

📚 Learning: 2024-10-08T17:41:54.542Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.

Applied to files:

bio/mmseqs2/workflows/wrapper.py

🪛 Ruff (0.13.1)

test_wrappers.py

157-157: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated

Remove repeated key literal "out/cluster/a_b_cluster.tsv"

(F601)

bio/mmseqs2/db/wrapper.py

9-9: Undefined name snakemake

(F821)

10-10: Undefined name snakemake

(F821)

13-13: Undefined name snakemake

(F821)

18-18: Undefined name snakemake

(F821)

25-25: Undefined name snakemake

(F821)

26-26: Undefined name snakemake

(F821)

28-28: Undefined name snakemake

(F821)

bio/mmseqs2/workflows/wrapper.py

10-10: Undefined name snakemake

(F821)

11-11: Undefined name snakemake

(F821)

14-14: Undefined name snakemake

(F821)

16-16: Undefined name snakemake

(F821)

19-19: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: testing

🔇 Additional comments (31)

bio/mmseqs2/db/test/expected/createdb/a.lookup (1)

1-4: LGTM; please confirm column semantics are stable across OS/versions.

The mapping aligns with the duplicated IDs in the input FASTA. Please confirm the 3rd column (“0”) is the expected source/file index from mmseqs2 createdb and is stable across platforms.

bio/mmseqs2/workflows/test/expected/linclust/a_b_rep_seq.fasta (1)

1-4: LGTM.

Representative sequences look fine for the linclust expected output.

bio/mmseqs2/db/environment.linux-64.pin.txt (1)

1-33: LGTM; reproducible DB env pin.

Pins are minimal and include only what’s needed (mmseqs2, gawk, basics). Confirm CI path uses conda create --file to avoid accidental channel mixing.

bio/mmseqs2/workflows/test/expected/taxonomy/a_report (1)

1-1: LGTM.

Report format/fields look consistent for the taxonomy workflow expected output.

bio/mmseqs2/db/test/expected/createdb/a.source (1)

1-1: LGTM; confirm path semantics.

The source mapping “0a.fasta” matches common mmseqs2 conventions. Please confirm tests assume only basenames (not full paths) to keep runs stable across machines.

bio/mmseqs2/workflows/test/expected/rbh/a.tab (1)

1-1: Confirm column order vs mmseqs2 rbh --format-output.

RBH fields can differ by version/flags; please verify the wrapper’s format-output matches this expectation to avoid brittle tests.

bio/mmseqs2/db/test/expected/databases/a.source (1)

1-1: LGTM: expected database source mapping.

Content and tab delimiter look correct.

bio/mmseqs2/workflows/test/db/a.index (1)

1-4: LGTM: index fixture matches createdb expectation.

Looks consistent with the createdb/a.index expectation.

bio/mmseqs2/workflows/test/db/a_mapping (1)

1-5: LGTM: taxonomy mapping fixture.

Values and formatting look fine.

bio/mmseqs2/db/environment.yaml (1)

1-6: LGTM: environment.yaml follows repo conventions.

Channels ordering, nodefaults, and single pinned main dependency all align with guidelines.

If mmseqs2 version changes later, remember to refresh the corresponding pinned lockfile(s) to keep CI deterministic.

bio/mmseqs2/db/test/expected/createdb/a.index (1)

1-4: LGTM: createdb index expectation.

Matches the workflow fixture; tab-separated triplets are consistent.

bio/mmseqs2/workflows/test/db/a_h.index (1)

1-4: LGTM: header index fixture.

Consistent increments; formatting OK.

bio/mmseqs2/workflows/test/db/a.source (1)

1-1: LGTM: source mapping.

Tab-delimited mapping looks correct.

bio/mmseqs2/workflows/test/db/a.lookup (1)

1-4: LGTM: fixture content looks consistent.

Tab-delimited triplets align with the corresponding createdb expectations.

bio/mmseqs2/db/test/expected/createdb/a_h.index (1)

1-4: LGTM: expected index triplets look plausible.

Monotonic offsets with fixed block size are consistent.

bio/mmseqs2/workflows/test/expected/search/a.tab (1)

1-37: LGTM: expected TSV layout is coherent.

Columns/values look consistent with typical MMseqs2 search output.

bio/mmseqs2/workflows/test/expected/linclust/a_b_all_seqs.fasta (1)

1-6: Incorrect — no consecutive FASTA headers found. Repository check under bio/mmseqs2/workflows/test/expected found no consecutive '>' lines; the duplicate header-only records reported are not present.

Likely an incorrect or invalid review comment.

test_wrappers.py (3)

137-151: LGTM: adds end-to-end tests for mmseqs2 workflows.

153-166: Sanity-check side-effect outputs used for comparisons.

"out/search/a.fas" and "out/rbh/a.fas" are not declared outputs; they rely on mmseqs2 side effects. Please confirm these filenames are stable across mmseqs2 versions (to avoid flaky tests).

170-194: LGTM: db tests cover both 'databases' and 'createdb' artifacts.

bio/mmseqs2/db/wrapper.py (1)

13-21: Prefix derivation looks good.

Using commonprefix and rstrip("_") to compute database/output base paths matches the multiext patterns used in tests.

bio/mmseqs2/db/test/Snakefile (2)

1-25: LGTM: sensible outputs and logging for 'databases'.

27-50: LGTM: createdb rule aligns with wrapper contract.

bio/mmseqs2/workflows/wrapper.py (2)

19-30: LGTM: output base handling and format-mode toggles.

31-34: LGTM: command assembly with threads and tmpdir.

bio/mmseqs2/workflows/test/expected/cluster/a_b_rep_seq.fasta (1)

1-4: LGTM: expected FASTA fixture added.

bio/mmseqs2/workflows/test/Snakefile (5)

1-15: LGTM: easy-search rule wiring.

17-32: LGTM: easy-cluster outputs via multiext with stable base.

34-49: LGTM: easy-linclust mirrors cluster interface.

51-81: LGTM: taxonomy target DB multiext matches db wrapper outputs.

83-97: LGTM: RBH rule consistent with search interface.

bio/mmseqs2/workflows/test/expected/cluster/a_b_all_seqs.fasta

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

bio/mmseqs2/workflows/wrapper.py (2)
34-37: Quote paths to avoid word splitting; keep optional target unquoted.

Prevents breakage when paths contain spaces, while not forcing an empty target argument.

Apply:
-    shell(
-        "mmseqs {snakemake.params.module} {snakemake.input.query} {target} {out} {tmpdir} --threads {snakemake.threads} {extra} {log}"
-    )
+    shell(
+        'mmseqs {snakemake.params.module} "{snakemake.input.query}" {target} "{out}" "{tmpdir}" --threads {snakemake.threads} {extra} {log}'
+    )
18-18: Track the TODO for arbitrary output names.

If needed, I can help implement support and adjust tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 187892a and 2a4ab88.

📒 Files selected for processing (1)

bio/mmseqs2/workflows/wrapper.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

bio/mmseqs2/workflows/wrapper.py

**/wrapper.py

⚙️ CodeRabbit configuration file

Do not complain about use of undefined variable called snakemake.

Files:

bio/mmseqs2/workflows/wrapper.py

🧠 Learnings (2)

📚 Learning: 2024-11-21T10:23:03.427Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.

Applied to files:

bio/mmseqs2/workflows/wrapper.py

📚 Learning: 2024-08-21T08:30:42.757Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.

Applied to files:

bio/mmseqs2/workflows/wrapper.py

🪛 Ruff (0.13.1)

bio/mmseqs2/workflows/wrapper.py

10-10: Undefined name snakemake

(F821)

11-11: Undefined name snakemake

(F821)

14-14: Undefined name snakemake

(F821)

16-16: Undefined name snakemake

(F821)

19-19: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: testing
GitHub Check: docs

🔇 Additional comments (1)

bio/mmseqs2/workflows/wrapper.py (1)

1-3: Module metadata looks good.

bio/mmseqs2/workflows/wrapper.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

test_wrappers.py (2)
152-160: Remove duplicate dict key in compare_results_with_expected (ruff F601).

The key "out/cluster/a_b_cluster.tsv" appears twice; the latter silently overrides the former. Drop the duplicate to satisfy linting and avoid confusion.

Apply this diff:
             "out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta",
             "out/cluster/a_b_all_seqs.fasta": "expected/cluster/a_b_all_seqs.fasta",
-            "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
             "out/linclust/a_b_rep_seq.fasta": "expected/linclust/a_b_rep_seq.fasta",
137-196: Consider splitting into two tests for clearer failures.

Having workflows and db assertions in one test makes triage harder when one half fails. Suggest test_mmseqs2_workflows() and test_mmseqs2_db().

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a4ab88 and 11e8805.

📒 Files selected for processing (1)

test_wrappers.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

test_wrappers.py

🪛 Ruff (0.13.1)

test_wrappers.py

157-157: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated

Remove repeated key literal "out/cluster/a_b_cluster.tsv"

(F601)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: testing
GitHub Check: docs

🔇 Additional comments (1)

test_wrappers.py (1)
152-166: Verify expected fixture files exist
In test_wrappers.py (around lines 152–166 and 181–193), before comparing outputs, iterate over compare_results_with_expected.values() and assert each expected file exists to avoid brittle CI failures. For example:
for expected_path in compare_results_with_expected.values():
    assert os.path.isfile(expected_path), f"Missing fixture: {expected_path}"

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11e8805 and 949d65d.

📒 Files selected for processing (1)

test_wrappers.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

test_wrappers.py

🪛 Ruff (0.14.1)

test_wrappers.py

168-168: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated

Remove repeated key literal "out/cluster/a_b_cluster.tsv"

(F601)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: docs
GitHub Check: testing

test_wrappers.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Add main workflows

ea4387b

fgvieira marked this pull request as draft April 4, 2025 13:38

fgvieira added 8 commits April 4, 2025 15:53

Code clean-up

1d903d7

Add two db wrappers

261f39f

Fix meta

fe509e5

Merge branch 'master' into mmseqs2

94a7e09

Fix typos

7b121ed

Merge branch 'master' into mmseqs2

f12765b

Infer output format

2f94129

Only set out format for html and sam, and fix typo

187892a

fgvieira marked this pull request as ready for review September 24, 2025 15:43

coderabbitai bot reviewed Sep 24, 2025

View reviewed changes

bio/mmseqs2/workflows/test/expected/cluster/a_b_all_seqs.fasta Show resolved Hide resolved

Fix import

2a4ab88

coderabbitai bot reviewed Sep 24, 2025

View reviewed changes

bio/mmseqs2/workflows/wrapper.py Show resolved Hide resolved

bio/mmseqs2/workflows/wrapper.py Show resolved Hide resolved

bio/mmseqs2/workflows/wrapper.py Show resolved Hide resolved

Fix expected files

11e8805

coderabbitai bot reviewed Sep 24, 2025

View reviewed changes

fgvieira requested a review from johanneskoester September 29, 2025 10:27

fgvieira requested a review from dlaehnemann October 10, 2025 12:53

Merge branch 'master' into mmseqs2

949d65d

coderabbitai bot reviewed Oct 27, 2025

View reviewed changes

test_wrappers.py Show resolved Hide resolved

Remove dup dict entry

1c07285

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add mmseqs2 main workflows #3981

feat: Add mmseqs2 main workflows #3981

Uh oh!

fgvieira commented Apr 4, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 4, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Add mmseqs2 main workflows #3981

Are you sure you want to change the base?

feat: Add mmseqs2 main workflows #3981

Uh oh!

Conversation

fgvieira commented Apr 4, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fgvieira commented Apr 4, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 4, 2025 •

edited

Loading