-
Notifications
You must be signed in to change notification settings - Fork 202
feat: Add mmseqs2 main workflows #3981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds MMseqs2 Snakemake wrappers, workflow and DB metadata, Conda environment specs (YAML and linux-64 pin files), test Snakefiles, a test runner, and numerous static test fixtures/expected outputs for DB creation, search, clustering, linclust, taxonomy, and RBH. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Rule as Snakemake Rule
participant Wrapper as MMseqs2 Wrapper
participant MM as mmseqs2 CLI
participant FS as Filesystem
Rule->>Wrapper: invoke(inputs, params(module, extra), threads, log)
Note over Wrapper: normalize inputs/outputs\nresolve common prefixes\nconfigure tmpdir/threads/extra
Wrapper->>MM: mmseqs2 <module> <query> <target?> <output> --threads N <extra> (uses tmpdir)
MM->>FS: read input files
MM-->>FS: write outputs (DBs, tabs, FASTA, reports)
MM-->>Wrapper: exit status
Wrapper-->>Rule: write log, expose outputs
sequenceDiagram
autonumber
participant RuleDB as Snakemake DB Rule
participant DBWrapper as MMseqs2 DB Wrapper
participant MM as mmseqs2 CLI
participant FS as Filesystem
RuleDB->>DBWrapper: invoke(seqs input, params, threads, log)
Note over DBWrapper: special-case modules:\n- databases: append thread flags\n- createdb: disable tmpdir
DBWrapper->>MM: mmseqs2 <module> <in> <out> [--threads N] <extra>
MM->>FS: read seqs
MM-->>FS: emit DB artifacts (.index, .lookup, .source, _h.*)
MM-->>DBWrapper: exit status
DBWrapper-->>RuleDB: log and outputs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (10)
bio/mmseqs2/db/test/seqs/a.fasta (1)
1-8: Confirm intentional duplicate FASTA IDs.Headers “>1” appear twice. If this is deliberate to exercise lookup behavior, all good; otherwise consider making IDs unique to avoid tool-dependent collisions.
bio/mmseqs2/workflows/environment.linux-64.pin.txt (1)
1-44: Explicit pin looks consistent; verify CI can consume it as-is.mmseqs2 and wrapper-utils are pinned; good for reproducibility. Please ensure CI uses conda/mamba versions that support @explicit with these URLs. Also note OpenSSL/libstdc++ versions differ from the db pin; that’s acceptable but consider aligning if unnecessary drift causes solver differences.
Note: Based on prior repo conventions I’ve learned, ensure environment.yaml (separate file) uses spaces around “=” in pins if updated.
bio/mmseqs2/db/test/expected/databases/a.version (1)
1-165: Verify licensing for bundled SILVA text; consider trimming the fixture.This embeds substantial third‑party documentation text. Please confirm redistribution is permitted under its license. If the test doesn’t require full content, prefer a minimal synthetic placeholder or assert only on sentinel lines to avoid bundling large copyrighted text.
bio/mmseqs2/workflows/test/db/a.index (1)
1-4: Optional: de-duplicate identical fixtures.Consider referencing a single canonical a.index fixture (or symlink) to avoid drift between workflow and db tests.
bio/mmseqs2/db/meta.yaml (1)
3-13: Polish description and I/O/param wording for clarity and consistency.Capitalize the description; prefer “FASTA” over “FAS”; clarify “module” to reflect the MMseqs2 component being run.
description: | - ultra fast and sensitive sequence search and clustering suite + Ultra-fast and sensitive sequence search and clustering suite input: - - input FAS file + - input FASTA file output: - - output: DB files + - output: MMseqs2 DB files params: - - module: workflow to use + - module: MMseqs2 module to run (e.g., createdb) - extra: additional program argumentsbio/mmseqs2/workflows/meta.yaml (1)
3-14: Clarify description and I/O wording; name “FASTA” explicitly; refine param help.Minor copyedits for consistency with repo conventions.
description: | - ultra fast and sensitive sequence search and clustering suite + Ultra-fast and sensitive sequence search and clustering suite input: - - query: input query FAS file(s) - - target: input target FAS file(s) or DB + - query: input query FASTA file(s) + - target: input target FASTA file(s) or MMseqs2 DB output: - - output: FAS, cluster or DB file(s) + - output: FASTA, cluster, or DB file(s) params: - - module: workflow to use + - module: MMseqs2 workflow to run (e.g., cluster, linclust, search) - extra: additional program argumentsbio/mmseqs2/workflows/environment.yaml (1)
5-7: Unpin snakemake-wrapper-utils; mmseqs2 17.b804f confirmed on BiocondaPer repo convention pin only the main tool — keep mmseqs2 pinned, remove the snakemake-wrapper-utils pin.
dependencies: - mmseqs2 =17.b804f - - snakemake-wrapper-utils =0.8.0 + - snakemake-wrapper-utilstest_wrappers.py (1)
153-166: Remove duplicate key in compare_results_with_expected.The key "out/cluster/a_b_cluster.tsv" appears twice; the second overwrites the first.
Apply this diff:
compare_results_with_expected={ "out/search/a.fas": "expected/search/a.fas", "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv", "out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta", "out/cluster/a_b_all_seqs.fasta": "expected/cluster/a_b_all_seqs.fasta", - "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv", "out/linclust/a_b_rep_seq.fasta": "expected/linclust/a_b_rep_seq.fasta", "out/linclust/a_b_all_seqs.fasta": "expected/linclust/a_b_all_seqs.fasta", "out/linclust/a_b_cluster.tsv": "expected/linclust/a_b_cluster.tsv", "out/taxonomy/a_tophit_report": "expected/taxonomy/a_tophit_report", "out/taxonomy/a_tophit_aln": "expected/taxonomy/a_tophit_aln", "out/taxonomy/a_report": "expected/taxonomy/a_report", "out/taxonomy/a_lca.tsv": "expected/taxonomy/a_lca.tsv", "out/rbh/a.fas": "expected/rbh/a.fas", },bio/mmseqs2/db/wrapper.py (1)
23-31: Make module checks robust when params.module includes arguments.
params.modulecan contain arguments (e.g., "databases SILVA"), so equality checks miss "databases". Split before checking; also avoid passing a dummy tmpdir for modules not requiring it.Apply this diff:
with tempfile.TemporaryDirectory() as tmpdir: - # Modules with threads - if snakemake.params.module in ["databases"]: - extra = f"--threads {snakemake.threads} {extra}" - # Modules with no temp folder - if snakemake.params.module in ["createdb"]: - tmpdir = "" - - shell("mmseqs {snakemake.params.module} {input} {out} {tmpdir} {extra} {log}") + module = snakemake.params.module + module_name = module.split()[0] + # Modules with threads + if module_name == "databases": + extra = f"--threads {snakemake.threads} {extra}" + # Modules with no temp folder + tmp_arg = "" if module_name == "createdb" else tmpdir + + shell("mmseqs {module} {input} {out} {tmp_arg} {extra} {log}")bio/mmseqs2/workflows/wrapper.py (1)
14-17: Use the local variable for target prefixing.Minor clarity fix: use the already-resolved
targetwhen getting the common prefix.Apply this diff:
target = snakemake.input.get("target", "") if isinstance(target, list): - target = os.path.commonprefix(snakemake.input.target) + target = os.path.commonprefix(target)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
bio/mmseqs2/workflows/test/expected/cluster/a_b_cluster.tsvis excluded by!**/*.tsvbio/mmseqs2/workflows/test/expected/linclust/a_b_cluster.tsvis excluded by!**/*.tsvbio/mmseqs2/workflows/test/expected/taxonomy/a_lca.tsvis excluded by!**/*.tsvbio/mmseqs2/workflows/test/seqs/nucl.a.fas.gzis excluded by!**/*.gzbio/mmseqs2/workflows/test/seqs/nucl.b.fas.gzis excluded by!**/*.gz
📒 Files selected for processing (30)
bio/mmseqs2/db/environment.linux-64.pin.txt(1 hunks)bio/mmseqs2/db/environment.yaml(1 hunks)bio/mmseqs2/db/meta.yaml(1 hunks)bio/mmseqs2/db/test/Snakefile(1 hunks)bio/mmseqs2/db/test/expected/createdb/a.index(1 hunks)bio/mmseqs2/db/test/expected/createdb/a.lookup(1 hunks)bio/mmseqs2/db/test/expected/createdb/a.source(1 hunks)bio/mmseqs2/db/test/expected/createdb/a_h.index(1 hunks)bio/mmseqs2/db/test/expected/databases/a.source(1 hunks)bio/mmseqs2/db/test/expected/databases/a.version(1 hunks)bio/mmseqs2/db/test/seqs/a.fasta(1 hunks)bio/mmseqs2/db/wrapper.py(1 hunks)bio/mmseqs2/workflows/environment.linux-64.pin.txt(1 hunks)bio/mmseqs2/workflows/environment.yaml(1 hunks)bio/mmseqs2/workflows/meta.yaml(1 hunks)bio/mmseqs2/workflows/test/Snakefile(1 hunks)bio/mmseqs2/workflows/test/db/a.index(1 hunks)bio/mmseqs2/workflows/test/db/a.lookup(1 hunks)bio/mmseqs2/workflows/test/db/a.source(1 hunks)bio/mmseqs2/workflows/test/db/a_h.index(1 hunks)bio/mmseqs2/workflows/test/db/a_mapping(1 hunks)bio/mmseqs2/workflows/test/expected/cluster/a_b_all_seqs.fasta(1 hunks)bio/mmseqs2/workflows/test/expected/cluster/a_b_rep_seq.fasta(1 hunks)bio/mmseqs2/workflows/test/expected/linclust/a_b_all_seqs.fasta(1 hunks)bio/mmseqs2/workflows/test/expected/linclust/a_b_rep_seq.fasta(1 hunks)bio/mmseqs2/workflows/test/expected/rbh/a.tab(1 hunks)bio/mmseqs2/workflows/test/expected/search/a.tab(1 hunks)bio/mmseqs2/workflows/test/expected/taxonomy/a_report(1 hunks)bio/mmseqs2/workflows/wrapper.py(1 hunks)test_wrappers.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
test_wrappers.pybio/mmseqs2/db/wrapper.pybio/mmseqs2/workflows/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/mmseqs2/db/wrapper.pybio/mmseqs2/workflows/wrapper.py
🧠 Learnings (8)
📚 Learning: 2025-04-17T09:24:51.738Z
Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See full guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file
Applied to files:
bio/mmseqs2/workflows/environment.yamlbio/mmseqs2/db/environment.yaml
📚 Learning: 2025-04-17T09:24:51.738Z
Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file
Applied to files:
bio/mmseqs2/workflows/environment.yaml
📚 Learning: 2025-02-11T12:24:22.592Z
Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#3648
File: bio/nanosim/simulator/environment.yaml:6-6
Timestamp: 2025-02-11T12:24:22.592Z
Learning: In the nanosim bioconda recipe, dependencies are carefully managed with specific version pins (e.g., scikit-learn ~=0.23.2) to ensure compatibility with pre-trained models. These dependencies don't need to be explicitly added to environment.yaml files when the main package is listed as a dependency, as they are handled through the bioconda recipe system.
Applied to files:
bio/mmseqs2/workflows/environment.yamlbio/mmseqs2/db/environment.linux-64.pin.txtbio/mmseqs2/workflows/environment.linux-64.pin.txt
📚 Learning: 2025-06-02T07:56:35.854Z
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#4159
File: bio/pyfaidx/environment.yaml:6-6
Timestamp: 2025-06-02T07:56:35.854Z
Learning: In the Snakemake-wrapper repository, conda dependency version pinning in environment.yaml files uses spaces around the equals sign (e.g., `- pyfaidx =0.8.1.4`) as the established coding standard, even though conda itself doesn't require the spaces.
Applied to files:
bio/mmseqs2/workflows/environment.linux-64.pin.txt
📚 Learning: 2024-11-26T08:31:00.099Z
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Applied to files:
test_wrappers.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/mmseqs2/db/wrapper.pybio/mmseqs2/workflows/wrapper.py
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/mmseqs2/db/wrapper.pybio/mmseqs2/workflows/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/mmseqs2/workflows/wrapper.py
🪛 Ruff (0.13.1)
test_wrappers.py
157-157: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated
Remove repeated key literal "out/cluster/a_b_cluster.tsv"
(F601)
bio/mmseqs2/db/wrapper.py
9-9: Undefined name snakemake
(F821)
10-10: Undefined name snakemake
(F821)
13-13: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
bio/mmseqs2/workflows/wrapper.py
10-10: Undefined name snakemake
(F821)
11-11: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: testing
🔇 Additional comments (31)
bio/mmseqs2/db/test/expected/createdb/a.lookup (1)
1-4: LGTM; please confirm column semantics are stable across OS/versions.The mapping aligns with the duplicated IDs in the input FASTA. Please confirm the 3rd column (“0”) is the expected source/file index from mmseqs2 createdb and is stable across platforms.
bio/mmseqs2/workflows/test/expected/linclust/a_b_rep_seq.fasta (1)
1-4: LGTM.Representative sequences look fine for the linclust expected output.
bio/mmseqs2/db/environment.linux-64.pin.txt (1)
1-33: LGTM; reproducible DB env pin.Pins are minimal and include only what’s needed (mmseqs2, gawk, basics). Confirm CI path uses conda create --file to avoid accidental channel mixing.
bio/mmseqs2/workflows/test/expected/taxonomy/a_report (1)
1-1: LGTM.Report format/fields look consistent for the taxonomy workflow expected output.
bio/mmseqs2/db/test/expected/createdb/a.source (1)
1-1: LGTM; confirm path semantics.The source mapping “0a.fasta” matches common mmseqs2 conventions. Please confirm tests assume only basenames (not full paths) to keep runs stable across machines.
bio/mmseqs2/workflows/test/expected/rbh/a.tab (1)
1-1: Confirm column order vs mmseqs2 rbh --format-output.RBH fields can differ by version/flags; please verify the wrapper’s format-output matches this expectation to avoid brittle tests.
bio/mmseqs2/db/test/expected/databases/a.source (1)
1-1: LGTM: expected database source mapping.Content and tab delimiter look correct.
bio/mmseqs2/workflows/test/db/a.index (1)
1-4: LGTM: index fixture matches createdb expectation.Looks consistent with the createdb/a.index expectation.
bio/mmseqs2/workflows/test/db/a_mapping (1)
1-5: LGTM: taxonomy mapping fixture.Values and formatting look fine.
bio/mmseqs2/db/environment.yaml (1)
1-6: LGTM: environment.yaml follows repo conventions.Channels ordering, nodefaults, and single pinned main dependency all align with guidelines.
If mmseqs2 version changes later, remember to refresh the corresponding pinned lockfile(s) to keep CI deterministic.
bio/mmseqs2/db/test/expected/createdb/a.index (1)
1-4: LGTM: createdb index expectation.Matches the workflow fixture; tab-separated triplets are consistent.
bio/mmseqs2/workflows/test/db/a_h.index (1)
1-4: LGTM: header index fixture.Consistent increments; formatting OK.
bio/mmseqs2/workflows/test/db/a.source (1)
1-1: LGTM: source mapping.Tab-delimited mapping looks correct.
bio/mmseqs2/workflows/test/db/a.lookup (1)
1-4: LGTM: fixture content looks consistent.Tab-delimited triplets align with the corresponding createdb expectations.
bio/mmseqs2/db/test/expected/createdb/a_h.index (1)
1-4: LGTM: expected index triplets look plausible.Monotonic offsets with fixed block size are consistent.
bio/mmseqs2/workflows/test/expected/search/a.tab (1)
1-37: LGTM: expected TSV layout is coherent.Columns/values look consistent with typical MMseqs2 search output.
bio/mmseqs2/workflows/test/expected/linclust/a_b_all_seqs.fasta (1)
1-6: Incorrect — no consecutive FASTA headers found. Repository check under bio/mmseqs2/workflows/test/expected found no consecutive '>' lines; the duplicate header-only records reported are not present.Likely an incorrect or invalid review comment.
test_wrappers.py (3)
137-151: LGTM: adds end-to-end tests for mmseqs2 workflows.
153-166: Sanity-check side-effect outputs used for comparisons."out/search/a.fas" and "out/rbh/a.fas" are not declared outputs; they rely on mmseqs2 side effects. Please confirm these filenames are stable across mmseqs2 versions (to avoid flaky tests).
170-194: LGTM: db tests cover both 'databases' and 'createdb' artifacts.bio/mmseqs2/db/wrapper.py (1)
13-21: Prefix derivation looks good.Using commonprefix and rstrip("_") to compute database/output base paths matches the multiext patterns used in tests.
bio/mmseqs2/db/test/Snakefile (2)
1-25: LGTM: sensible outputs and logging for 'databases'.
27-50: LGTM: createdb rule aligns with wrapper contract.bio/mmseqs2/workflows/wrapper.py (2)
19-30: LGTM: output base handling and format-mode toggles.
31-34: LGTM: command assembly with threads and tmpdir.bio/mmseqs2/workflows/test/expected/cluster/a_b_rep_seq.fasta (1)
1-4: LGTM: expected FASTA fixture added.bio/mmseqs2/workflows/test/Snakefile (5)
1-15: LGTM: easy-search rule wiring.
17-32: LGTM: easy-cluster outputs via multiext with stable base.
34-49: LGTM: easy-linclust mirrors cluster interface.
51-81: LGTM: taxonomy target DB multiext matches db wrapper outputs.
83-97: LGTM: RBH rule consistent with search interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
bio/mmseqs2/workflows/wrapper.py (2)
34-37: Quote paths to avoid word splitting; keep optional target unquoted.Prevents breakage when paths contain spaces, while not forcing an empty target argument.
Apply:
- shell( - "mmseqs {snakemake.params.module} {snakemake.input.query} {target} {out} {tmpdir} --threads {snakemake.threads} {extra} {log}" - ) + shell( + 'mmseqs {snakemake.params.module} "{snakemake.input.query}" {target} "{out}" "{tmpdir}" --threads {snakemake.threads} {extra} {log}' + )
18-18: Track the TODO for arbitrary output names.If needed, I can help implement support and adjust tests.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/mmseqs2/workflows/wrapper.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/mmseqs2/workflows/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/mmseqs2/workflows/wrapper.py
🧠 Learnings (2)
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/mmseqs2/workflows/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/mmseqs2/workflows/wrapper.py
🪛 Ruff (0.13.1)
bio/mmseqs2/workflows/wrapper.py
10-10: Undefined name snakemake
(F821)
11-11: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: testing
- GitHub Check: docs
🔇 Additional comments (1)
bio/mmseqs2/workflows/wrapper.py (1)
1-3: Module metadata looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
test_wrappers.py (2)
152-160: Remove duplicate dict key in compare_results_with_expected (ruff F601).The key "out/cluster/a_b_cluster.tsv" appears twice; the latter silently overrides the former. Drop the duplicate to satisfy linting and avoid confusion.
Apply this diff:
"out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta", "out/cluster/a_b_all_seqs.fasta": "expected/cluster/a_b_all_seqs.fasta", - "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv", "out/linclust/a_b_rep_seq.fasta": "expected/linclust/a_b_rep_seq.fasta",
137-196: Consider splitting into two tests for clearer failures.Having workflows and db assertions in one test makes triage harder when one half fails. Suggest test_mmseqs2_workflows() and test_mmseqs2_db().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test_wrappers.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
test_wrappers.py
🪛 Ruff (0.13.1)
test_wrappers.py
157-157: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated
Remove repeated key literal "out/cluster/a_b_cluster.tsv"
(F601)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: testing
- GitHub Check: docs
🔇 Additional comments (1)
test_wrappers.py (1)
152-166: Verify expected fixture files exist
In test_wrappers.py (around lines 152–166 and 181–193), before comparing outputs, iterate overcompare_results_with_expected.values()and assert each expected file exists to avoid brittle CI failures. For example:for expected_path in compare_results_with_expected.values(): assert os.path.isfile(expected_path), f"Missing fixture: {expected_path}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test_wrappers.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
test_wrappers.py
🪛 Ruff (0.14.1)
test_wrappers.py
168-168: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated
Remove repeated key literal "out/cluster/a_b_cluster.tsv"
(F601)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: docs
- GitHub Check: testing
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
QC
snakemake-wrappers.While the contributions guidelines are more extensive, please particularly ensure that:
test.pywas updated to call any added or updated example rules in aSnakefileinput:andoutput:file paths in the rules can be chosen arbitrarilyinput:oroutput:)tempfile.gettempdir()points tometa.yamlcontains a link to the documentation of the respective tool or command underurl:Summary by CodeRabbit
New Features
Tests
Documentation
Chores