-
Notifications
You must be signed in to change notification settings - Fork 205
feat: Add wrappers in bio/reference for Ensembl Genomes #4882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds Ensembl Genomes sequence and annotation Snakemake wrappers, manifests, Conda environment specs/pins, and tests. Wrappers build HTTPS-first (with FTP fallback) download URLs, validate parameters, handle gzip and per‑chromosome retrieval, and log attempts/errors. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
bio/reference/ensembl-genomes-annotation/test/Snakefile (1)
10-10: Consider using unique log files for each rule.Both
get_annotationandget_annotation_gzrules write to the same log filelogs/get_annotation.log. If both rules are executed in the same workflow run, the second rule's log may overwrite the first. Based on learnings, simplified logging configurations are acceptable in test Snakefiles, so this is a minor observation rather than a blocking issue.🔎 Suggested fix for unique log paths
rule get_annotation: output: "refs/ensembl-genomes-annotation.gtf", params: species="schizosaccharomyces_pombe", division="fungi", assembly="ASM294v2", release="62", log: - "logs/get_annotation.log", + "logs/get_annotation_gtf.log", cache: "omit-software" # save space and time with between workflow caching (see docs) wrapper: "master/bio/reference/ensembl-genomes-annotation" rule get_annotation_gz: output: "refs/ensembl-genomes-annotation.gtf.gz", params: species="schizosaccharomyces_pombe", division="fungi", assembly="ASM294v2", release="62", log: - "logs/get_annotation.log", + "logs/get_annotation_gz.log", cache: "omit-software" # save space and time with between workflow caching (see docs) wrapper: "master/bio/reference/ensembl-genomes-annotation"Also applies to: 24-25
bio/reference/ensembl-genomes-sequence/wrapper.py (2)
35-35: Minor: Unused initial assignment.
suffixes = ""is never used since all code paths in lines 38-59 either reassign it to a list or raise an exception. Consider initializing asNoneor removing the initialization entirely for clarity.🔎 Suggested fix
-suffixes = "" +suffixes = None datatype = snakemake.params.get("datatype", "")
67-68: URL construction may produce double slashes.The default URL ends with
/, and line 68 prepends another/beforerelease-{release}, resulting inhttps://ftp.ebi.ac.uk/ensemblgenomes/pub//release-62/.... While most servers handle double slashes gracefully, stripping the trailing slash would produce cleaner URLs.🔎 Suggested fix
url = snakemake.params.get("url", "https://ftp.ebi.ac.uk/ensemblgenomes/pub/") -url_prefix = f"{url}/release-{release}/{division}/fasta/{species}/{datatype}/{species.capitalize()}.{assembly}" +url_prefix = f"{url.rstrip('/')}/release-{release}/{division}/fasta/{species}/{datatype}/{species.capitalize()}.{assembly}"
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
bio/reference/ensembl-genomes-sequence/test/expected/chr3_and_chrMt.fasta.gzis excluded by!**/*.gz
📒 Files selected for processing (12)
bio/reference/ensembl-genomes-annotation/environment.linux-64.pin.txtbio/reference/ensembl-genomes-annotation/environment.yamlbio/reference/ensembl-genomes-annotation/meta.yamlbio/reference/ensembl-genomes-annotation/test/Snakefilebio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/environment.linux-64.pin.txtbio/reference/ensembl-genomes-sequence/environment.yamlbio/reference/ensembl-genomes-sequence/meta.yamlbio/reference/ensembl-genomes-sequence/test/Snakefilebio/reference/ensembl-genomes-sequence/test/expected/chr3_and_chrMt.fastabio/reference/ensembl-genomes-sequence/wrapper.pytest_wrappers.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/reference/ensembl-genomes-annotation/wrapper.pytest_wrappers.pybio/reference/ensembl-genomes-sequence/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
🧠 Learnings (24)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:64-70
Timestamp: 2025-01-30T14:15:32.176Z
Learning: In Snakemake wrappers that download multiple chromosomes into a single output file, if any chromosome download fails, the entire operation should fail immediately (using `break`) rather than continuing with remaining chromosomes, as partial output would be invalid.
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/test/Snakefilebio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-11-26T08:30:23.818Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-11-15T18:32:07.612Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2025-01-30T14:17:43.431Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/test/Snakefiletest_wrappers.pybio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-11-15T13:48:33.759Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2025-06-04T06:32:20.090Z
Learnt from: rohan-ibn-tariq
Repo: snakemake/snakemake-wrappers PR: 4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-04-17T09:24:51.738Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See full guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file
Applied to files:
bio/reference/ensembl-genomes-annotation/environment.yamlbio/reference/ensembl-genomes-sequence/environment.yaml
📚 Learning: 2025-04-17T09:24:51.738Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file
Applied to files:
bio/reference/ensembl-genomes-annotation/environment.yamlbio/reference/ensembl-genomes-sequence/environment.yaml
📚 Learning: 2025-02-11T12:24:22.592Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3648
File: bio/nanosim/simulator/environment.yaml:6-6
Timestamp: 2025-02-11T12:24:22.592Z
Learning: In the nanosim bioconda recipe, dependencies are carefully managed with specific version pins (e.g., scikit-learn ~=0.23.2) to ensure compatibility with pre-trained models. These dependencies don't need to be explicitly added to environment.yaml files when the main package is listed as a dependency, as they are handled through the bioconda recipe system.
Applied to files:
bio/reference/ensembl-genomes-annotation/environment.yamlbio/reference/ensembl-genomes-annotation/environment.linux-64.pin.txtbio/reference/ensembl-genomes-sequence/environment.linux-64.pin.txtbio/reference/ensembl-genomes-sequence/environment.yaml
📚 Learning: 2024-11-26T15:01:13.202Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Applied to files:
bio/reference/ensembl-genomes-annotation/test/Snakefilebio/reference/ensembl-genomes-sequence/test/Snakefiletest_wrappers.py
📚 Learning: 2024-11-26T09:16:39.570Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
Applied to files:
bio/reference/ensembl-genomes-annotation/test/Snakefilebio/reference/ensembl-genomes-sequence/test/Snakefile
📚 Learning: 2024-11-26T10:49:54.765Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:58-71
Timestamp: 2024-11-26T10:49:54.765Z
Learning: In test Snakefiles within the snakemake-wrappers repository, it is acceptable to use simplified paths and logging configurations that may differ from real-life pipelines.
Applied to files:
bio/reference/ensembl-genomes-annotation/test/Snakefilebio/reference/ensembl-genomes-sequence/test/Snakefiletest_wrappers.py
📚 Learning: 2025-06-15T07:54:36.475Z
Learnt from: tedil
Repo: snakemake/snakemake-wrappers PR: 4274
File: bio/mehari/download-clinvar-db/wrapper.py:29-33
Timestamp: 2025-06-15T07:54:36.475Z
Learning: The conda-packaged tar 1.34 has default settings that include `-f -`, so explicit specification of `-f -` is not required when piping input to tar in conda environments, unlike standard system tar which requires it explicitly.
Applied to files:
bio/reference/ensembl-genomes-annotation/environment.linux-64.pin.txtbio/reference/ensembl-genomes-sequence/environment.linux-64.pin.txt
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
Applied to files:
bio/reference/ensembl-genomes-sequence/test/Snakefile
📚 Learning: 2024-11-26T08:31:00.099Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Applied to files:
bio/reference/ensembl-genomes-sequence/test/Snakefiletest_wrappers.py
📚 Learning: 2024-11-26T09:16:24.981Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:1-13
Timestamp: 2024-11-26T09:16:24.981Z
Learning: In test `Snakefile`s (e.g., `test/Snakefile`), it's acceptable to use fixed input and output file names instead of wildcards.
Applied to files:
bio/reference/ensembl-genomes-sequence/test/Snakefile
📚 Learning: 2024-11-21T10:50:09.006Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/call-variants/test/Snakefile:29-47
Timestamp: 2024-11-21T10:50:09.006Z
Learning: In example Snakefiles, using hardcoded sample names is acceptable.
Applied to files:
bio/reference/ensembl-genomes-sequence/test/Snakefile
📚 Learning: 2025-02-11T12:24:22.592Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3648
File: bio/nanosim/simulator/environment.yaml:6-6
Timestamp: 2025-02-11T12:24:22.592Z
Learning: For bioconda packages like nanosim, dependencies like scikit-learn are managed through the bioconda recipe and don't need to be explicitly added to the environment.yaml file when the main package is listed as a dependency.
Applied to files:
bio/reference/ensembl-genomes-sequence/environment.yaml
📚 Learning: 2025-06-02T07:56:35.854Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4159
File: bio/pyfaidx/environment.yaml:6-6
Timestamp: 2025-06-02T07:56:35.854Z
Learning: In the Snakemake-wrapper repository, conda dependency version pinning in environment.yaml files uses spaces around the equals sign (e.g., `- pyfaidx =0.8.1.4`) as the established coding standard, even though conda itself doesn't require the spaces.
Applied to files:
bio/reference/ensembl-genomes-sequence/environment.yaml
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2025-01-30T14:15:32.176Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:64-70
Timestamp: 2025-01-30T14:15:32.176Z
Learning: In Snakemake wrappers that download multiple chromosomes into a single output file, if any chromosome download fails, the entire operation should fail immediately (using `break`) rather than continuing with remaining chromosomes, as partial output would be invalid.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
🪛 Ruff (0.14.10)
bio/reference/ensembl-genomes-annotation/wrapper.py
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
27-29: Avoid specifying long messages outside the exception class
(TRY003)
35-35: Undefined name snakemake
(F821)
43-45: Avoid specifying long messages outside the exception class
(TRY003)
47-47: Undefined name snakemake
(F821)
56-56: Undefined name subprocess
(F821)
62-62: Undefined name subprocess
(F821)
63-63: Undefined name snakemake
(F821)
64-64: Undefined name snakemake
(F821)
bio/reference/ensembl-genomes-sequence/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
20-22: Avoid specifying long messages outside the exception class
(TRY003)
28-28: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
36-36: Undefined name snakemake
(F821)
37-37: Undefined name snakemake
(F821)
57-59: Avoid specifying long messages outside the exception class
(TRY003)
63-65: Avoid specifying long messages outside the exception class
(TRY003)
67-67: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (7)
bio/reference/ensembl-genomes-sequence/meta.yaml (1)
14-14: Verify URL protocol consistency with wrapper implementation.The meta.yaml specifies the default URL as
ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/, but verify this matches the actual default in the corresponding wrapper.py file. The annotation wrapper useshttps://as the default (line 47 in annotation/wrapper.py), so confirm whether the sequence wrapper should also use HTTPS.test_wrappers.py (2)
5716-5783: LGTM! Well-structured test coverage for Ensembl Genomes sequence wrapper.The new test functions follow the established patterns in the repository and provide comprehensive coverage for:
- Basic genome download
- Gzipped output
- Single chromosome retrieval
- Multiple chromosome retrieval (both uncompressed and gzipped)
The test naming is consistent with the existing
test_ensembl_sequence_*tests, making the test suite cohesive.
5800-5825: LGTM! Annotation wrapper tests are consistent with sequence wrapper tests.Both annotation test functions properly exercise the compressed and uncompressed output paths.
bio/reference/ensembl-genomes-sequence/wrapper.py (4)
1-11: LGTM! Clean header and appropriate imports.The imports cover subprocess, shell utilities, and logging as needed for download operations.
13-22: LGTM! Parameter extraction and validation looks good.The division validation against the allowed list with a clear error message is appropriate.
70-103: LGTM! Download logic with HTTPS/FTP fallback is well-implemented.The fail-fast behavior for chromosome downloads (line 91) aligns with the established pattern from the ensembl-sequence wrapper. The HTTPS-to-FTP fallback provides good resilience against temporary server issues.
105-118: LGTM! Comprehensive error handling with helpful messaging.The error messages guide users to check URL availability and verify their parameter combinations, which is valuable for debugging download issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
bio/reference/ensembl-genomes-annotation/wrapper.py (3)
9-9: Consider removing commented imports.The commented imports on lines 9 and 13 appear to be unused remnants. Removing them would make the code cleaner.
🔎 Proposed cleanup
-# from itertools import product from snakemake.shell import shell from snakemake.logging import logger -# from os import path from pathlib import PathAlso applies to: 13-13
22-22: Simplify the gzip detection logic.The expression
(out_fmt.pop() and True)works but is unnecessarily complex. Theand Trueis used to convert the popped string to a boolean, but this can be clearer.🔎 Clearer alternative
-out_gz = (out_fmt.pop() and True) if out_fmt[-1] == ".gz" else False +if out_fmt[-1] == ".gz": + out_fmt.pop() + out_gz = True +else: + out_gz = False
31-33: Future enhancements documented.The TODO comments identify known limitations regarding subdivided directories and non-standard naming conventions. These are useful notes for future work.
Would you like me to open an issue to track these enhancements, or would you prefer to address them in a future PR?
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
bio/reference/ensembl-genomes-annotation/meta.yamlbio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/meta.yaml
✅ Files skipped from review due to trivial changes (1)
- bio/reference/ensembl-genomes-sequence/meta.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/reference/ensembl-genomes-annotation/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/reference/ensembl-genomes-annotation/wrapper.py
🧠 Learnings (12)
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T08:30:23.818Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-01-30T14:17:43.431Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:32:07.612Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T13:48:33.759Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T09:08:06.041Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-06-04T06:32:20.090Z
Learnt from: rohan-ibn-tariq
Repo: snakemake/snakemake-wrappers PR: 4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-12-06T14:25:43.922Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3498
File: bio/ngscheckmate/ncm/wrapper.py:68-68
Timestamp: 2024-12-06T14:25:43.922Z
Learning: In the `bio/ngscheckmate/ncm/wrapper.py` file for the NGSCheckMate wrapper, do not suggest adding file existence checks for the list file input, as the files are already required by the wrapper.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:27:20.017Z
Learnt from: MagdalenaZZ
Repo: snakemake/snakemake-wrappers PR: 3071
File: bio/paraphase/wrapper.py:0-0
Timestamp: 2024-08-21T08:27:20.017Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:30:28.583Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:1-3
Timestamp: 2024-11-15T18:30:28.583Z
Learning: In Snakemake wrappers, there is no need to use `WorkflowError`; plain Python exceptions are sufficient.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T08:29:27.546Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:21-25
Timestamp: 2024-11-26T08:29:27.546Z
Learning: In `bio/mtnucratio/wrapper.py`, when creating symlinks within a temporary directory, it's acceptable to rely on Python's built-in error handling for I/O permissions and availability. Explicit try-except blocks around symlink creation may not be necessary.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
🪛 Ruff (0.14.10)
bio/reference/ensembl-genomes-annotation/wrapper.py
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
27-29: Avoid specifying long messages outside the exception class
(TRY003)
35-35: Undefined name snakemake
(F821)
43-45: Avoid specifying long messages outside the exception class
(TRY003)
47-47: Undefined name snakemake
(F821)
63-63: Undefined name snakemake
(F821)
64-64: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (5)
bio/reference/ensembl-genomes-annotation/wrapper.py (5)
16-19: LGTM!Parameter extraction is clean and applies appropriate transformations (lowercasing, type conversion).
25-29: LGTM!Division validation is correct and provides a clear error message listing valid options.
37-49: LGTM!The suffix determination correctly handles GTF and GFF3 formats with appropriate error handling. The URL construction follows Ensembl Genomes conventions with a sensible HTTPS-to-FTP fallback mechanism.
51-70: LGTM!The download logic is well-structured with appropriate fallback from HTTPS to FTP. The error handling is correct (previous issue with exception naming was fixed), and the final error message clearly guides users to validate their parameters.
21-23: Handle edge case: output without proper file extensions.The current logic assumes the output path has at least one proper extension (e.g.,
.gtfor.gtf.gz). If the output path has only.gz(no format extension) or no extensions at all, lines 22-23 will raise anIndexError.While unlikely given the Snakemake workflow context and meta.yaml specification, adding a check would make the wrapper more robust and provide a clearer error message.
🔎 Suggested improvement with validation
out_fmt = Path(snakemake.output[0]).suffixes +if not out_fmt: + raise ValueError("Output file must have a format extension (.gtf or .gff3, optionally with .gz)") out_gz = (out_fmt.pop() and True) if out_fmt[-1] == ".gz" else False +if not out_fmt: + raise ValueError("Output file must have a format extension (.gtf or .gff3) before the optional .gz") out_fmt = out_fmt.pop().lstrip(".")⛔ Skipped due to learnings
Learnt from: johanneskoester Repo: snakemake/snakemake-wrappers PR: 3478 File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12 Timestamp: 2024-11-21T10:23:03.427Z Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.Learnt from: johanneskoester Repo: snakemake/snakemake-wrappers PR: 3498 File: bio/ngscheckmate/ncm/wrapper.py:68-68 Timestamp: 2024-12-06T14:25:43.922Z Learning: In the `bio/ngscheckmate/ncm/wrapper.py` file for the NGSCheckMate wrapper, do not suggest adding file existence checks for the list file input, as the files are already required by the wrapper.Learnt from: tdayris Repo: snakemake/snakemake-wrappers PR: 3499 File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24 Timestamp: 2024-11-26T09:08:06.041Z Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.Learnt from: johanneskoester Repo: snakemake/snakemake-wrappers PR: 3478 File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10 Timestamp: 2024-11-15T18:31:15.447Z Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.Learnt from: tdayris Repo: snakemake/snakemake-wrappers PR: 3502 File: bio/ngsbits/sampleancestry/wrapper.py:18-23 Timestamp: 2024-11-26T14:59:03.678Z Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.Learnt from: tdayris Repo: snakemake/snakemake-wrappers PR: 3497 File: bio/sexdeterrmine/wrapper.py:23-26 Timestamp: 2024-11-26T08:35:42.140Z Learning: In the `bio/sexdeterrmine/wrapper.py` file (Python), we rely on Samtools to handle input validation for the depth file provided by the user, so additional file existence checks are not necessary.Learnt from: tdayris Repo: snakemake/snakemake-wrappers PR: 3497 File: bio/sexdeterrmine/wrapper.py:0-0 Timestamp: 2024-11-26T08:33:32.457Z Learning: In the `bio/sexdeterrmine/wrapper.py` file for the Sex.DetERRmine wrapper, missing input/output errors are raised by the shell command and captured in the logs. Therefore, additional error handling for file operations is not necessary.Learnt from: tdayris Repo: snakemake/snakemake-wrappers PR: 3501 File: meta/bio/varscan2_snpeff/test/Snakefile:1-10 Timestamp: 2024-11-26T10:49:04.406Z Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
bio/reference/ensembl-genomes-sequence/wrapper.py (2)
33-41: Type inconsistency:suffixesinitialized as string but used as list.
suffixesis initialized as an empty string on line 33, but all code paths assign it a list. While this works due to Python's dynamic typing, it reduces code clarity.🔎 Suggested fix
-suffixes = "" +suffixes = [] datatype = snakemake.params.get("datatype", "") chromosome = snakemake.params.get("chromosome", "") if datatype == "dna":
65-66: Double slash in URL when using the default base URL.The default URL
https://ftp.ebi.ac.uk/ensemblgenomes/pub/ends with a slash, and line 66 adds another slash withf"{url}/release-{release}/...", resulting inpub//release-.... While many servers tolerate this, it's better to normalize.🔎 Suggested fix
-url = snakemake.params.get("url", "https://ftp.ebi.ac.uk/ensemblgenomes/pub/") -url_prefix = f"{url}/release-{release}/{division}/fasta/{species}/{datatype}/{species.capitalize()}.{assembly}" +url = snakemake.params.get("url", "https://ftp.ebi.ac.uk/ensemblgenomes/pub/").rstrip("/") +url_prefix = f"{url}/release-{release}/{division}/fasta/{species}/{datatype}/{species.capitalize()}.{assembly}"bio/reference/ensembl-genomes-annotation/wrapper.py (2)
18-20: Convoluted output format detection logic.The in-place list modification with
pop()is difficult to follow. Consider a clearer approach.🔎 Suggested simplification
-out_fmt = Path(snakemake.output[0]).suffixes -out_gz = (out_fmt.pop() and True) if out_fmt[-1] == ".gz" else False -out_fmt = out_fmt.pop().lstrip(".") +output_path = Path(snakemake.output[0]) +out_gz = output_path.suffix == ".gz" +out_fmt = output_path.suffixes[-2].lstrip(".") if out_gz else output_path.suffix.lstrip(".")
44-45: Double slash in URL when using the default base URL.Same issue as the sequence wrapper: the default URL ends with
/and line 45 adds another/.🔎 Suggested fix
-url = snakemake.params.get("url", "https://ftp.ebi.ac.uk/ensemblgenomes/pub/") -url = f"{url}/release-{release}/{division}/{out_fmt}/{species}/{species.capitalize()}.{assembly}.{release}.{suffix}" +url = snakemake.params.get("url", "https://ftp.ebi.ac.uk/ensemblgenomes/pub/").rstrip("/") +url = f"{url}/release-{release}/{division}/{out_fmt}/{species}/{species.capitalize()}.{assembly}.{release}.{suffix}"
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.pytest_wrappers.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
test_wrappers.pybio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
🧠 Learnings (20)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/test/Snakefile:10-11
Timestamp: 2024-11-26T15:00:09.980Z
Learning: For the NGS-bits SampleAncestry wrapper, users should decide which assembly and parameters they want. Supported genome builds are provided through the tool's `url` in `meta.yaml`.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
📚 Learning: 2025-01-30T14:17:43.431Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Applied to files:
test_wrappers.pybio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T10:49:54.765Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:58-71
Timestamp: 2024-11-26T10:49:54.765Z
Learning: In test Snakefiles within the snakemake-wrappers repository, it is acceptable to use simplified paths and logging configurations that may differ from real-life pipelines.
Applied to files:
test_wrappers.py
📚 Learning: 2024-11-26T08:31:00.099Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Applied to files:
test_wrappers.py
📚 Learning: 2024-11-26T15:01:13.202Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Applied to files:
test_wrappers.py
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T08:30:23.818Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:32:07.612Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T13:48:33.759Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-06-04T06:32:20.090Z
Learnt from: rohan-ibn-tariq
Repo: snakemake/snakemake-wrappers PR: 4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:27:20.017Z
Learnt from: MagdalenaZZ
Repo: snakemake/snakemake-wrappers PR: 3071
File: bio/paraphase/wrapper.py:0-0
Timestamp: 2024-08-21T08:27:20.017Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:30:28.583Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:1-3
Timestamp: 2024-11-15T18:30:28.583Z
Learning: In Snakemake wrappers, there is no need to use `WorkflowError`; plain Python exceptions are sufficient.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: In the `snakemake-wrappers` project, it's acceptable to use "master" in the wrapper path instead of pinning to a specific version.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:36:04.660Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/call-variants/wrapper.py:15-23
Timestamp: 2024-11-15T18:36:04.660Z
Learning: In the Snakemake wrappers repository, using `shell=True` and redirecting within shell commands is acceptable.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:31:15.447Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-14T15:21:37.230Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-08-14T15:21:37.230Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
🪛 Ruff (0.14.10)
bio/reference/ensembl-genomes-sequence/wrapper.py
11-11: Undefined name snakemake
(F821)
12-12: Undefined name snakemake
(F821)
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
18-20: Avoid specifying long messages outside the exception class
(TRY003)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
34-34: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
55-57: Avoid specifying long messages outside the exception class
(TRY003)
61-63: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Undefined name snakemake
(F821)
bio/reference/ensembl-genomes-annotation/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
24-26: Avoid specifying long messages outside the exception class
(TRY003)
32-32: Undefined name snakemake
(F821)
40-42: Avoid specifying long messages outside the exception class
(TRY003)
44-44: Undefined name snakemake
(F821)
60-60: Undefined name snakemake
(F821)
61-61: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (6)
bio/reference/ensembl-genomes-sequence/wrapper.py (2)
68-101: LGTM: Download loop with HTTPS/FTP fallback.The download logic correctly implements fail-fast behavior for chromosome downloads (per learnings from the ensembl-sequence wrapper), and the HTTPS-to-FTP fallback mechanism is sound.
103-116: LGTM: Error handling with helpful guidance.The error handling provides clear, actionable error messages guiding users to verify URL availability and parameter combinations.
bio/reference/ensembl-genomes-annotation/wrapper.py (1)
48-67: LGTM: Download logic with HTTPS/FTP fallback.The download mechanism correctly uses
curl --failto detect HTTP errors and falls back to FTP. Thesubprocess.CalledProcessErrorissue from the previous review has been addressed.test_wrappers.py (3)
5716-5783: LGTM: Comprehensive test coverage for ensembl-genomes-sequence wrapper.The test functions cover the key variants: genome download (raw/gzipped), single chromosome, and multiple chromosomes (raw/gzipped). The test structure follows established conventions in this file.
5800-5825: LGTM: Test coverage for ensembl-genomes-annotation wrapper.Tests cover both uncompressed GTF and gzipped GTF.gz outputs, matching the wrapper's capabilities.
131-131: LGTM: Minor formatting adjustment.The diff generation refactoring is a simple cleanup with no functional change.
* Initialize list as list * Remove redundant trailing slash from URL prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
🧠 Learnings (16)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-08-14T15:21:37.230Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:64-70
Timestamp: 2025-01-30T14:15:32.176Z
Learning: In Snakemake wrappers that download multiple chromosomes into a single output file, if any chromosome download fails, the entire operation should fail immediately (using `break`) rather than continuing with remaining chromosomes, as partial output would be invalid.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/test/Snakefile:10-11
Timestamp: 2024-11-26T15:00:09.980Z
Learning: For the NGS-bits SampleAncestry wrapper, users should decide which assembly and parameters they want. Supported genome builds are provided through the tool's `url` in `meta.yaml`.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-01-30T14:17:43.431Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T13:48:33.759Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T08:30:23.818Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:32:07.612Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-06-04T06:32:20.090Z
Learnt from: rohan-ibn-tariq
Repo: snakemake/snakemake-wrappers PR: 4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:27:20.017Z
Learnt from: MagdalenaZZ
Repo: snakemake/snakemake-wrappers PR: 3071
File: bio/paraphase/wrapper.py:0-0
Timestamp: 2024-08-21T08:27:20.017Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:30:28.583Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:1-3
Timestamp: 2024-11-15T18:30:28.583Z
Learning: In Snakemake wrappers, there is no need to use `WorkflowError`; plain Python exceptions are sufficient.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: In the `snakemake-wrappers` project, it's acceptable to use "master" in the wrapper path instead of pinning to a specific version.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:36:04.660Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/call-variants/wrapper.py:15-23
Timestamp: 2024-11-15T18:36:04.660Z
Learning: In the Snakemake wrappers repository, using `shell=True` and redirecting within shell commands is acceptable.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:31:15.447Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-14T15:21:37.230Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-08-14T15:21:37.230Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
🪛 Ruff (0.14.10)
bio/reference/ensembl-genomes-sequence/wrapper.py
11-11: Undefined name snakemake
(F821)
12-12: Undefined name snakemake
(F821)
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
18-20: Avoid specifying long messages outside the exception class
(TRY003)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
34-34: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
55-57: Avoid specifying long messages outside the exception class
(TRY003)
61-63: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Undefined name snakemake
(F821)
bio/reference/ensembl-genomes-annotation/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
24-26: Avoid specifying long messages outside the exception class
(TRY003)
32-32: Undefined name snakemake
(F821)
40-42: Avoid specifying long messages outside the exception class
(TRY003)
44-44: Undefined name snakemake
(F821)
60-60: Undefined name snakemake
(F821)
61-61: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (10)
bio/reference/ensembl-genomes-annotation/wrapper.py (4)
22-26: LGTM!Division validation logic correctly ensures only supported Ensembl Genomes divisions are accepted.
34-42: LGTM!Format validation correctly limits to supported annotation formats and constructs appropriate file suffixes.
44-46: LGTM!URL construction correctly follows Ensembl Genomes directory structure with proper fallback to FTP.
48-67: LGTM!Download logic with HTTPS-to-FTP fallback is correctly implemented. The subprocess import issue identified in previous reviews has been resolved.
bio/reference/ensembl-genomes-sequence/wrapper.py (6)
16-20: LGTM!Division validation is consistent with the annotation wrapper and correctly enforces allowed divisions.
28-31: LGTM!Decompression logic correctly handles both compressed and uncompressed output formats.
36-41: LGTM!Datatype handling correctly constructs suffix patterns for different sequence types. The use of
dna.toplevel.fa.gzis appropriate for Ensembl Genomes.
59-63: LGTM!Constraint validation correctly ensures chromosomes can only be specified for genomic DNA sequences.
68-101: LGTM!The download loop correctly handles both single-file and multi-chromosome scenarios. The fail-fast behavior on chromosome download failure aligns with the requirement that partial output is invalid.
103-116: LGTM!Error handling provides clear, actionable feedback for both chromosome-specific and general download failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
bio/reference/ensembl-genomes-annotation/wrapper.py (2)
30-47: Output format parsing logic is correct.The suffix parsing correctly handles various cases (
.gtf,.gtf.gz,.gff3,.gff3.gz) and properly validates against supported formats. The logic at line 33 works correctly:len(out_fmt) > 1ensures there's both a format extension and.gzbefore checking if the last suffix is.gz.💡 Optional: Add clarifying comment
For improved readability, consider adding a brief comment:
out_fmt = Path(snakemake.output[0]).suffixes if not out_fmt: raise ValueError("Output file must have a format filename extension") +# Check if gzipped: need at least 2 suffixes (e.g., .gtf.gz) and last must be .gz out_gz = len(out_fmt) > 1 and out_fmt[-1] == ".gz"
26-28: TODO comments indicate known limitations.The wrapper currently doesn't handle:
- Bacteria and fungi/ascomycota subdivided folders
- Non-standard chromosome file naming conventions
These limitations are documented, which is good.
Would you like me to open issues to track these enhancements, or help implement support for these cases?
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bio/reference/ensembl-genomes-annotation/wrapper.pybio/reference/ensembl-genomes-sequence/wrapper.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
🧠 Learnings (17)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-08-14T15:21:37.230Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:64-70
Timestamp: 2025-01-30T14:15:32.176Z
Learning: In Snakemake wrappers that download multiple chromosomes into a single output file, if any chromosome download fails, the entire operation should fail immediately (using `break`) rather than continuing with remaining chromosomes, as partial output would be invalid.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/test/Snakefile:10-11
Timestamp: 2024-11-26T15:00:09.980Z
Learning: For the NGS-bits SampleAncestry wrapper, users should decide which assembly and parameters they want. Supported genome builds are provided through the tool's `url` in `meta.yaml`.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T13:48:33.759Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-11-26T08:30:23.818Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-01-30T14:17:43.431Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:58-58
Timestamp: 2025-01-30T14:17:43.431Z
Learning: In the ensembl-sequence wrapper (bio/reference/ensembl-sequence/wrapper.py), when downloading multiple chromosomes, if any chromosome download fails, the entire output becomes invalid. Therefore, tracking individual chromosome failures is unnecessary - the wrapper should fail fast on the first error.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2024-11-15T18:32:07.612Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:11-12
Timestamp: 2024-11-15T18:32:07.612Z
Learning: In the `snakemake-wrappers` repository, when developing wrappers, it's acceptable to use "master" as the wrapper version in the `wrapper` directive. This ensures that the documentation always reflects the latest stable version of the wrappers.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.pybio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2025-01-30T14:15:32.176Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3432
File: bio/reference/ensembl-sequence/wrapper.py:64-70
Timestamp: 2025-01-30T14:15:32.176Z
Learning: In Snakemake wrappers that download multiple chromosomes into a single output file, if any chromosome download fails, the entire operation should fail immediately (using `break`) rather than continuing with remaining chromosomes, as partial output would be invalid.
Applied to files:
bio/reference/ensembl-genomes-sequence/wrapper.py
📚 Learning: 2025-06-04T06:32:20.090Z
Learnt from: rohan-ibn-tariq
Repo: snakemake/snakemake-wrappers PR: 4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-21T08:27:20.017Z
Learnt from: MagdalenaZZ
Repo: snakemake/snakemake-wrappers PR: 3071
File: bio/paraphase/wrapper.py:0-0
Timestamp: 2024-08-21T08:27:20.017Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:30:28.583Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:1-3
Timestamp: 2024-11-15T18:30:28.583Z
Learning: In Snakemake wrappers, there is no need to use `WorkflowError`; plain Python exceptions are sufficient.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: In the `snakemake-wrappers` project, it's acceptable to use "master" in the wrapper path instead of pinning to a specific version.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:36:04.660Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/call-variants/wrapper.py:15-23
Timestamp: 2024-11-15T18:36:04.660Z
Learning: In the Snakemake wrappers repository, using `shell=True` and redirecting within shell commands is acceptable.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-11-15T18:31:15.447Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
📚 Learning: 2024-08-14T15:21:37.230Z
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-08-14T15:21:37.230Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Applied to files:
bio/reference/ensembl-genomes-annotation/wrapper.py
🪛 Ruff (0.14.10)
bio/reference/ensembl-genomes-sequence/wrapper.py
11-11: Undefined name snakemake
(F821)
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
20-22: Avoid specifying long messages outside the exception class
(TRY003)
28-28: Undefined name snakemake
(F821)
34-34: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
55-57: Avoid specifying long messages outside the exception class
(TRY003)
61-63: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Undefined name snakemake
(F821)
bio/reference/ensembl-genomes-annotation/wrapper.py
13-13: Undefined name snakemake
(F821)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
22-24: Avoid specifying long messages outside the exception class
(TRY003)
30-30: Undefined name snakemake
(F821)
32-32: Avoid specifying long messages outside the exception class
(TRY003)
45-47: Avoid specifying long messages outside the exception class
(TRY003)
49-49: Undefined name snakemake
(F821)
65-65: Undefined name snakemake
(F821)
66-66: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (6)
bio/reference/ensembl-genomes-annotation/wrapper.py (2)
1-24: LGTM: Clean imports and parameter validation.The module structure, parameter extraction, and division validation are well-implemented. The division check provides a clear error message.
49-72: LGTM: Robust download with HTTPS→FTP fallback.The URL construction and download logic are well-structured:
- Tries HTTPS first with proper fallback to FTP
- Handles both gzipped and uncompressed output correctly
- Provides helpful error message on failure
bio/reference/ensembl-genomes-sequence/wrapper.py (4)
1-27: LGTM: Clean setup and decompression logic.The imports, parameter extraction, and division validation are well-structured. The decompression logic at lines 28-31 correctly handles both gzipped and uncompressed outputs.
54-63: LGTM: Clear datatype and chromosome validation.The validation correctly enforces that chromosome selection is only allowed with
datatype="dna", with a helpful error message.
68-101: LGTM: Well-structured download logic with smart fallbacks.The main loop correctly:
- Checks file existence with HEAD requests before downloading (efficient)
- Falls back from HTTPS to FTP when needed
- Tries alternative suffixes for non-chromosome downloads
- Fails immediately if a requested chromosome is unavailable (per learnings on multi-file downloads)
- Uses append (
>>) to support downloading multiple chromosomes into a single file
103-116: LGTM: Helpful error handling with clear guidance.The final error handling provides actionable guidance, showing attempted URLs and suggesting verification steps. The formatting at line 106 clearly displays alternative suffixes using pipe notation.
| chromosome = snakemake.params.get("chromosome", "") | ||
| if datatype == "dna": | ||
| if chromosome: | ||
| suffixes = [f"dna.chromosome.{chrom}.fa.gz" for chrom in chromosome] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug: chromosome parameter may be string instead of list.
If chromosome is passed as a string (e.g., "chr1"), line 38 will iterate over characters, creating incorrect suffixes like ["dna.chromosome.c.fa.gz", "dna.chromosome.h.fa.gz", ...] instead of ["dna.chromosome.chr1.fa.gz"].
🔎 Proposed fix
datatype = snakemake.params.get("datatype", "")
chromosome = snakemake.params.get("chromosome", "")
+if chromosome and isinstance(chromosome, str):
+ chromosome = [chromosome]
if datatype == "dna":
if chromosome:
suffixes = [f"dna.chromosome.{chrom}.fa.gz" for chrom in chromosome]Alternatively, add validation:
chromosome = snakemake.params.get("chromosome", "")
+if chromosome and not isinstance(chromosome, (list, tuple)):
+ raise ValueError("chromosome parameter must be a list or tuple of chromosome names")
if datatype == "dna":🧰 Tools
🪛 Ruff (0.14.10)
35-35: Undefined name snakemake
(F821)
🤖 Prompt for AI Agents
In bio/reference/ensembl-genomes-sequence/wrapper.py around lines 35-38, the
code assumes snakemake.params["chromosome"] is iterable but if it's a string it
will iterate characters; coerce/validate chromosome to a list before iterating
(e.g., treat None/"" as empty, if isinstance(chromosome, str) wrap it in
[chromosome], otherwise ensure it's a list/iterable) and then build suffixes by
iterating that list.
Add wrappers to
bio/referenceto download sequences and annotations from Ensembl Genomes, which curates data for non-vertebrates species. Currently available wrappers inbio/referencesupport only Ensembl, which primarily has vertebrate metazoan data. The release numbering and URLs differ between the two projects. The wrappers are modified frombio/reference/ensembl-sequenceandbio/reference/ensembl-annotationand have been tested locally.QC
snakemake-wrappers.While the contributions guidelines are more extensive, please particularly ensure that:
test.pywas updated to call any added or updated example rules in aSnakefileinput:andoutput:file paths in the rules can be chosen arbitrarilyinput:oroutput:)tempfile.gettempdir()points tometa.yamlcontains a link to the documentation of the respective tool or command underurl:Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.