feat: add aria2c wrapper #2725
I've included the function On another issue, one test is currently disabled since
Feel free to add snakemake-storage-plugin-http to the test env.
Add `snakemake-storage-plugin-http` plugin to test environment for `aria2c` test (#2725).
### QC
* [x] I confirm that:
For all wrappers added by this PR,
* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults), as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays.
Add helper functions to parse input files. See example in snakemake/snakemake-wrappers#2725.
### QC
* [x] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionalities of Snakemake).
## Summary by CodeRabbit
## Release Notes
- **New Features**
  - Added integration of Xonsh scripts into Snakemake workflows.
  - Introduced a new rule for executing Python scripts within a Conda environment.
  - Enhanced report generation capabilities, including self-contained HTML and ZIP archives.
  - Expanded input handling capabilities with new methods for parsing inputs and extracting checksums.
- **Bug Fixes**
  - Improved checksum extraction and validation processes in existing workflows.
- **Documentation**
  - Expanded documentation to include details on Xonsh integration and report generation features.
  - Clarified usage of conda environments and apptainer integration within Snakemake.
- **Tests**
  - Added new test cases for Xonsh script execution and Conda deployment scenarios.
  - Updated existing tests to validate checksum functionality and improve input handling.
---------
Co-authored-by: Johannes Köster <[email protected]>
Pull request was converted to draft
📝 Walkthrough
The changes introduce a new Snakemake wrapper for the
Changes
Sequence Diagram(s)
sequenceDiagram
participant TestRunner as test_aria2c (pytest)
participant Snakemake as Snakemake
participant Wrapper as aria2c wrapper.py
participant Aria2c as aria2c (CLI)
participant NCBI as NCBI (data source)
participant ChecksumFile as Checksum File
TestRunner->>Snakemake: Run test workflow (Snakefile)
Snakemake->>Wrapper: Execute aria2c rule with parameters (url, extra, type)
Wrapper->>ChecksumFile: (If type specified) Read checksum value
Wrapper->>Aria2c: Run aria2c with download URL and checksum args
Aria2c->>NCBI: Download file
Aria2c->>Wrapper: Return status/logs
Wrapper->>Snakemake: Output file and logs
Snakemake->>TestRunner: Test result
sequenceDiagram
participant User as User
participant Snakemake as Snakemake
participant Wrapper as aria2c wrapper.py
participant Aria2c as aria2c (CLI)
User->>Snakemake: Specify rule with url, extra, and type
Snakemake->>Wrapper: Pass parameters
Wrapper->>Aria2c: Build and execute aria2c command (with checksum if type provided)
Aria2c-->>Wrapper: Download file, verify checksum
Wrapper-->>Snakemake: Output file path
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
.github/workflows/qc.yml (1 hunks)
bio/ngscheckmate/ncm/wrapper.py (1 hunks)
test_wrappers.py (1 hunks)
utils/aria2c/environment.linux-64.pin.txt (1 hunks)
utils/aria2c/environment.yaml (1 hunks)
utils/aria2c/meta.yaml (1 hunks)
utils/aria2c/test/GCF_000869925.1_ViralProj17181.md5 (1 hunks)
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-1 (1 hunks)
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-224 (1 hunks)
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-256 (1 hunks)
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-384 (1 hunks)
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-512 (1 hunks)
utils/aria2c/test/Snakefile (1 hunks)
utils/aria2c/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`:
- Do not try to improve formatting.
- Do not suggest type annotations for functions that are defined inside of functions or methods.
- Do not suggest type annotation of the `self` argument of methods.
- Do not suggest type annotation of the `cls` argument of classmethods.
- Do not suggest return type annotation if a function or method does not contain a `return` statement.
bio/ngscheckmate/ncm/wrapper.py
utils/aria2c/wrapper.py
test_wrappers.py
`**/wrapper.py`:
- Do not complain about use of undefined variable called `snakemake`.
bio/ngscheckmate/ncm/wrapper.py
utils/aria2c/wrapper.py
🪛 Ruff (0.8.2)
utils/aria2c/wrapper.py
7-7: Undefined name snakemake (F821)
9-9: Undefined name snakemake (F821)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: testing
🔇 Additional comments (18)
bio/ngscheckmate/ncm/wrapper.py (1)
111-111: Use inline environment variable assignment for NCM_REF
Switching to an inline assignment (NCM_REF={snakemake.input.fasta:q}) simplifies the shell invocation by removing the separate export and &&. This maintains identical behavior with a cleaner command.
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-1 (1)
1-1: Checksum file format is correct.
The SHA-1 hash is exactly 40 hexadecimal characters, followed by two spaces and the target filename, matching the expected format for checksum verification.
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-256 (1)
1-1: Checksum file format is correct.
The SHA-256 hash is 64 hexadecimal characters, correctly followed by two spaces and the filename, adhering to standard checksum file conventions.
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-224 (1)
1-1: Checksum file format is correct.
The SHA-224 hash is 56 hexadecimal characters long, followed by two spaces and the filename, consistent with expected formatting.
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-512 (1)
1-1: Checksum file format is correct.
The SHA-512 hash is 128 hexadecimal characters, properly followed by two spaces and the filename, matching checksum file standards.
utils/aria2c/test/GCF_000869925.1_ViralProj17181.sha-384 (1)
1-1: Checksum file format is correct.
The SHA-384 hash is 96 hexadecimal characters in length and is followed by two spaces and the filename, correctly formatted for verification.
utils/aria2c/environment.yaml (1)
1-6: Environment configuration looks good
The environment correctly specifies conda-forge channel with nodefaults and pins the aria2 package to a specific version (1.37.0). This follows best practices for creating reproducible conda environments.
.github/workflows/qc.yml (1)
55-55: Proper addition of HTTP storage plugin for testing
The workflow now installs the snakemake-storage-plugin-http package in the CI environment, which aligns with the PR objectives to enable testing of the aria2c wrapper that requires HTTP functionality.
utils/aria2c/test/GCF_000869925.1_ViralProj17181.md5 (1)
1-14: Test checksums are properly formatted
The MD5 checksums are correctly formatted with two space-separated columns (hash and filename) for verifying downloaded files integrity. This will provide good test coverage for the aria2c wrapper's checksum verification functionality.
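The checksum file layout described in these comments (a hex digest, two spaces, then the filename) can be checked mechanically. The following is only a minimal sketch, not code from this PR: the helper name `parse_checksum_line` and the filename `genome.fna.gz` are hypothetical, and the digest lengths are the standard hex lengths for each algorithm.

```python
import re

# Standard hex-digest lengths for the hash types mentioned in the review.
DIGEST_HEX_LENGTHS = {
    "md5": 32,
    "sha-1": 40,
    "sha-224": 56,
    "sha-256": 64,
    "sha-384": 96,
    "sha-512": 128,
}


def parse_checksum_line(line, hash_type):
    """Split a '<digest>  <filename>' line and validate the digest length.

    Hypothetical helper for illustration; it is not part of the wrapper.
    """
    digest, sep, filename = line.rstrip("\n").partition("  ")
    if not sep or not filename:
        raise ValueError("expected '<digest>  <filename>' (two spaces)")
    expected = DIGEST_HEX_LENGTHS[hash_type]
    if not re.fullmatch(rf"[0-9a-fA-F]{{{expected}}}", digest):
        raise ValueError(f"{hash_type} digest must be {expected} hex characters")
    return digest.lower(), filename


if __name__ == "__main__":
    # Placeholder digest, not a real checksum of any file.
    print(parse_checksum_line("a" * 40 + "  genome.fna.gz", "sha-1"))
```

A check like this would reject a line with a single space or a digest of the wrong length before the value is ever passed to a download tool.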
test_wrappers.py (1)
137-157: Well-structured test for the aria2c wrapper
The test function for aria2c follows the established pattern in the codebase and provides comprehensive testing of various features including different checksum verification methods (MD5, SHA1, SHA224, SHA256, SHA384, SHA512). The test appropriately uses 2 cores and forces execution with the -F flag.
utils/aria2c/environment.linux-64.pin.txt (1)
1-22: Environment specification looks good and follows best practices
The pinned environment file correctly specifies exact package versions with hashes for reproducibility. It includes all necessary dependencies for the aria2c wrapper, with the aria2 package from conda-forge (version 1.37.0) and relevant system libraries.
utils/aria2c/wrapper.py (2)
9-22: Good implementation of hash function detection and formatting
The implementation correctly handles all supported hash functions and properly reformats "sha" prefixed functions to "sha-" as required by aria2c. The break statement ensures that only the first matching hash function is used, as aria2c only accepts one checksum.
🧰 Tools
🪛 Ruff (0.8.2)
9-9: Undefined name snakemake (F821)
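To make the behavior described in this comment concrete, here is a minimal sketch of how a params mapping could be turned into aria2c's `--checksum=<TYPE>=<DIGEST>` option. It is an illustration under assumptions, not the wrapper's actual code: the function name `checksum_option` and the plain `dict` standing in for `snakemake.params` are hypothetical.

```python
# Param names assumed to identify a hash function (as listed in the review).
SUPPORTED_HASHES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def checksum_option(params):
    """Return an aria2c --checksum option for the first supported hash found.

    `params` is a plain dict standing in for snakemake.params (assumption).
    """
    for hash_function, digest in params.items():
        if hash_function in SUPPORTED_HASHES:
            # aria2c spells SHA variants with a hyphen, e.g. sha256 -> sha-256.
            if hash_function.startswith("sha"):
                hash_function = "sha-" + hash_function[3:]
            # Only one checksum is passed, so stop at the first match.
            return f"--checksum={hash_function}={digest}"
    return ""


if __name__ == "__main__":
    print(checksum_option({"url": "https://example.org/file.gz", "sha256": "abc123"}))
```

Returning at the first match plays the same role as the `break` mentioned in the review: any further hash parameters are ignored.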
24-32: Well-structured shell command with appropriate parameters
The shell command properly utilizes threading, logging, and command-line parameters. Output redirection to /dev/null is appropriate since the log file already captures the relevant information.
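The command itself is not shown in this conversation, so the following is only a sketch of how such an invocation might be assembled. The function name `build_aria2c_command` and its parameters are hypothetical, and mapping threads to `--max-concurrent-downloads` is an assumption; `--log`, `--dir`, `--out` and `--checksum` are existing aria2c options.

```python
import os
import shlex


def build_aria2c_command(url, output, log, threads=1, checksum="", extra=""):
    """Assemble an aria2c command line as a single shell string.

    Hypothetical sketch: which options the real wrapper passes is not
    shown in the review, so the option choice here is an assumption.
    """
    argv = ["aria2c", f"--max-concurrent-downloads={threads}"]
    if checksum:
        argv.append(checksum)
    # Free-form user options, split the way a shell would split them.
    argv += shlex.split(extra)
    # aria2c resolves --out relative to --dir, so split the output path.
    out_dir, out_name = os.path.split(output)
    argv += [f"--log={log}", f"--dir={out_dir or '.'}", f"--out={out_name}", url]
    # Console output is discarded because the log file records the run.
    return shlex.join(argv) + " > /dev/null"


if __name__ == "__main__":
    print(
        build_aria2c_command(
            "https://example.org/f.gz",
            "results/f.gz",
            "logs/aria2c.log",
            threads=2,
            checksum="--checksum=md5=abc",
        )
    )
```

Building an argument list first and quoting it with `shlex.join` avoids problems with spaces in paths or URLs.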
utils/aria2c/meta.yaml (2)
1-12: Metadata is well-structured and comprehensive
The metadata file properly documents the wrapper's name, URL, description, author, outputs, and parameters. The hash type description accurately lists all supported hash algorithms.
13-15: Notes provide important usage information
The notes correctly highlight that checksum verification is only supported for single-file downloads and specify the minimum required Snakemake version.
utils/aria2c/test/Snakefile (3)
2-15: Basic download test is well-configured
The basic download test without checksum verification is properly set up with appropriate output, logging, URL, and resource specifications.
18-32: Explicit MD5 checksum test looks correct
This test properly demonstrates using an explicit MD5 checksum parameter for verification during download.
60-196: Comprehensive hash algorithm tests cover all supported formats
The test suite thoroughly tests all supported hash algorithms (MD5, SHA1, SHA224, SHA256, SHA384, SHA512) with consistent parameter structures and resource specifications. Each test correctly uses the parse_input function to extract checksums from the respective input files.
Actionable comments posted: 0
🧹 Nitpick comments (1)
utils/aria2c/wrapper.py (1)
9-32: Consider validating required URL parameter.
While the code functions correctly, it assumes the "url" parameter is always provided. Adding validation for this required parameter would improve error handling.
 from snakemake.shell import shell
 extra = snakemake.params.get("extra", "")
+# Ensure URL parameter is provided
+if not hasattr(snakemake.params, "url") or not snakemake.params.url:
+    raise ValueError("Please provide a URL parameter for aria2c wrapper")
+
 for hash_function, digest in snakemake.params.items():
🧰 Tools
🪛 Ruff (0.8.2)
9-9: Undefined name snakemake (F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
test_wrappers.py (1 hunks)
utils/aria2c/environment.linux-64.pin.txt (1 hunks)
utils/aria2c/environment.yaml (1 hunks)
utils/aria2c/test/Snakefile (1 hunks)
utils/aria2c/wrapper.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- utils/aria2c/environment.yaml
- utils/aria2c/environment.linux-64.pin.txt
- test_wrappers.py
- utils/aria2c/test/Snakefile
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: testing
🔇 Additional comments (5)
utils/aria2c/wrapper.py (5)
1-4: LGTM! Well-documented authorship and licensing.
The header contains appropriate author information, copyright, and license details following standard practices.
5-6: LGTM! Appropriate import for Snakemake integration.
Correctly imports the shell function from snakemake.shell module to execute the aria2c command.
7-8: LGTM! Properly handles optional extra parameters.
Correctly extracts the "extra" parameter from snakemake.params with an empty string default.
🧰 Tools
🪛 Ruff (0.8.2)
7-7: Undefined name snakemake (F821)
9-23: LGTM! Well-structured checksum handling.
The code efficiently detects supported hash functions and properly formats them for aria2c compatibility. Good use of the break statement to use only the first matching hash function.
🧰 Tools
🪛 Ruff (0.8.2)
9-9: Undefined name snakemake (F821)
24-32: LGTM! Comprehensive aria2c command construction.
The shell command is well-structured with:
- Thread-based concurrency control
- Proper parameter inclusion
- Appropriate logging configuration
- Output file specification
- URL parameter usage
Ping @johanneskoester
🤖 I have created a release *beep* *boop*
---
## [6.2.0](v6.1.1...v6.2.0) (2025-05-19)
### Features
* add aria2c wrapper ([#2725](#2725)) ([a45763b](a45763b))
* bwameth mem and mem2 ([#3728](#3728)) ([63f5e87](63f5e87))
---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Add wrapper for aria2c, since it allows (among others):
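QC
For all wrappers added by this PR,
- `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
- either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
- rule names in the test case are in snake_case and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
- all `environment.yaml` specifications follow the respective best practices,
- the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
- wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
- all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
- `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
- temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see here; this also means that using any Python `tempfile` default behavior works),
- the `meta.yaml` contains a link to the documentation of the respective tool or command,
- `Snakefile`s pass the linting (`snakemake --lint`),
- `Snakefile`s are formatted with snakefmt,
Summary by CodeRabbit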
New Features
Tests
Chores