
Commit a45763b

feat: add aria2c wrapper (#2725)
Add a wrapper for aria2c, which supports, among other things:

- downloads over several protocols (HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink)
- parallel downloads
- automatic checksum verification
- disk-space pre-allocation

### QC

* [x] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a Linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io),
* Conda environments use a minimal number of channels, in the recommended order. For example, for bioconda use `conda-forge`, `bioconda`, `nodefaults`: conda-forge should have the highest priority, and the defaults channel is usually not needed because most packages are now available on conda-forge.

## Summary by CodeRabbit

- **New Features**
  - Introduced a new wrapper for the aria2c download utility, supporting multiple checksum types (MD5, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, Adler32) for file integrity verification.
  - Added a test suite for the aria2c wrapper, including sample checksum files and Snakemake rules that validate downloads with MD5 and the SHA family of hash algorithms.
  - Provided environment and metadata files for reproducible setups and tool documentation.
- **Tests**
  - Implemented automated tests to verify the aria2c wrapper's functionality and checksum verification.
- **Chores**
  - Updated the GitHub Actions workflow to install the HTTP storage plugin for Snakemake.
1 parent 63f5e87 commit a45763b

13 files changed (309 additions, 1 deletion)

.github/workflows/qc.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -52,7 +52,7 @@ jobs:
         shell: bash -el {0}
         run: |
           conda config --set channel_priority strict
-          conda install -n snakemake -y snakemake-minimal snakemake
+          conda install -n snakemake -y snakemake snakemake-minimal snakemake-storage-plugin-http
 
       - name: Fetch master
         run: |
```

test_wrappers.py

Lines changed: 21 additions & 0 deletions
```diff
@@ -134,6 +134,27 @@ def _run(wrapper, cmd, check_log=None, compare_results_with_expected=None):
     return _run
 
 
+def test_aria2c(run):
+    run(
+        "utils/aria2c",
+        [
+            "snakemake",
+            "--cores",
+            "2",
+            "--use-conda",
+            "-F",
+            "results/file.fas.gz",
+            "results/file.md5.fas.gz",
+            "results/file.md5file.fas.gz",
+            "results/file.sha1file.fas.gz",
+            "results/file.sha224file.fas.gz",
+            "results/file.sha256file.fas.gz",
+            "results/file.sha384file.fas.gz",
+            "results/file.sha512file.fas.gz",
+            "results/file.md5fileH.fas.gz",
+        ],
+    )
+
 def test_miller(run):
     run(
         "utils/miller",
```
Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+# This file may be used to create an environment using:
+# $ conda create --name <env> --file <this file>
+# platform: linux-64
+# created-by: conda 25.3.1
+@EXPLICIT
+https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
+https://conda.anaconda.org/conda-forge/noarch/ca-certificates-2025.4.26-hbd8a1cb_0.conda#95db94f75ba080a22eb623590993167b
+https://conda.anaconda.org/conda-forge/linux-64/libgomp-14.2.0-h767d61c_2.conda#06d02030237f4d5b3d9a7e7d348fe3c6
+https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
+https://conda.anaconda.org/conda-forge/linux-64/libgcc-14.2.0-h767d61c_2.conda#ef504d1acbd74b7cc6849ef8af47dd03
+https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.34.5-hb9d3cd8_0.conda#f7f0d6cc2dc986d42ac2689ec88192be
+https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-14.2.0-h69a702a_2.conda#a2222a6ada71fb478682efe483ce0f92
+https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.18-h4ce23a2_1.conda#e796ff8ddc598affdf7c173d6145f087
+https://conda.anaconda.org/conda-forge/linux-64/liblzma-5.8.1-hb9d3cd8_0.conda#0e87378639676987af32fee53ba32258
+https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.2.0-h8f9b012_2.conda#a78c856b6dc6bf4ea8daeb9beaaa3fb0
+https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-hb9d3cd8_2.conda#edb0dca6bc32e4f4789199455a1dbeb8
+https://conda.anaconda.org/conda-forge/linux-64/openssl-3.5.0-h7b32b05_0.conda#bb539841f2a3fde210f387d00ed4bb9d
+https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.49.1-hee588c1_2.conda#962d6ac93c30b1dfc54c9cccafd1003e
+https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.1-hcf80075_0.conda#eecce068c7e4eddeb169591baac20ac4
+https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-14.2.0-h4852527_2.conda#c75da67f045c2627f59e6fcb5f4e3a9b
+https://conda.anaconda.org/conda-forge/linux-64/libxml2-2.13.7-h81593ed_1.conda#0619e8fc4c8025a908ea3a3422d3b775
+https://conda.anaconda.org/conda-forge/linux-64/aria2-1.37.0-hbc8128a_2.conda#03b8874fa70df577f3eee53085d025cf
```
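The pinned file lists every package as a channel URL followed by `#` and the package's MD5 digest. As a minimal sketch, assuming conda's `<name>-<version>-<build>` file-name convention (names may contain hyphens, while version and build strings do not), such lines can be split into their parts as follows. `PinnedPackage` and `parse_explicit_line` are hypothetical helpers for illustration, not part of conda or snakedeploy.

```python
from typing import NamedTuple, Optional


class PinnedPackage(NamedTuple):
    """One package entry from a conda @EXPLICIT spec file (hypothetical type)."""

    channel_url: str
    name: str
    version: str
    build: str
    md5: Optional[str]


def parse_explicit_line(line: str) -> Optional[PinnedPackage]:
    """Parse one URL line of an @EXPLICIT file; return None for comments and markers."""
    line = line.strip()
    if not line or line.startswith("#") or line == "@EXPLICIT":
        return None
    # The fragment after '#' holds the package digest.
    url, _, md5 = line.partition("#")
    channel_url, _, filename = url.rpartition("/")
    # Conda package archives end in either .conda or .tar.bz2.
    for suffix in (".conda", ".tar.bz2"):
        if filename.endswith(suffix):
            filename = filename[: -len(suffix)]
            break
    # Split from the right: the name may contain hyphens, version and build do not.
    name, version, build = filename.rsplit("-", 2)
    return PinnedPackage(channel_url, name, version, build, md5 or None)
```

Splitting from the right is what keeps multi-hyphen names such as `libgcc-ng` and `ca-certificates` intact.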

utils/aria2c/environment.yaml

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+channels:
+  - conda-forge
+  - nodefaults
+dependencies:
+  - aria2 =1.37.0
```

utils/aria2c/meta.yaml

Lines changed: 15 additions & 0 deletions
```diff
@@ -0,0 +1,15 @@
+name: aria2
+url: https://github.com/aria2/aria2/
+description: >
+  aria2 is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.
+authors:
+  - Filipe G. Vieira
+output:
+  - Path to downloaded file
+params:
+  - url: URL to download from
+  - extra: Optional arguments for `aria2c`
+  - type: type of hash, where `type in ["sha-1", "sha-224", "sha-256", "sha-384", "sha-512", "md5", "adler32"]`
+notes: |
+  * Checksum input file only supported for single-file downloads
+  * Requires `snakemake >=9.3.1`
```
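In practice, `type` stands for the parameter name: the wrapper looks for params named `sha1`, `sha224`, `sha256`, `sha384`, `sha512`, `md5` or `adler32`, and rewrites the SHA names into aria2c's hyphenated `sha-1` to `sha-512` spellings. A hypothetical pre-flight check along those lines could also reject a malformed digest before aria2c starts; the expected hex lengths follow from each algorithm's output size, and `ARIA2_HASH_TYPES` and `checksum_option` are illustrative names, not part of the wrapper.

```python
# Hypothetical table: wrapper parameter name -> (aria2c --checksum TYPE, hex digest length).
ARIA2_HASH_TYPES = {
    "md5": ("md5", 32),
    "sha1": ("sha-1", 40),
    "sha224": ("sha-224", 56),
    "sha256": ("sha-256", 64),
    "sha384": ("sha-384", 96),
    "sha512": ("sha-512", 128),
    "adler32": ("adler32", 8),
}

HEX_DIGITS = set("0123456789abcdefABCDEF")


def checksum_option(param_name: str, digest: str) -> str:
    """Build an aria2c '--checksum TYPE=DIGEST' argument after sanity-checking the digest."""
    if param_name not in ARIA2_HASH_TYPES:
        raise ValueError(f"unsupported hash parameter: {param_name!r}")
    aria2_type, hex_len = ARIA2_HASH_TYPES[param_name]
    digest = digest.strip()
    # A wrong length or a non-hex character means the digest cannot match.
    if len(digest) != hex_len or not set(digest) <= HEX_DIGITS:
        raise ValueError(
            f"{param_name} digest must be {hex_len} hex characters, got {digest!r}"
        )
    return f"--checksum {aria2_type}={digest}"
```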
Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+04c1275ff9c9d0fb595b7482a1d54438 ./annotation_hashes.txt
+3413f40db67f8ea3b3a193c2fd663a6e ./GCF_000869925.1_ViralProj17181_assembly_report.txt
+7d45362bb87770fac4716b60055fd72d ./GCF_000869925.1_ViralProj17181_assembly_stats.txt
+3e2e82ee2bd94c18d92891211eafdf18 ./GCF_000869925.1_ViralProj17181_cds_from_genomic.fna.gz
+e673fed3417f2f694b99f9cab1dad83e ./GCF_000869925.1_ViralProj17181_feature_count.txt
+c5a292890d71b35ddd4b2366d06cdeb6 ./GCF_000869925.1_ViralProj17181_feature_table.txt.gz
+42aa93c5bfdba6ac09a4822a4407b572 ./GCF_000869925.1_ViralProj17181_genomic.fna.gz
+a2e1b9686fcbdd4c4059c0ee4c03851a ./GCF_000869925.1_ViralProj17181_genomic.gbff.gz
+4276f72895f3436e6826424d1b908d20 ./GCF_000869925.1_ViralProj17181_genomic.gff.gz
+81499b53906a29cebea4e472e8ffe842 ./GCF_000869925.1_ViralProj17181_genomic.gtf.gz
+a3f486d02206a33e0d17f79d11807f0d ./GCF_000869925.1_ViralProj17181_protein.faa.gz
+7c30a6c03dbc7402ce0872afb0ec9e94 ./GCF_000869925.1_ViralProj17181_protein.gpff.gz
+cdbfa4db0d86580a730f0829b9ca2151 ./GCF_000869925.1_ViralProj17181_translated_cds.faa.gz
```
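The test rules read the expected digest from checksum lists like this one through Snakemake's `parse_input(..., parser=extract_checksum, file=...)`. The function below is a hypothetical stand-in that only illustrates the lookup and is not Snakemake's implementation. Entries in the list carry a `./` prefix while the rules pass the bare file name, so the sketch compares paths after `os.path.normpath`.

```python
import os
from typing import Iterable


def lookup_digest(lines: Iterable[str], file: str) -> str:
    """Return the digest listed for `file` in '<digest> <path>' checksum lines.

    Hypothetical helper: './name' and 'name' are treated as the same entry.
    """
    wanted = os.path.normpath(file)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split once on whitespace, so one or two separating spaces both work.
        digest, path = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'; drop it if present.
        path = path.lstrip("*")
        if os.path.normpath(path) == wanted:
            return digest
    raise KeyError(f"no checksum listed for {file!r}")
```

For `GCF_000869925.1_ViralProj17181_genomic.fna.gz` this returns `42aa93c5bfdba6ac09a4822a4407b572`, the same digest that is hard-coded in the `test_aria2_md5` rule.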
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+30004da6fc9f681d59c6c92cc99c9331622fb1f5 GCF_000869925.1_ViralProj17181_genomic.fna.gz
```
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+ac2d83823e2adc6b7b38e8dda0b7ff9c2536e62d96dec77e68cf0147 GCF_000869925.1_ViralProj17181_genomic.fna.gz
```
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+337dad2a0047dde05c24d5ae83fe175f762212e2e50a9494e54f43f9ebd508bd GCF_000869925.1_ViralProj17181_genomic.fna.gz
```
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+0171910ac0f8c881e24ac5054c734eb295fe73c3a6ad0857eab9349446949a96c45095241ae8d63f25c16a4c1e37c30a GCF_000869925.1_ViralProj17181_genomic.fna.gz
```
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+265fd46dea811ddebf549bb38fe7f5532308a6f97b62a93cccc6cbdf2fd09e0f3e928745a1b775889f43717593ae9afb9658821be684cdfe42006b9c6592ad41 GCF_000869925.1_ViralProj17181_genomic.fna.gz
```
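Each of these one-line files pairs a digest with the file name, one file per SHA variant exercised by the tests. The PR does not say how these fixtures were generated. As one possible approach, a sketch using Python's standard `hashlib` can produce equivalent lines; `file_digest` and `checksum_lines` are hypothetical helper names, and files are hashed in chunks so that large downloads need not fit in memory.

```python
import hashlib
from pathlib import Path

# hashlib algorithm names for the SHA variants used by the test rules.
SHA_ALGORITHMS = ("sha1", "sha224", "sha256", "sha384", "sha512")


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_lines(path: Path) -> dict:
    """Return one '<digest> <file name>' line per SHA algorithm, keyed by algorithm."""
    return {alg: f"{file_digest(path, alg)} {path.name}" for alg in SHA_ALGORITHMS}
```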

utils/aria2c/test/Snakefile

Lines changed: 195 additions & 0 deletions
```diff
@@ -0,0 +1,195 @@
+
+rule test_aria2:
+    output:
+        "results/file.fas.gz",
+    log:
+        "logs/aria2.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_md5:
+    output:
+        "results/file.md5.fas.gz",
+    log:
+        "logs/aria2.md5.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        md5="42aa93c5bfdba6ac09a4822a4407b572",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_md5fileH:
+    input:
+        storage.http(
+            "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/md5checksums.txt"
+        ),
+    output:
+        "results/file.md5fileH.fas.gz",
+    log:
+        "logs/aria2.md5fileH.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        md5=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_md5file:
+    input:
+        checksum="GCF_000869925.1_ViralProj17181.md5",
+    output:
+        "results/file.md5file.fas.gz",
+    log:
+        "logs/aria2.md5file.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+        md5=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_sha1file:
+    input:
+        checksum="GCF_000869925.1_ViralProj17181.sha-1",
+    output:
+        "results/file.sha1file.fas.gz",
+    log:
+        "logs/aria2.sha1file.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+        sha1=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_sha224file:
+    input:
+        checksum="GCF_000869925.1_ViralProj17181.sha-224",
+    output:
+        "results/file.sha224file.fas.gz",
+    log:
+        "logs/aria2.sha224file.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+        sha224=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_sha256file:
+    input:
+        checksum="GCF_000869925.1_ViralProj17181.sha-256",
+    output:
+        "results/file.sha256file.fas.gz",
+    log:
+        "logs/aria2.sha256file.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+        sha256=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_sha384file:
+    input:
+        checksum="GCF_000869925.1_ViralProj17181.sha-384",
+    output:
+        "results/file.sha384file.fas.gz",
+    log:
+        "logs/aria2.sha384file.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+        sha384=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
+
+
+rule test_aria2_sha512file:
+    input:
+        checksum="GCF_000869925.1_ViralProj17181.sha-512",
+    output:
+        "results/file.sha512file.fas.gz",
+    log:
+        "logs/aria2.sha512file.log",
+    params:
+        url="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/869/925/GCF_000869925.1_ViralProj17181/GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        extra="--file-allocation none --retry-wait 5 --console-log-level warn --log-level notice",
+        sha512=parse_input(
+            input[0],
+            parser=extract_checksum,
+            file="GCF_000869925.1_ViralProj17181_genomic.fna.gz",
+        ),
+    threads: 2
+    resources:
+        mem_mb=1024,
+        runtime=30,
+    wrapper:
+        "master/utils/aria2c"
```

utils/aria2c/wrapper.py

Lines changed: 32 additions & 0 deletions
```diff
@@ -0,0 +1,32 @@
+__author__ = "Filipe G. Vieira"
+__copyright__ = "Copyright 2023, Filipe G. Vieira"
+__license__ = "MIT"
+
+from snakemake.shell import shell
+
+extra = snakemake.params.get("extra", "")
+
+for hash_function, digest in snakemake.params.items():
+    if hash_function in [
+        "sha1",
+        "sha224",
+        "sha256",
+        "sha384",
+        "sha512",
+        "md5",
+        "adler32",
+    ]:
+        if hash_function.startswith("sha"):
+            hash_function = hash_function.replace("sha", "sha-")
+        extra += f" --checksum {hash_function}={digest}"
+        break
+
+shell(
+    "aria2c"
+    " --max-concurrent-downloads {snakemake.threads}"
+    " {extra}"
+    " --log {snakemake.log}"
+    " --out {snakemake.output[0]}"
+    " {snakemake.params.url}"
+    " > /dev/null"
+)
```
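To make the parameter handling easy to check without running Snakemake, the hypothetical `build_aria2c_command` below rebuilds the same command string from a plain dict instead of the `snakemake` object. It is a sketch that mirrors `wrapper.py`, not code from the PR.

```python
from typing import Mapping

# Parameter names the wrapper treats as checksums.
HASH_PARAMS = ("sha1", "sha224", "sha256", "sha384", "sha512", "md5", "adler32")


def build_aria2c_command(
    params: Mapping[str, str], threads: int, log: str, output: str
) -> str:
    """Rebuild the aria2c command line that wrapper.py passes to shell().

    As in the wrapper, only the first recognized hash parameter (in `params`
    order) becomes a --checksum option; any later ones are ignored.
    """
    extra = params.get("extra", "")
    for hash_function, digest in params.items():
        if hash_function in HASH_PARAMS:
            if hash_function.startswith("sha"):
                # e.g. "sha256" -> "sha-256", the spelling aria2c expects
                hash_function = hash_function.replace("sha", "sha-")
            extra += f" --checksum {hash_function}={digest}"
            break
    return (
        "aria2c"
        f" --max-concurrent-downloads {threads}"
        f" {extra}"
        f" --log {log}"
        f" --out {output}"
        f" {params['url']}"
        " > /dev/null"
    )
```

Because of the `break`, a rule that sets several hash params only gets the first one checked. Also, `--max-concurrent-downloads` caps how many downloads aria2c runs in parallel, so for the single URL passed here it does not split one file across connections; in aria2c that is governed by `--split` and `--max-connection-per-server`, which could be supplied through `extra`.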
