
Commit 1c9dea3

fgvieira authored and Vito Zanotelli committed
Allow for custom URLs (fix issues snakemake#366 and snakemake#2649).

### QC

* [x] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use `conda-forge`, `bioconda`, `nodefaults`: conda-forge should have highest priority, and the defaults channel is usually not needed because most packages are in conda-forge nowadays.
1 parent 83e4181 commit 1c9dea3


13 files changed, +56 -24 lines changed

bio/reference/ensembl-annotation/meta.yaml

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@ authors:
   - Johannes Köster
 output:
   - Ensemble GTF or GFF3 anotation file
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)

bio/reference/ensembl-annotation/test/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ rule get_annotation_gz:
         # branch="plants",  # optional: specify branch
     log:
         "logs/get_annotation.log",
+    params:
+        url="http://ftp.ensembl.org/pub",
     cache: "omit-software"  # save space and time with between workflow caching (see docs)
     wrapper:
         "master/bio/reference/ensembl-annotation"

bio/reference/ensembl-annotation/wrapper.py

Lines changed: 2 additions & 11 deletions
@@ -48,17 +48,8 @@
     )
 
 
-url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{out_fmt}/{species}/{species_cap}.{build}.{gtf_release}.{flavor}{suffix}".format(
-    release=release,
-    gtf_release=gtf_release,
-    build=build,
-    species=species,
-    out_fmt=out_fmt,
-    species_cap=species.capitalize(),
-    suffix=suffix,
-    flavor=flavor,
-    branch=branch,
-)
+url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
+url = f"{url}/{branch}release-{release}/{out_fmt}/{species}/{species.capitalize()}.{build}.{gtf_release}.{flavor}{suffix}"
 
 
 try:
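
The URL composition introduced in this hunk can be sketched as a standalone function. This is a minimal illustration that assumes the same path layout as the wrapper; the helper name `build_annotation_url`, the `DEFAULT_BASE_URL` constant, and the argument values in the usage note are hypothetical, since the wrapper itself takes these values from `snakemake.params` and the output path.

```python
DEFAULT_BASE_URL = "ftp://ftp.ensembl.org/pub"


def build_annotation_url(
    species,
    build,
    release,
    gtf_release,
    out_fmt,
    suffix,
    branch="",
    flavor="",
    base_url=None,
):
    """Compose an annotation download URL (hypothetical helper, not part of the wrapper)."""
    # Fall back to the default Ensembl FTP root when no custom URL is given,
    # similar in spirit to snakemake.params.get("url", default) in the wrapper.
    url = base_url if base_url is not None else DEFAULT_BASE_URL
    # branch and flavor are expected to carry their own trailing "/" and "."
    # respectively, as they do in the wrapper.
    return (
        f"{url}/{branch}release-{release}/{out_fmt}/{species}/"
        f"{species.capitalize()}.{build}.{gtf_release}.{flavor}{suffix}"
    )
```

Calling `build_annotation_url("homo_sapiens", "GRCh38", 105, 105, "gtf", "gtf.gz", base_url="http://ftp.ensembl.org/pub")` swaps only the server root and leaves the release path untouched, which is the behaviour the new `url` parameter is meant to provide.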

bio/reference/ensembl-sequence/meta.yaml

Lines changed: 4 additions & 0 deletions
@@ -2,3 +2,7 @@ name: ensembl-sequence
 description: Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.
 authors:
   - Johannes Köster
+output:
+  - fasta file
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)

bio/reference/ensembl-sequence/test/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ rule get_single_chromosome:
         # branch="plants",  # optional: specify branch
     log:
         "logs/get_genome.log",
+    params:
+        url="http://ftp.ensembl.org/pub",
     cache: "omit-software"  # save space and time with between workflow caching (see docs)
     wrapper:
         "master/bio/reference/ensembl-sequence"

bio/reference/ensembl-sequence/wrapper.py

Lines changed: 2 additions & 1 deletion
@@ -50,8 +50,9 @@
             "invalid datatype, to select a single chromosome the datatype must be dna"
         )
 
+url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
 spec = spec.format(build=build, release=release)
-url_prefix = f"ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species.capitalize()}.{spec}"
+url_prefix = f"{url}/{branch}release-{release}/fasta/{species}/{datatype}/{species.capitalize()}.{spec}"
 
 success = False
 for suffix in suffixes:

bio/reference/ensembl-variation/meta.yaml

Lines changed: 4 additions & 0 deletions
@@ -2,3 +2,7 @@ name: ensembl-variation
 description: Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.
 authors:
   - Johannes Köster
+output:
+  - VCF file
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)

bio/reference/ensembl-variation/test/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ rule get_variation:
         type="all",  # one of "all", "somatic", "structural_variation"
         # chromosome="21",  # optionally constrain to chromosome, only supported for homo_sapiens
         # branch="plants",  # optional: specify branch
+    params:
+        url="http://ftp.ensembl.org/pub",
     log:
         "logs/get_variation.log",
     cache: "omit-software"  # save space and time with between workflow caching (see docs)

bio/reference/ensembl-variation/wrapper.py

Lines changed: 3 additions & 7 deletions
@@ -62,16 +62,12 @@
 
 species_filename = species if release >= 91 else species.capitalize()
 
+url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
 urls = [
-    "ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.vcf.gz".format(
-        release=release,
-        species=species,
-        suffix=suffix,
-        species_filename=species_filename,
-        branch=branch,
-    )
+    f"{url}/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.vcf.gz"
     for suffix in suffixes
 ]
+
 names = [os.path.basename(url) for url in urls]
 
 try:
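
For readers who want to try the list construction from this hunk outside of Snakemake, here is a minimal sketch. The function name `variation_urls` and the suffix values in the usage note are assumptions made for illustration; in the wrapper, `suffixes`, `branch`, and `release` are derived from `snakemake.params`.

```python
import os


def variation_urls(species, release, suffixes, branch="", base_url="ftp://ftp.ensembl.org/pub"):
    """Return (urls, names) for the VCF downloads (hypothetical helper)."""
    # As in the wrapper, file names are lower-case from release 91 on
    # and capitalized for older releases.
    species_filename = species if release >= 91 else species.capitalize()
    urls = [
        f"{base_url}/{branch}release-{release}/variation/vcf/{species}/"
        f"{species_filename}{suffix}.vcf.gz"
        for suffix in suffixes
    ]
    # The local file name is the last path component of each URL.
    names = [os.path.basename(u) for u in urls]
    return urls, names
```

With the illustrative suffixes `["", "_somatic"]`, the call `variation_urls("homo_sapiens", 105, ["", "_somatic"])` yields one URL per suffix, and the base names are what the wrapper would expect to find on disk after the download.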

bio/vep/cache/meta.yaml

Lines changed: 7 additions & 0 deletions
@@ -3,3 +3,10 @@ description: Download VEP cache for given species, build and release.
 url: http://www.ensembl.org/info/docs/tools/vep/index.html
 authors:
   - Johannes Köster
+output:
+  - directory to store the VEP cache
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)
+  - species: species to download cache data
+  - build: build to download cache data
+  - release: release to download cache data

bio/vep/cache/test/Snakefile

Lines changed: 15 additions & 0 deletions
@@ -10,3 +10,18 @@ rule get_vep_cache:
     cache: "omit-software"  # save space and time with between workflow caching (see docs)
     wrapper:
         "master/bio/vep/cache"
+
+
+rule get_vep_cache_ebi:
+    output:
+        directory("resources/vep/cache_ebi"),
+    params:
+        url="ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/plants",
+        species="cyanidioschyzon_merolae",
+        build="ASM9120v1",
+        release="58",
+    log:
+        "logs/vep/cache_ebi.log",
+    cache: "omit-software"  # save space and time with between workflow caching (see docs)
+    wrapper:
+        "master/bio/vep/cache"

bio/vep/cache/wrapper.py

Lines changed: 6 additions & 5 deletions
@@ -9,24 +9,25 @@
 
 
 extra = snakemake.params.get("extra", "")
+log = snakemake.log_fmt_shell(stdout=True, stderr=True)
+
 
 try:
     release = int(snakemake.params.release)
 except ValueError:
     raise ValueError("The parameter release is supposed to be an integer.")
 
+
 with tempfile.TemporaryDirectory() as tmpdir:
     # We download the cache tarball manually because vep_install does not consider proxy settings (in contrast to curl).
     # See https://github.com/bcbio/bcbio-nextgen/issues/1080
-    vep_dir = "vep" if release >= 97 else "VEP"
+    cache_url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
     cache_tarball = (
         f"{snakemake.params.species}_vep_{release}_{snakemake.params.build}.tar.gz"
     )
-    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
+    vep_dir = "vep" if snakemake.params.get("url") or release >= 97 else "VEP"
     shell(
-        "curl -L ftp://ftp.ensembl.org/pub/release-{snakemake.params.release}/"
-        "variation/{vep_dir}/{cache_tarball} "
-        "-o {tmpdir}/{cache_tarball} {log}"
+        "curl -L {cache_url}/release-{release}/variation/{vep_dir}/{cache_tarball} -o {tmpdir}/{cache_tarball} {log}"
     )
 
     log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
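
The directory choice in this hunk is easy to misread, so here is a small sketch of the resulting URL logic. It assumes the behaviour visible in the diff: the lower-case `vep` directory is used whenever a custom URL is supplied or the release is 97 or newer, and `VEP` is used otherwise. The function name `vep_cache_url` is hypothetical and does not exist in the wrapper.

```python
DEFAULT_CACHE_URL = "ftp://ftp.ensembl.org/pub"


def vep_cache_url(species, build, release, custom_url=None):
    """Compose the cache tarball URL (hypothetical helper mirroring the hunk above)."""
    cache_url = custom_url if custom_url is not None else DEFAULT_CACHE_URL
    cache_tarball = f"{species}_vep_{release}_{build}.tar.gz"
    # A custom URL always selects "vep"; the default server only switched
    # from "VEP" to "vep" with release 97, according to the diff.
    vep_dir = "vep" if custom_url or release >= 97 else "VEP"
    return f"{cache_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
```

Using the values from the new `get_vep_cache_ebi` test rule, `vep_cache_url("cyanidioschyzon_merolae", "ASM9120v1", 58, custom_url="ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/plants")` resolves to the `vep` directory even though the release number is below 97.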

test.py

Lines changed: 5 additions & 0 deletions
@@ -5965,6 +5965,11 @@ def test_vep_cache():
         ["snakemake", "--cores", "1", "resources/vep/cache", "--use-conda", "-F"],
     )
 
+    run(
+        "bio/vep/cache",
+        ["snakemake", "--cores", "1", "resources/vep/cache_ebi", "--use-conda", "-F"],
+    )
+
 
 @skip_if_not_modified
 def test_vep_plugins():
