Commit 63e45a7

feat: add helper functions to parse input files (#2918)
Add helper functions to parse input files. See example in snakemake/snakemake-wrappers#2725.

### QC

* [x] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Added integration of Xonsh scripts into Snakemake workflows.
  - Introduced a new rule for executing Python scripts within a Conda environment.
  - Enhanced report generation capabilities, including self-contained HTML and ZIP archives.
  - Expanded input handling capabilities with new methods for parsing inputs and extracting checksums.
- **Bug Fixes**
  - Improved checksum extraction and validation processes in existing workflows.
- **Documentation**
  - Expanded documentation to include details on Xonsh integration and report generation features.
  - Clarified usage of conda environments and apptainer integration within Snakemake.
- **Tests**
  - Added new test cases for Xonsh script execution and Conda deployment scenarios.
  - Updated existing tests to validate checksum functionality and improve input handling.

---------

Co-authored-by: Johannes Köster <[email protected]>
1 parent 9a6d14b commit 63e45a7

File tree

7 files changed

+96
-6
lines changed

docs/snakefiles/rules.rst

Lines changed: 44 additions & 1 deletion
@@ -547,6 +547,49 @@ It can for example be used to condition some behavior in the workflow on the exi
         shell:
             "cp {input} {output}"

+.. _snakefiles-semantic-helpers-parse-input:
+
+The parse_input function
+""""""""""""""""""""""""
+
+The ``parse_input`` function allows to parse an input file and return a value.
+It has the signature ``parse_input(input_item, parser, kwargs)``, with ``input_item`` being the key of an input file, ``parser`` being a callable to extract the desired information, and ``kwargs`` extra arguments passed to the parser.
+The function will return the extracted value.
+It can for example be used to extract a value from inside an input file.
+
+.. code-block:: python
+
+    rule a:
+        input:
+            samples="samples.tsv",
+        output:
+            "samples.id",
+        params:
+            id=parse_input(input.samples, parser=extract_id)
+        shell:
+            "echo {params.id} > {output}"
+
+
+.. _snakefiles-semantic-helpers-extract-checksum:
+
+The extract_checksum function
+"""""""""""""""""""""""""""""
+
+The ``extract_checksum`` function parses an input file and returns the checksum of the given file.
+It has the signature ``extract_checksum(infile, file)``, with ``infile`` being the input file, and ``file`` the filename to search for.
+The function will return the checksum of ``file`` present in ``infile``.
+
+.. code-block:: python
+
+    rule a:
+        input:
+            checksum="samples.md5",
+        output:
+            tsv="{a}.tsv",
+        params:
+            checksum=parse_input(input.checksum, parser=extract_checksum, file=output.tsv)
+        shell:
+            "echo {params.checksum} > {output}"

 .. _snakefiles-rule-item-access:


@@ -3146,4 +3189,4 @@ The name is optional and can be left out, creating an anonymous rule. It can als
 Note that any placeholders in the shell command (like ``{input}``) are always evaluated and replaced
 when the corresponding job is executed, even if they are occurring inside a comment.
 To avoid evaluation and replacement, you have to mask the braces by doubling them,
-i.e. ``{{input}}``.
+i.e. ``{{input}}``.
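
The documentation hunk above gives the signature as ``parse_input(input_item, parser, kwargs)``, whereas the implementation added in snakemake/ioutils/input.py by this commit is ``parse_input(infile, parser, **kwargs)``, and the parser is called with an open file handle plus the extra keyword arguments. The docs example refers to ``extract_id`` without defining it. A minimal sketch of what such a parser could look like under that contract follows. ``extract_id``, its ``column`` argument, and the two-column layout of samples.tsv are assumptions made for illustration only; ``parse_input`` is copied from the diff so the snippet runs without Snakemake.

```python
import os
import tempfile


def parse_input(infile, parser, **kwargs):
    # Copied from snakemake/ioutils/input.py as added in this commit:
    # returns a deferred function with the (wildcards, input, output) signature.
    def inner(wildcards, input, output):
        with open(infile, "r") as fh:
            if parser is None:
                return fh.read().strip()
            else:
                return parser(fh, **kwargs)

    return inner


def extract_id(fh, column=0):
    # Hypothetical parser (not part of the commit): skip the header line and
    # return one tab-separated field of the first data row.
    next(fh)
    return next(fh).rstrip("\n").split("\t")[column]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.tsv")
    with open(path, "w") as fh:
        fh.write("id\tcondition\nS1\ttreated\n")

    # parse_input only builds the deferred function; the file is read on call.
    deferred = parse_input(path, parser=extract_id)
    sample_id = deferred(None, None, None)

    # Extra keyword arguments are forwarded to the parser unchanged.
    condition = parse_input(path, parser=extract_id, column=1)(None, None, None)
```

In this sketch the value is only produced once the returned function is called, which is why the three placeholder arguments are passed explicitly here.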

snakemake/ioutils/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,7 @@
 from snakemake.ioutils.lookup import lookup
 from snakemake.ioutils.rule_items_proxy import rule_item_factory
 from snakemake.ioutils.subpath import subpath
+from snakemake.ioutils.input import parse_input, extract_checksum


 def register_in_globals(_globals):
@@ -21,5 +22,7 @@ def register_in_globals(_globals):
             "resources": rule_item_factory("resources"),
             "threads": rule_item_factory("threads"),
             "subpath": subpath,
+            "parse_input": parse_input,
+            "extract_checksum": extract_checksum,
         }
     )

snakemake/ioutils/input.py

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+def parse_input(infile, parser, **kwargs):
+    def inner(wildcards, input, output):
+        with open(infile, "r") as fh:
+            if parser is None:
+                return fh.read().strip()
+            else:
+                return parser(fh, **kwargs)
+
+    return inner
+
+
+def extract_checksum(infile, **kwargs):
+    try:
+        import pandas as pd
+
+        fix_file_name = lambda x: x.removeprefix("./")
+        return (
+            pd.read_csv(
+                infile,
+                sep=" ",
+                header=None,
+                engine="python",
+                converters={1: fix_file_name},
+            )
+            .set_index(1)
+            .loc[fix_file_name(kwargs.get("file"))]
+            .item()
+        )
+    except ImportError:
+        raise WorkflowError("Pandas is required to extract checksum from file.")
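
The lookup performed by ``extract_checksum`` can be approximated with the standard library alone, which may help when reading the pandas chain above. The sketch below is not the committed implementation: ``lookup_checksum`` is a hypothetical name, it returns the first exact match, and it raises ``KeyError`` when the name is absent. It assumes each line holds a checksum, whitespace, and a file name, and it relies on ``str.removeprefix`` (Python 3.9+), as the committed code does. Note also that the new file as shown contains no import for ``WorkflowError``, so the ``except ImportError`` branch would appear to fail with a ``NameError`` rather than the intended message.

```python
import io


def lookup_checksum(fh, file):
    # Hypothetical stdlib-only stand-in for extract_checksum (an assumption of
    # this sketch, not the code from the commit).
    wanted = file.removeprefix("./")
    for line in fh:
        if not line.strip():
            continue  # ignore blank lines
        checksum, name = line.rstrip("\n").split(" ", 1)
        # Normalise a leading "./" on both sides and compare exact names,
        # so "prefix_1.tsv" and "1_suffix.tsv" do not match "1.tsv".
        if name.strip().removeprefix("./") == wanted:
            return checksum
    raise KeyError(wanted)


listing = io.StringIO(
    "7695eb6f38992d796551f4cb20d7d138 prefix_1.tsv\n"
    "8695eb6f38992d796551f4cb20d7d138 1_suffix.tsv\n"
    "9695eb6f38992d796551f4cb20d7d138 ./1.tsv\n"
)
result = lookup_checksum(listing, "1.tsv")
```

The exact-name comparison mirrors what the fixture in tests/test_ioutils/samples.md5 seems designed to exercise, since it lists ``prefix_1.tsv`` and ``1_suffix.tsv`` next to ``1.tsv``.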

tests/test_ioutils/Snakefile

Lines changed: 12 additions & 3 deletions
@@ -8,6 +8,9 @@ configfile: "config.yaml"

 assert lookup(dpath="does/not/exist", within=config, default=None) is None
 assert lookup(dpath="does/not/exist", within=config, default=5) == 5
+assert (
+    extract_checksum("samples.md5", file="1.tsv") == "9695eb6f38992d796551f4cb20d7d138"
+)


 rule all:
@@ -16,10 +19,16 @@ rule all:


 rule a:
+    input:
+        checksum="samples.md5",
     output:
         "a/{sample}.txt",
+    params:
+        checksum=lambda w, input, output: parse_input(
+            input.checksum, parser=extract_checksum, file=f"{w.sample}.tsv"
+        ),
     shell:
-        "echo a > {output}"
+        "echo {params.checksum} > {output}"


 rule b:
@@ -46,7 +55,7 @@ rule c:

 rule item_access:
     input:
-        txt="in.txt"
+        txt="in.txt",
     output:
         txt="test.txt",
     params:
@@ -73,4 +82,4 @@ rule e:
     output:
         "results/switch~{switch}.column~{col}.txt",
     shell:
-        "cat {input} > {output}"
+        "cat {input} > {output}"
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-a
+9695eb6f38992d796551f4cb20d7d138
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-a
+9695eb6f38992d796551f4cb20d7d138
 b
 d

tests/test_ioutils/samples.md5

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+5695eb6f38992d796551f4cb20d7d138 2.tsv
+6695eb6f38992d796551f4cb20d7d138 3.tsv
+7695eb6f38992d796551f4cb20d7d138 prefix_1.tsv
+8695eb6f38992d796551f4cb20d7d138 1_suffix.tsv
+9695eb6f38992d796551f4cb20d7d138 1.tsv
