---
title: Viash Code Block
order: 40
---
{{< include ../../_includes/_language_chooser.qmd >}}
**Example:**
```{r setup, include=FALSE}
repo_path <- system("git rev-parse --show-toplevel", intern = TRUE)
source(paste0(repo_path, "/_includes/_r_helper.R"))
source(paste0(repo_path, "/guide/component/_language_examples.R"))
# escape languages
langs <- langs %>%
mutate(label = gsub("#", "\\\\#", label))
```
When you run a Viash component with `viash run`, Viash wraps your script in a Bash executable. In doing so, it strips the "Viash placeholder" code block from your script and replaces it with code that reads the parameter values at runtime.
## Recognizing the Viash placeholder code block
```{r setup-config-inject, include=FALSE}
temp_dir <- tempfile("config_inject")
dir.create(temp_dir, recursive = TRUE, showWarnings = FALSE)
on.exit(unlink(temp_dir, recursive = TRUE), add = TRUE)
langs <- langs %>%
mutate(
config_path = paste0(temp_dir, "/", id, "/", basename(example_config)),
script_path = paste0(temp_dir, "/", id, "/", basename(example_script))
)
pwalk(langs, function(id, label, example_config, example_script, config_path, script_path, ...) {
dir.create(paste0(temp_dir, "/", id), recursive = TRUE, showWarnings = FALSE)
file.copy(example_config, config_path)
file.copy(example_script, script_path)
})
```
::: {.panel-tabset}
```{r show-placeholder, echo=FALSE, output="asis"}
pwalk(langs, function(id, label, config_path, script_path, ...) {
qrt(
"## {% label %}
|
|```{embed, lang='{%id%}'}
|{%script_path%}
|```
|")
})
```
:::
A "Viash placeholder" code block is the section between the `VIASH START` and `VIASH END` comments.
## What happens at runtime
When you pass arguments to the component, Viash adds your parameter values to the script by replacing the Viash placeholder code block. If no such code block exists yet, the parameters are inserted at the top of the file.
The resulting code block will contain two maps (or dictionaries): `par` and `meta`. The `par` map contains the parameter values specified by the user, and `meta` contains additional information on the current runtime environment. Note that for Bash scripts, the `par` and `meta` maps are flattened into separate environment variables.
## Previewing the `par` and `meta` objects
To get insight into how `par` and `meta` are defined, you can run [`viash config inject`](/reference/cli/config_inject.qmd) to replace the current parameter placeholder with an auto-generated parameter placeholder.
::: {.callout-warning}
This will change the contents of your script!
:::
::: {.panel-tabset}
```{r config-inject, echo=FALSE, output="asis"}
pwalk(langs, function(id, label, config_path, script_path, ...) {
qrt(
"## {% label %}
|
|Running `viash config inject` effectively changes the contents of the script.
|
|```{bash config-inject}
|viash config inject {%basename(config_path)%}
|```
|
|The updated `{%basename(script_path)%}` now contains the following code:
|
|```{embed, lang='{%id%}'}
|{%script_path%}
|```
|", .dir = paste0(temp_dir, "/", id))
})
```
:::
## Runtime parameters in `par`
The `par` object (or the `par_` environment variables in Bash) contains the argument values passed at runtime. For example, passing `--input foo.txt` results in `par["input"]` being equal to `"foo.txt"`.
:::{.callout-tip}
Try adding more [arguments]({{< var reference.arguments >}}) with different types to see what effect this has on the resulting placeholder.
:::
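As a minimal sketch of how a script might consume `par`, the Python snippet below defines placeholder values for local development and reads the `input` argument. The `describe_input` helper and the component name are hypothetical and only serve as illustration; at runtime Viash replaces the placeholder block with the actual values.

```python
## VIASH START
# Placeholder values for local development; Viash replaces this block at runtime.
par = {"input": "foo.txt"}
meta = {"name": "my_component"}  # hypothetical component name
## VIASH END


def describe_input(par):
    """Return a short message describing the `input` argument (hypothetical helper)."""
    return f"Reading from {par['input']}"


print(describe_input(par))
```

Keeping the placeholder values realistic makes it possible to run the script directly, without Viash, while developing.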
### Special values: undefined and missing items
When calling a Viash component, you can use special values to explicitly set arguments to undefined or represent missing items in multi-value arguments:
* **Unsetting a single-value argument**: Pass the literal `UNDEFINED` (unquoted) to set a single-value argument to undefined/null:

  ```bash
  ./my_component --arg UNDEFINED
  ```

  In the script, `par["arg"]` will be `None` (Python), `NULL` (R), `null` (JavaScript), or unset (Bash).

* **Missing items in multi-value arguments**: When passing multiple values via semicolon-separated syntax, use `UNDEFINED_ITEM` to represent a missing element:

  ```bash
  ./my_component --values "item1;UNDEFINED_ITEM;item3"
  ```

  In the script, `par["values"]` will be `["item1", None, "item3"]` (Python) or equivalent.

* **Passing the literal string "UNDEFINED"**: Quote the value to pass it as a literal string:

  ```bash
  ./my_component --arg '"UNDEFINED"' # par["arg"] = "UNDEFINED"
  ./my_component --arg "'UNDEFINED'" # par["arg"] = "UNDEFINED"
  ```
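On the script side, these special values arrive as `None` in Python, so it can help to handle them explicitly. The sketch below assumes the placeholder values shown; `present_items` and `arg_or_default` are hypothetical helpers, not part of Viash.

```python
## VIASH START
# Placeholder values mimicking `--arg UNDEFINED --values "item1;UNDEFINED_ITEM;item3"`.
par = {"arg": None, "values": ["item1", None, "item3"]}
meta = {}
## VIASH END


def present_items(values):
    """Drop missing (None) entries from a multi-value argument (hypothetical helper)."""
    return [v for v in (values or []) if v is not None]


def arg_or_default(value, default):
    """Return `default` when the argument was left undefined (hypothetical helper)."""
    return default if value is None else value


items = present_items(par["values"])
arg = arg_or_default(par["arg"], "fallback")
print(items, arg)
```

Checking for `None` explicitly, rather than relying on truthiness, keeps legitimate values such as `0` or an empty string intact.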
## Meta variables in `meta`
Meta-variables offer information on the runtime environment which you can use from within your script.
* `cpus` (integer): The maximum number of (logical) CPUs a component is allowed to use. By default, this value will be undefined.
* `config` (string): Path to the processed Viash config YAML. This file is usually called `.config.vsh.yaml` and resides next to the wrapped executable (see below). This YAML file is useful for doing some runtime introspection of the component for writing generic unit tests.
* `executable` (string): The executable being used at runtime; that is, the wrapped script. This variable is used in unit tests.
* `name` (string): The name of the component, useful for logging.
* `functionality_name` (string): The name of the component, useful for logging. (Deprecated)
* `memory_*` (long): The maximum amount of memory a component is allowed to allocate. The following denominations are provided: `memory_b`, `memory_kb`, `memory_mb`, `memory_gb`, `memory_tb`, `memory_pb` for SI units (1000-base), and `memory_kib`, `memory_mib`, `memory_gib`, `memory_tib`, `memory_pib` for IEC units (1024-base). By default, this value will be undefined.
* `resources_dir` (string): Path to where the resources are stored.
* `temp_dir` (string): A temporary directory in which your script is allowed to create new temporary files / directories. By default, this will be set to the `VIASH_TEMP` environment variable. When the `VIASH_TEMP` variable is undefined, POSIX `TMPDIR` or `/tmp` is used instead.
### `cpus` (integer)
This field specifies the maximum number of (logical) CPUs a component is allowed to use. This is useful for parallelizing your component in a way that integrates well with pipeline frameworks such as Nextflow. Below is an example usage of the `cpus` meta-variable.
::: {.panel-tabset}
## Bash
```bash
#!/bin/bash
## VIASH START
par_input="path/to/file.txt"
par_output="output.txt"
meta_cpus=10
## VIASH END
# Pass number of cores to the popular_software_tool. Set the default to 1.
./popular_software_tool --ncores ${meta_cpus:-1}
```
## C\#
No example available yet.
## JavaScript
No example available yet.
## Python
```python
from multiprocessing import Pool
## VIASH START
par = {}
meta = {"cpus": 1}
## VIASH END
def my_fun(x):
    return x + "!"
my_data = ["hello", "world"]
# Fall back to a single worker when no CPU limit is set (key missing or None).
with Pool(processes=meta.get("cpus") or 1) as pool:
    out = pool.map(my_fun, my_data)
```
## R
```r
library(furrr)
## VIASH START
par <- list()
meta <- list(
  cpus = 1L
)
## VIASH END
if (is.null(meta$cpus)) meta$cpus <- 1
plan(multisession, workers = meta$cpus)
my_data <- c("hello", "world")
out <- future_map(
  my_data,
  function(x) {
    paste0(x, "!")
  }
)
```
## Scala
```scala
import scala.collection.parallel._
import java.util.concurrent.ForkJoinPool
// VIASH START
// ...
// VIASH END
val pc = mutable.ParArray(1, 2, 3)
val numCores = meta.cpus.getOrElse(1)
pc.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(numCores))
pc map { _ + 1 }
```
:::
You can set the number of cores in your component using any of the following approaches:
```bash
# as a parameter of viash run
viash run config.vsh.yaml --cpus 10 -- <my component arguments>
# as a parameter of viash test
viash test config.vsh.yaml --cpus 10
# or as a parameter of the executable
viash build config.vsh.yaml -o output
output/my_executable ---cpus 10
#                     ↑ notice the triple dash
# to unset a default cpu value, pass UNDEFINED
output/my_executable ---cpus UNDEFINED
```
### `config` (string)
Path to the processed Viash config YAML.
This file is usually called `.config.vsh.yaml` and resides next to the wrapped executable (see below).
This YAML file is useful for doing some runtime introspection of the component for writing generic unit tests.
### `executable` (string)
The executable being used at runtime; that is, the wrapped script. This variable is used in unit tests.
```bash
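#!/usr/bin/env bash
# Print each command (-x) and stop at the first failing check (-e).
set -ex
# Run the wrapped component and capture its output.
"$meta_executable" --input input.txt > output.txt
[[ ! -f output.txt ]] && echo "Output file could not be found!" && exit 1
cat output.txt
# Fail the test if the expected text is missing.
grep -q 'expected output' output.txt
echo Done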
```
### `name` (string)
The name of the component, useful for logging.
### `functionality_name` (string)
The name of the component, useful for logging. (Deprecated)
### `memory_*` (long)
The maximum amount of memory a component is allowed to allocate.
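The following denominations are provided: `memory_b`, `memory_kb`, `memory_mb`, `memory_gb`, `memory_tb`, `memory_pb` for SI units (1000-base), and `memory_kib`, `memory_mib`, `memory_gib`, `memory_tib`, `memory_pib` for IEC units (1024-base).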
By default, this value will be undefined.
You can set the amount of memory in your component using any of the following approaches:
```bash
# as a parameter of viash run
viash run config.vsh.yaml --memory 2GB -- <my component arguments>
# as a parameter of viash test
viash test config.vsh.yaml --memory 2GB
# or as a parameter of the executable
viash build config.vsh.yaml -o output
output/my_executable ---memory 2GB
#                     ↑ notice the triple dash
# to unset a default memory value, pass UNDEFINED
output/my_executable ---memory UNDEFINED
```
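Inside a script, the memory limit can be used to size buffers or batches. The following is a minimal sketch, assuming a Python script and a hypothetical `max_chunk_bytes` helper; the fraction and the default are arbitrary illustration values, not Viash behavior.

```python
## VIASH START
par = {}
meta = {"memory_b": 2_000_000_000}  # hypothetical limit of 2 * 1000**3 bytes
## VIASH END


def max_chunk_bytes(meta, fraction=0.5, default_bytes=256 * 1000**2):
    """Pick a chunk size as a fraction of the memory limit (hypothetical helper).

    Falls back to `default_bytes` when no limit is defined.
    """
    limit = meta.get("memory_b")
    if limit is None:
        return default_bytes
    return int(limit * fraction)


print(max_chunk_bytes(meta))
```

Because the value is undefined by default, the script should always provide a fallback instead of assuming the key holds a number.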
### `resources_dir` (string)
This field specifies the absolute path to where the resources are stored.
During the build phase, resources are copied or fetched into this directory so that they can be read while the script or its test scripts are running.
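A script can build paths to its resources from this directory. The sketch below assumes a Python script; the `resource_path` helper and the file name `helper.txt` are hypothetical.

```python
import os

## VIASH START
par = {}
meta = {"resources_dir": "."}  # placeholder for local development
## VIASH END


def resource_path(meta, filename):
    """Return the path of a resource inside the resources directory (hypothetical helper)."""
    return os.path.join(meta["resources_dir"], filename)


helper = resource_path(meta, "helper.txt")  # hypothetical resource name
print(helper)
```

Resolving resources relative to `meta["resources_dir"]` avoids depending on the working directory from which the component is called.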
### `temp_dir` (string)
A temporary directory in which your script is allowed to create new temporary files / directories.
By default, this will be set to the `VIASH_TEMP` environment variable.
When the `VIASH_TEMP` variable is undefined, Viash checks the POSIX `TMPDIR` variable and other common variants (including misspellings), and ultimately falls back to `/tmp`.
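One way to use this directory from Python is to create scratch space inside it with the standard `tempfile` module, so that intermediate files are cleaned up automatically. The prefix and file name below are hypothetical, and the placeholder falls back to the system temporary directory for local development.

```python
import os
import tempfile

## VIASH START
par = {}
meta = {"temp_dir": tempfile.gettempdir()}  # placeholder for local development
## VIASH END

# Create a scratch directory inside the temp dir; it is removed when the block exits.
with tempfile.TemporaryDirectory(prefix="my_component-", dir=meta["temp_dir"]) as scratch:
    scratch_file = os.path.join(scratch, "intermediate.txt")
    with open(scratch_file, "w") as f:
        f.write("hello")
    created = os.path.exists(scratch_file)

cleaned_up = not os.path.exists(scratch)
print(created, cleaned_up)
```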