Skip to content

Refactor argument parsing and passing to a json-based approach instead of code injection#762

Open
rcannood wants to merge 150 commits into
mainfrom
switch_to_arrays
Open

Refactor argument parsing and passing to a json-based approach instead of code injection#762
rcannood wants to merge 150 commits into
mainfrom
switch_to_arrays

Conversation

@rcannood
Copy link
Copy Markdown
Contributor

@rcannood rcannood commented Aug 12, 2024

Describe your changes

This PR refactors Viash to use JSON-based parameter passing instead of direct code injection. This makes argument value handling more robust, easier to debug, and correctly handles special characters like backticks, quotes, dollar signs, and newlines.

Related issue(s)

Fixes #375, #619, #705, #763, #821, #840

Changes

  • New io.viash.languages package: Language trait with implementations for Bash, Python, R, JavaScript, Scala, C#, and Nextflow
  • JSON-based parameter passing: Arguments are stored in params.json and parsed at runtime using language-specific JSON parsers (no external dependencies required)
  • Build-time code injection: JSON parsing code is injected at build time, not runtime

Breaking changes

Multiple-value input arguments are now arrays

Before:

# Arguments with multiple: true were semicolon-separated strings
IFS=';' read -ra items <<< "$par_inputs"
for item in "${items[@]}"; do
  echo "$item"
done

After:

# Arguments with multiple: true are now bash arrays
for item in "${par_inputs[@]}"; do
  echo "$item"
done

JSON Structure

{
  "par": {
    "input": "/path/to/input.txt",
    "whole_number": 42,
    "real_number": 3.14,
    "multiple": ["a", "b", "c"]
  },
  "meta": {
    "name": "component_name",
    "resources_dir": "/path/to/resources",
    "executable": "/path/to/executable",
    "config": "/path/to/.config.vsh.yaml",
    "temp_dir": "/tmp",
    "cpus": 2,
    "memory_b": 2000000000
  },
  "dep": {
    "dependency_name": "/path/to/dependency"
  }
}

Reviewing the viash work directory

To inspect the work directory and params.json file during execution:

# Build a test component
bin/viash build src/test/resources/test_languages/bash/config.vsh.yaml -o /tmp/test_component

# Run with VIASH_KEEP_WORK_DIR to preserve the work directory
VIASH_KEEP_WORK_DIR=true /tmp/test_component/test_languages_bash \
  /tmp/test_component/.config.vsh.yaml \
  --real_number 3.14 \
  --whole_number 42 \
  -s "hello world"

# The work directory location is printed; inspect params.json:
cat /tmp/viash_work_test_languages_bash_*/params.json | jq .

New feature: UNDEFINED and UNDEFINED_ITEM

This PR adds proper support for explicitly unsetting argument values:

Unsetting Single-Value Arguments

Pass the literal UNDEFINED (unquoted) to set an argument to null:

./my_component --optional_arg UNDEFINED

Result: par["optional_arg"] is None (Python), NULL (R), null (JS), or unset (Bash).

Missing items in multi-value arguments

Use UNDEFINED_ITEM in semicolon-separated values to represent null elements:

./my_component --values "item1;UNDEFINED_ITEM;item3"

Result: par["values"] is ["item1", None, "item3"] (Python) or equivalent.

Passing literal "UNDEFINED" string

Use nested quotes to pass "UNDEFINED" as a literal string:

./my_component --arg '"UNDEFINED"'    # par["arg"] = "UNDEFINED"
./my_component --arg "'UNDEFINED'"    # par["arg"] = "UNDEFINED"

Unsetting computational requirements

You can also pass UNDEFINED to unset ---cpus or ---memory defaults:

./my_component ---cpus "UNDEFINED" ---memory "UNDEFINED"

Checklist

  • All tests passing (55 component + 16 Nextflow + 190 E2E)
  • Bash 3.2 compatibility verified
  • BSD/macOS compatibility verified
  • Breaking changes documented
  • Changelog updated

@rcannood

This comment was marked as outdated.

@rcannood rcannood mentioned this pull request Feb 10, 2026
8 tasks
* add improved json parser

* improve speed
@tverbeiren
Copy link
Copy Markdown
Member

Updates to mitigate breaking changes:

  • jsonlite is a requirement for R components

    • Write custom parser
    • If installed, use jsonlite else use custom parser (with deprecation warning)
    • Add unit tests for both situation
  • The same for jq in bash

  • bash uses arrays: All components that use bash will break !

    • Add flag for backward compatibility → with deprecation warning
  • We leave js, C# and Scala as is for the time being

@tverbeiren
Copy link
Copy Markdown
Member

Updates to mitigate breaking changes:

* `jsonlite` is a requirement for R components
  
  * [ ]  Write custom parser
  * [ ]  If installed, use `jsonlite` else use custom parser (with deprecation warning)
  * [ ]  Add unit tests for both situation

This is nicely covered, including tests!

* The same for `jq` in `bash`

I don't see how this is covered?

* `bash` uses arrays: All components that use bash will break !
  * Add flag for backward compatibility → with deprecation warning

I don't see where this is covered either?

@hcannoodt
Copy link
Copy Markdown
Collaborator

2 issues remain in tests:

  • the latest nextflow is requiring strict syntax. This is a known issue and not related to this PR
  • the Scala JSON parser fails downloading environment dependencies during testing and ends up being skipped. This test works correctly locally.

Use Scala 3.6 as Scala 3.8 no longer supports Java 11
Start Scala/Bloop engine as a bunch of downloads need to happen and needs some time to wake up preventing possible stuck servers down the road
@hcannoodt hcannoodt requested a review from tverbeiren March 26, 2026 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants