Contributor

@briandoconnor briandoconnor commented Jul 30, 2025

Overview

This pull request updates the Workflow Execution Service (WES) OpenAPI specification to enhance functionality in several ways without introducing breaking changes. Key changes include cleaning up our documentation, making workflow types and engines consistent across requests and responses, creating a universal structure for passing workflow parameters (vs. relying on workflow-engine-specific formats), and creating a universal output format that collects outputs in a structured way regardless of workflow engine. Claude Code was used to generate some of these changes.

eLwazi-hosted GA4GH Hackathon

The eLwazi-hosted GA4GH hackathon (7/28-8/1) is working on this issue, given the need expressed by various groups attending the session. For more info, see the agenda.

Built Documentation

The human-readable documentation: https://ga4gh.github.io/workflow-execution-service-schemas/preview/feature/issue-176-wes-params/docs/index.html

More detailed description

  • The major goal was to introduce a possible implementation of structured inputs and outputs for WES (issue #176, "Input and Output Format Specification")
  • We wanted to do this in a non-breaking change… so this was added rather than replacing the current mechanisms (e.g. you can still use workflow_params to send a workflow-specific parameters file, you can still use multipart/form-data to upload workflow file(s) and send params).
  • I added additional_workflow_urls which allows you to send multiple secondary workflow URLs beyond the primary workflow supplied by workflow_url
  • In addition to multipart/form-data, I added the ability to submit a workflow (as workflow URLs) using application/json and a well-structured request defined in OpenAPI (rather than just strings, as used in the multipart/form-data request)
  • Cleaned up service-info: it was missing workflow_type and workflow_engine, but these are required values when submitting a new workflow run
  • Cleaned up various documentation
  • Incremented version to 1.2.0
  • Added 'structured_outputs' to GET /runs/{run_id} as a way of structuring outputs from a workflow in a universal way (vs. leaving it undefined as an 'outputs' object)

Issues/questions for discussion

  • For workflow_params, should this be just a string for the multipart? Does it really need to be JSON-encoded? That seems limiting if you're trying to pass YAML to a workflow engine…
  • Do we need both "application/json" and "multipart/form-data"? Could we get away with just "multipart/form-data"? The problem with this is that the OpenAPI is not well defined, since everything is just "string". That hurts client generation and makes it more error-prone to code against the spec and correctly encode the request. So there's value in having an application/json request that can be coded against cleanly from the OpenAPI. But the cost of supporting two workflow run request types may be too much for implementers.
  • I noticed workflow_engine was missing from the service-info response and the workflow_engines_version array is ambiguous if this WES server supports multiple workflow engines. So I added the former and also updated the docs to say <workflow_engine>_<workflow_engine_version> should be used for the workflow_engine_version array to make it clear when a WES server supports a variety of engines.
  • Same issue for workflow_type and workflow_type_version
  • I feel like both workflow_engine and workflow_type should actually be an object where we can have versions listed under each instead of two separate arrays. But I didn't want to change this since it would be a breaking change.
  • I'm not sure what "system_state_counts" in service-info is for but it's required…
  • Our list of workflow types is "CWL", "WDL", "Nextflow", or "Snakemake" (or another alternative supported by this WES instance, see service-info). Do we want to use lowercase here (which would match supported_filesystem_protocols)?
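To make the `<workflow_engine>_<workflow_engine_version>` convention discussed above concrete, here is a minimal Python sketch of how a client might regroup such strings by engine. The engine names and version numbers are purely illustrative, the field names follow the wording in this description, and `group_engine_versions` is a hypothetical helper, not part of the spec.

```python
from collections import defaultdict


def group_engine_versions(engines, engine_versions):
    """Group '<workflow_engine>_<version>' strings by engine name.

    Matching is done against the advertised engine names rather than by
    splitting on '_', because engine names or versions may contain underscores.
    """
    grouped = defaultdict(list)
    for entry in engine_versions:
        # Try longer names first so one engine name cannot shadow a longer one.
        for engine in sorted(engines, key=len, reverse=True):
            prefix = engine + "_"
            if entry.startswith(prefix):
                grouped[engine].append(entry[len(prefix):])
                break
        else:
            raise ValueError(f"no known engine prefix in {entry!r}")
    return dict(grouped)


# Hypothetical service-info fragment (values are assumptions for illustration).
service_info = {
    "workflow_engine": ["cromwell", "nextflow"],
    "workflow_engine_version": ["cromwell_85", "nextflow_23.10.0", "nextflow_24.04.2"],
}

by_engine = group_engine_versions(
    service_info["workflow_engine"], service_info["workflow_engine_version"]
)
print(by_engine)
```

The grouped result is roughly the "object with versions listed under each" shape suggested above, which is one reason that shape may be easier for clients to consume.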

Specification Updates generated by Copilot:

Version and Logo Updates:

  • Updated API version from 1.1.0 to 1.2.0.
  • Changed logo URL to reflect the latest GA4GH branding.

Workflow Types and Engines:

  • Added support for new workflow types (Nextflow, Snakemake) in addition to CWL and WDL.
  • Introduced workflow_engine and workflow_engine_versions properties to specify supported workflow engines and their versions. [1] [2]

Parameterization Enhancements:

  • Added workflow_unified_params for universal parameter format, enabling generic parameterization across workflow types.
  • Expanded descriptions for workflow_params and workflow_unified_params fields, clarifying their use cases.

Documentation Improvements:

Endpoint Descriptions:

  • Enhanced descriptions for RunWorkflow endpoint, detailing supported content types (application/json, multipart/form-data) and file handling mechanisms.
  • Improved GetServiceInfo endpoint description to include workflow engines and additional service information.

Tags and Groups:

  • Added new tags (workflowoutputs_model, outputobject_model) and grouped them under x-tagGroups for better organization. [1] [2]

Schema Updates:

New Properties:

  • Added workflow_type and workflow_engine objects to the schema for defining supported types and engines. [1] [2]

Clarifications:

  • Updated descriptions for existing schema fields, ensuring consistency and clarity. [1] [2]

These updates make the WES API more versatile, user-friendly, and aligned with emerging standards in workflow execution.

  1. Modified workflow_params description to indicate it's required unless
  workflow_unified_params is provided
  2. Added workflow_unified_params field with the hybrid structure we discussed
  3. Enhanced file metadata support with optional fields for size, checksum, secondary files,
   format, and modification time
  4. Added comprehensive validation constraints for different parameter types
  5. Validated the schema - no OpenAPI validation issues detected
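As a rough illustration of items 1 and 2, a server could pick the parameter source as sketched below. This assumes the request has already been parsed into a dict; `select_params` is a hypothetical helper, and the rule that `workflow_unified_params` wins when both are present is taken from the precedence note in this description.

```python
def select_params(run_request):
    """Return (field_name, params) for the parameter source to use.

    workflow_unified_params takes precedence when provided; otherwise
    workflow_params is required.
    """
    unified = run_request.get("workflow_unified_params")
    if unified is not None:
        return "workflow_unified_params", unified
    legacy = run_request.get("workflow_params")
    if legacy is not None:
        return "workflow_params", legacy
    raise ValueError("either workflow_params or workflow_unified_params is required")


# Both fields present: the unified format is chosen.
source, params = select_params({
    "workflow_params": {"input_file": "gs://bucket/input.fastq"},
    "workflow_unified_params": {"version": "1.0", "parameters": {}},
})
print(source)
```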

  Key Features of the Implementation:

  - Version field for format evolution (default: 1.0)
  - Rich file metadata (size, checksum, secondary_files, format, last_modified)
  - Comprehensive validation constraints (min/max, length, pattern, enum, array limits)
  - Type-safe parameter definitions with clear enums
  - Backward compatibility - existing workflow_params still works
  - Precedence handling - workflow_unified_params takes precedence when provided

Key Improvements Made:

  1. Dual Content Type Support

  - application/json (Recommended): Uses the proper RunRequest model object
  - multipart/form-data (Legacy): Maintains backward compatibility for file uploads

  2. Proper Model Usage

  - JSON requests now use $ref: '#/components/schemas/RunRequest'
  - Leverages all the rich typing and validation from the RunRequest schema
  - Supports both workflow_params and workflow_unified_params

  3. Enhanced Documentation

  - Clear guidance on when to use each content type
  - Explains file handling differences between formats
  - Documents the new unified parameter format
  - Security considerations for file uploads

  4. Better Developer Experience

  - OpenAPI tooling can generate proper client code for JSON requests
  - Type safety with structured objects instead of string parsing
  - Validation happens automatically with the model schema
  - Consistency across the API

  Usage Examples:

  Preferred JSON format:
  POST /runs
  Content-Type: application/json

  {
    "workflow_type": "CWL",
    "workflow_type_version": "v1.0",
    "workflow_url": "https://example.com/workflow.cwl",
    "workflow_unified_params": {
      "version": "1.0",
      "parameters": {
        "input_file": {
          "type": "File",
          "value": "gs://bucket/input.fastq",
          "file_metadata": {
            "size": 1073741824,
            "checksum": "sha256:abc123..."
          }
        }
      }
    }
  }

  Legacy multipart format (when file uploads needed):
  POST /runs
  Content-Type: multipart/form-data

  workflow_type: CWL
  workflow_unified_params: {"version":"1.0","parameters":{...}}
  workflow_attachment: [binary file data]
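The difference between the two examples above can be sketched in Python using only the standard library. This is an illustration under assumptions, not a reference client: `json_request` and `multipart_fields` are hypothetical helpers, and nothing is actually sent over the network.

```python
import json

# Same shape as the unified parameters in the JSON example above.
unified_params = {
    "version": "1.0",
    "parameters": {
        "input_file": {"type": "File", "value": "gs://bucket/input.fastq"},
    },
}


def json_request(workflow_url, workflow_type, workflow_type_version, params):
    """Build headers and body for the application/json form of POST /runs."""
    body = {
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_url": workflow_url,
        # Nested object: stays structured, so schema validation can apply.
        "workflow_unified_params": params,
    }
    return {"Content-Type": "application/json"}, json.dumps(body)


def multipart_fields(workflow_type, params):
    """Build the text fields for the multipart/form-data form.

    Form fields are plain strings, so the nested object has to be
    JSON-encoded into a single string by the client.
    """
    return {
        "workflow_type": workflow_type,
        "workflow_unified_params": json.dumps(params),
    }


headers, body = json_request(
    "https://example.com/workflow.cwl", "CWL", "v1.0", unified_params
)
fields = multipart_fields("CWL", unified_params)
print(type(json.loads(body)["workflow_unified_params"]).__name__)
print(type(fields["workflow_unified_params"]).__name__)
```

The second print shows the point made in the discussion questions: in the multipart form everything is a string, so the encoding burden falls on the client.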
  1. Updated RunLog schema - Added structured_outputs field alongside the existing outputs
  2. Added WorkflowOutputs schema - Main container for structured outputs with version and
  metadata
  3. Added OutputObject schema - Flexible output type supporting Files, Directories, Arrays,
  and primitives
  4. Added documentation tags - Both schemas appear in the Models section of the API docs

  Key Features Implemented:

  WorkflowOutputs Schema:

  - Version field for format evolution
  - Named outputs with rich metadata
  - Workflow-level metadata (execution ID, timing, resource usage)
  - Provenance tracking (engine, version, status)

  OutputObject Schema:

  - Type system - File, Directory, Array, String, Integer, Float, Boolean
  - File metadata - location, size, checksum, format, basename
  - Provenance - source task, command, creation time
  - Secondary files - Associated files like indexes
  - Array support - Collections of outputs
  - Content embedding - Small file contents can be included

  Backward Compatibility:

  - Existing outputs field remains unchanged (marked as "legacy format")
  - structured_outputs is optional - implementations can provide either or both
  - No breaking changes to existing API consumers
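A client consuming these fields might look roughly like the sketch below. The property names used inside the objects (`outputs`, `class`, `value`, `location`, `secondary_files`) are assumptions based on the descriptions in this PR and should be checked against the actual schema; `iter_file_locations` and `collect_locations` are hypothetical helpers.

```python
def iter_file_locations(obj):
    """Yield locations of File/Directory outputs, recursing into arrays
    and secondary files (assumed field names, see lead-in)."""
    kind = obj.get("class")
    if kind in ("File", "Directory"):
        if "location" in obj:
            yield obj["location"]
        for secondary in obj.get("secondary_files", []):
            yield from iter_file_locations(secondary)
    elif kind == "Array":
        for item in obj.get("value", []):
            yield from iter_file_locations(item)
    # Primitive outputs (String, Integer, ...) carry no location.


def collect_locations(run_log):
    """Return all output locations, or None if only legacy outputs exist."""
    structured = run_log.get("structured_outputs")
    if structured is None:
        return None  # caller falls back to the engine-specific 'outputs' object
    locations = []
    for output in structured.get("outputs", {}).values():
        locations.extend(iter_file_locations(output))
    return locations


# Hypothetical run log (all values are illustrative).
run_log = {
    "outputs": {"bam": "s3://bucket/out/sample.bam"},
    "structured_outputs": {
        "version": "1.0",
        "outputs": {
            "bam": {
                "class": "File",
                "location": "s3://bucket/out/sample.bam",
                "secondary_files": [
                    {"class": "File", "location": "s3://bucket/out/sample.bam.bai"}
                ],
            },
            "reports": {
                "class": "Array",
                "value": [
                    {"class": "File", "location": "s3://bucket/out/qc1.html"},
                    {"class": "File", "location": "s3://bucket/out/qc2.html"},
                ],
            },
            "read_count": {"class": "Integer", "value": 1000},
        },
    },
}

locations = collect_locations(run_log)
print(locations)
```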
…ondary workflow URLs beyond the primary workflow
@Copilot Copilot AI left a comment

Pull Request Overview

This pull request introduces structured parameters and outputs to the Workflow Execution Service (WES) API specification while maintaining backward compatibility. The update enhances workflow type and engine support, introduces universal parameterization, and provides structured output collection.

Key changes:

  • Added support for additional workflow types (Nextflow, Snakemake) beyond CWL and WDL
  • Introduced workflow_unified_params for universal parameter format across workflow engines
  • Added structured_outputs to provide rich metadata and type-safe output collection
  • Enhanced service-info endpoint with workflow engine information and improved documentation

pvanheus commented Aug 1, 2025

Just dropping in this from @suecharo / @inutano - Sapporo extended WES in a few ways - https://petstore.swagger.io/?url=https://raw.githubusercontent.com/sapporo-wes/sapporo-service/main/sapporo-wes-spec-2.0.0.yml

One thing that is interesting in their extensions is the addition of workflow_attachments_obj, which makes it, I think, somewhat easier to specify the destination filename of attachments. Perhaps it would be useful to go even further and specify them as File or Directory types.

description: Whether this parameter is optional
default:
description: Default value if parameter not provided (type depends on 'type' field)
constraints:
Collaborator

@vinjana vinjana Aug 19, 2025

I'm wondering about the use of uploading validation constraints when submitting a workflow run.

I totally agree that setting up validation constraints in a WES makes sense if the WES is installing a new workflow. However, this implementation has some disadvantages:

  1. Consistency: Defining such constraints when requesting a workflow run allows a client to define them differently for different runs, and allows different clients (users) to use different constraints.
  2. Redundancy: The clients would probably have to provide these parameter constraints every time they submit a run, and it would always be the same parameter constraints, right?
  3. Competencies: Finally, the user of the WES instance is not necessarily the same person (or at the same expert level) as the administrator, or whoever is responsible for installing workflows. The current implementation puts the burden of defining these constraints on the clients, although it would be better placed on some kind of administrator. This is particularly relevant with human data and tight security constraints that may involve workflow auditing, and it is also problematic if there are a lot of clients.

These problems may hint that we are not modelling the domain in sufficient detail.

Some thoughts in this direction

Logically, to me the validation constraints belong with the workflow, not with the workflow run.

Think of a use case like this:

  1. Some client application first requests a new workflow. This could be a data user, but also a WES administrator or somebody involved in governance of the instance. Think of security auditing etc. The workflow installation route could have separate authentication.
  2. The workflow is installed. Depending on the workflow, this may involve a lot of steps, such as downloading containers or running integration tests to verify the installation. (Of course, ideally just some containers would be pulled.)
  3. The clients use the workflow in the usual way.

We currently don't have a separate endpoint for requesting the installation of a new workflow, e.g., for downloading from a TRS. I'm also not saying we need it (there is value in keeping the API simple); I just wanted to make the point that we could have such an endpoint and that it would have valid use cases. An installation of a workflow could be considered a "resource" in the REST sense.

I agree that it seems redundant to pass the parameter type for every workflow run. The parameter types are not changing, so it makes sense to define them with the workflow, or even with the engine?

Collaborator

I guess: if you are talking about workflow parameters (e.g., bwa-mem parameters), then with the workflow.

If you are talking about workflow engine parameters (e.g., number of cores of the control process or whether to rerun (and complement) a previous run), then with the engine.

In both cases, the problem is analogous. Ideally, a similar solution would be found. I think something like EDAM can be helpful in both cases.

Contributor

IMO constraints are typically properties of a workflow in the case of a WES. Putting their definition on the end user feels needless.

If the user is setting these constraints themselves, why not just modify the inputs to match their constraints before submitting the workflow? Where I have heard the request for constraints come from is workflow authors who want to define things like ranges of values, or enums.

Constraints would probably be better living alongside TRS or directly within the workflow itself (if the language supported it).

REQUIRED
The workflow CWL or WDL document. When `workflow_attachments` is used to attach files, the `workflow_url` may be a relative path to one of the attachments.
The primary workflow document. When `workflow_attachments` is used to attach files, the `workflow_url` may be a relative path to one of the attachments that is the primary workflow to be executed.
Collaborator

The comment is not limiting the use case at the moment ("may be relative"), but it also does not reflect the whole flexibility of the approach, which at least includes the following:

  • files uploaded as attachment (e.g., file:main.nf)
  • TRS URIs to files downloaded from TRS server (e.g., trs://server:port/path/to/package)
  • files in globally shared (at least for the user group) central workflow installation (e.g., file:/workflows/AwesomeWorkflow/main.nf).

You might want to cover more of the use cases here, also because this YAML is the main documentation, and a more inclusive explanation may help implementers understand the standard better.
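To illustrate how an implementation might tell these cases apart, here is a small sketch using Python's standard `urllib.parse`. The category labels and the `classify_workflow_url` helper are hypothetical, and the `trs://` host below is a placeholder.

```python
from urllib.parse import urlparse


def classify_workflow_url(workflow_url):
    """Classify a workflow_url into one of the use cases listed above."""
    parsed = urlparse(workflow_url)
    if parsed.scheme == "trs":
        return "trs"  # fetched from a TRS server
    if parsed.scheme in ("", "file"):
        # Absolute paths point at a server-side installation;
        # relative paths refer to an uploaded attachment.
        return "server-local" if parsed.path.startswith("/") else "attachment"
    return "remote"  # e.g. https://, fetched by the WES server


examples = [
    "file:main.nf",
    "main.nf",
    "trs://trs.example.org/path/to/package",
    "file:/workflows/AwesomeWorkflow/main.nf",
    "https://example.com/workflow.cwl",
]
kinds = [classify_workflow_url(u) for u in examples]
print(kinds)
```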

type: array
items:
type: string
description: An array of one or more acceptable workflow engines. Since a server may support multiple engines and version, the recommendation is to encode the workflow_engine_version array as `<workflow_engine>_<version>` where `workflow_engine` values match this array for clarity.
Collaborator

Suggested change
description: An array of one or more acceptable workflow engines. Since a server may support multiple engines and version, the recommendation is to encode the workflow_engine_version array as `<workflow_engine>_<version>` where `workflow_engine` values match this array for clarity.
description: An array of one or more acceptable workflow engines. If the server supports multiple engine versions, encode the versions in `workflow_engine_versions`.
  1. Better not repeat the documentation of workflow_engine_version here, because it distracts from the meaning of workflow_engine and is harder to maintain (DRY principle).
  2. I still think that the reference to workflow_engine_version is useful.

description: Named workflow outputs with structured metadata
additionalProperties:
$ref: '#/components/schemas/OutputObject'
OutputObject:
Collaborator

Do I understand correctly that this is either a single value or an array of OutputObject? So this would be a recursive definition, right? This looks quite powerful. Nice!

description: File integrity hash in format 'algorithm:hash' (e.g., 'sha256:abc123...')
format:
type: string
description: MIME type or format identifier (e.g., EDAM ontology reference)
Collaborator

Maybe it would be useful to directly fix, or at least suggest, certain aspects of the format. Maybe the best solution is to at least suggest something like URIs for terms, e.g. MIME types (https://www.iana.org/assignments/media-types/media-types.xhtml), and to allow multiple terms separated by commas, or to directly make this field a List[String] field.

In the extreme you could define this as type List[URI].

description: Output value (type depends on class field)
location:
type: string
description: Absolute path or URL to the output file/directory (for File/Directory class)
Collaborator

Why absolute path? Why a path at all? Should this not better be a URI? Consider a WES that makes final files available via S3 buckets.

I guess this should NOT include the basename? Or should it?

description:
type: string
description: Human-readable description of the output
secondary_files:
Collaborator

@vinjana vinjana Aug 20, 2025

I think the WES implementer or administrator will often not be able to decide what is primary or secondary. There may be many workflows, and a WES instance may even permissively allow running arbitrary workflows downloaded from a TRS, right? Therefore, to be useful, this field will usually rely on the output of the workflow.

Therefore, I would question whether this adds much value to the API. At most it will be used to pass through information that is available anyway in some workflow output file. But then the client/data user can access the workflow result files and obtain the information from there.

BTW: There may even be cases where the workflow implementer's idea of what constitutes a primary or secondary file does not agree with what the user thinks.

description: JSON-encoded universal workflow parameters (see RunRequest for how to encode)
workflow_type:
type: string
description: Workflow descriptor type must be "CWL", "WDL", "Nextflow", or "Snakemake" currently (or another alternative supported by this WES instance, see service-info)
Collaborator

TRS (from 2.0.1 on) is using:

  • CWL
  • WDL
  • NFL
  • Galaxy
  • SMK

We should stick to that as well. See discussion #173

Contributor

Probably should be defined as an enum lower down
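For illustration only, the enum idea combined with the TRS names listed above could look like this on the client or server side. The member values are copied from the list in this thread; the mapping from the longer names currently in the description ("Nextflow", "Snakemake") to the TRS abbreviations is an assumption, and `parse_workflow_type` is a hypothetical helper.

```python
from enum import Enum


class WorkflowType(str, Enum):
    """Workflow types, using the TRS names quoted in this thread."""
    CWL = "CWL"
    WDL = "WDL"
    NFL = "NFL"
    GALAXY = "Galaxy"
    SMK = "SMK"


# Assumed aliases for the spelled-out names used in the current description.
_LONG_NAMES = {"Nextflow": WorkflowType.NFL, "Snakemake": WorkflowType.SMK}


def parse_workflow_type(value):
    """Accept either a TRS name or one of the longer aliases."""
    if value in _LONG_NAMES:
        return _LONG_NAMES[value]
    return WorkflowType(value)  # raises ValueError for unknown types


print(parse_workflow_type("Nextflow").value)
print(parse_workflow_type("CWL").value)
```

An enum in the schema would give generated clients the same closed set, at the cost of needing a spec release whenever a new type is added.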

ServiceInfo:
title: ServiceInfo
allOf:
- $ref: 'https://raw.githubusercontent.com/ga4gh-discovery/ga4gh-service-info/v1.0.0/service-info.yaml#/components/schemas/Service'
Collaborator

Not a big fan of the fact that a part of service_info is stored somewhere else. Error prone.

Comment on lines +807 to +809
Unified parameter format that can be converted to workflow-language-specific format.
If provided, takes precedence over workflow_params. WES implementations should
convert these to the appropriate native format for the specified workflow_type.
Collaborator

Please don’t get me wrong—I’m not opposed to using workflow_unified_params. However, its impact is limited to a relatively small subset of parameters, primarily those related to compute resources (e.g., cores, memory). Other parameters, such as repeat, require special handling since their behavior varies depending on the engine.

As @vinjana suggested in our discussion, we might consider introducing a parameter property like wes_version to ensure backward compatibility for these unified parameters.

Collaborator

@vinjana vinjana Aug 27, 2025

Is workflow_unified_params not related to workflow_params rather than workflow_engine_params? At least the comments suggest that workflow_params and workflow_unified_params belong together.

But I understand the confusion, because also for workflow engine parameters one may think of standardizing them in a similar way, with ontology terms, etc.

workflow_engine_parameters:
type: object
additionalProperties:
type: string
Collaborator

description is missing.

JSON-encoded engine-specific parameters

description: JSON-encoded engine-specific parameters (see RunRequest for how to encode)
workflow_url:
type: string
description: The workflow document. When workflow_attachment is used to attach files, the workflow_url may be a relative path to one of the attachments.
Contributor

Suggested change
description: The workflow document. When workflow_attachment is used to attach files, the workflow_url may be a relative path to one of the attachments.
description: The path to the workflow document. When workflow_attachment is used to attach files, the workflow_url may be a relative path to one of the attachments.

description: ''
required: false
description: >-
Files to be staged for workflow execution. You set the filename/path using the Content-Disposition header in the multipart form submission. For example 'Content-Disposition: form-data; name="workflow_attachment"; filename="workflows/helper.wdl"'. The files are staged to a temporary directory and can be referenced by relative path in the workflow_url.
Contributor

Looking in the RFC for filenames, it does seem like it is strongly encouraged to NOT include folder structures in the filename param.
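If folder structures in filenames are kept anyway, a server would probably want to validate them before staging. A minimal sketch, assuming POSIX-style separators and a hypothetical `staging_path` helper (backslash-separated paths are not handled here):

```python
from pathlib import PurePosixPath


def staging_path(filename):
    """Return a relative path for staging an attachment.

    Raises ValueError for empty names, absolute paths, or any '..'
    component, so the file cannot land outside the staging directory.
    """
    path = PurePosixPath(filename)
    if not path.parts or path.is_absolute() or ".." in path.parts:
        raise ValueError(f"unsafe attachment filename: {filename!r}")
    return path


print(staging_path("workflows/helper.wdl"))
```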

Unified parameter format that can be converted to workflow-language-specific format.
If provided, takes precedence over workflow_params. WES implementations should
convert these to the appropriate native format for the specified workflow_type.
properties:
Contributor

These should probably be moved into a separate object: WorkflowUnifiedParams

6 participants