Problem
The current WES response schema for outputs (workflow-execution-service-schemas/openapi/workflow_execution_service.openapi.yaml, lines 824 to 826 in 03c795c) declares the field as a generic object:

outputs:
  type: object
  description: The outputs from the workflow run.

While this is flexible, it’s also very broad, which risks divergence among client implementations.
I understand the challenge of keeping WES general enough to support all workflow engines, but I believe the specification could benefit from stronger recommendations—or even minimal conventions—for how outputs are represented.
Proposed Output Classification
In practice, I’ve found it useful to categorize outputs into three broad types:
- meta: Files produced by the engine itself, such as cache files, logs, or system information.
- stage: Intermediate artifacts needed by the engine or backend to run the workflow (e.g., .command.run, .command.sh, or cache files that downstream steps depend on).
- results: The actual outputs of the workflow that are meaningful to the user.
From a user’s perspective, all of these categories matter, but implementations should be smart about filtering out noise (e.g., cache files in meta likely don’t need to be returned to clients).
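To make the classification and filtering concrete, here is a minimal Python sketch. The matching rules (the file names in STAGE_NAMES and the suffixes in META_SUFFIXES and NOISY_META_SUFFIXES) and the function names are hypothetical choices for illustration; a real engine would need its own rules, and nothing here is defined by WES.

```python
from pathlib import PurePosixPath

# Hypothetical rules: these names and suffixes are illustrative assumptions,
# not part of the WES specification or of any engine's contract.
STAGE_NAMES = {".command.run", ".command.sh"}
META_SUFFIXES = {".log", ".cache"}
NOISY_META_SUFFIXES = {".cache"}  # meta files not worth returning to clients


def classify(relative_path: str) -> str:
    """Return 'stage', 'meta', or 'results' for a run-relative path."""
    path = PurePosixPath(relative_path)
    if path.name in STAGE_NAMES:
        return "stage"
    if path.suffix in META_SUFFIXES:
        return "meta"
    return "results"


def categorize(paths, drop_noise=True):
    """Group paths by category, optionally dropping noisy meta files."""
    grouped = {"meta": [], "stage": [], "results": []}
    for p in paths:
        category = classify(p)
        is_noise = (
            category == "meta"
            and PurePosixPath(p).suffix in NOISY_META_SUFFIXES
        )
        if drop_noise and is_noise:
            continue
        grouped[category].append(p)
    return grouped
```

With these assumed rules, a cache file is still classified as meta but is left out of the grouped result unless drop_noise is set to False.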
Example Structure
Here’s a simplified version of how I’ve modeled outputs in my own use case:
outputs:
  results:
    relative_path_of_my_results_1:
      url: "presigned S3 URL, file path (e.g. file://results/result_1), or other access method"
      checksum:
        type: sha256
        value: "<checksum_value>"
    relative_path_of_my_results_2:
      url: "presigned S3 URL, file path (e.g. file://results/result_2), or other access method"
      checksum:
        type: sha256
        value: "<checksum_value>"
  meta:
    relative_path_of_my_meta_1:
      url: "presigned S3 URL, file path (e.g. file://meta/meta_1), or other access method"
      checksum:
        type: sha256
        value: "<checksum_value>"
This isn’t the exact implementation, but it conveys the idea.
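As a rough sketch of how a server could assemble that structure, the Python below walks a set of already categorized relative paths and emits a url and a sha256 checksum for each one. The function names, the `categorized` input shape, and the use of local file:// URLs are assumptions made for illustration; a real deployment might issue presigned S3 URLs instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_outputs(root: Path, categorized: dict) -> dict:
    """Build the outputs mapping.

    `categorized` maps a category name (e.g. "results", "meta") to a list
    of paths relative to `root`. Both names are hypothetical.
    """
    outputs = {}
    for category, rel_paths in categorized.items():
        outputs[category] = {
            rel: {
                # Assumption: files are local, so a file:// URI is used.
                "url": (root / rel).resolve().as_uri(),
                "checksum": {
                    "type": "sha256",
                    "value": sha256_of(root / rel),
                },
            }
            for rel in rel_paths
        }
    return outputs
```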
Performance Considerations
This approach worked well for my use case, but running larger workflows exposed a new problem: returning a large number of results in a single API call can degrade performance.
Possible solutions:
- Separate endpoint – Keep outputs lightweight, but add a new endpoint that returns detailed outputs (potentially with streaming support).
- Streaming response – Make the existing outputs field itself streamable for large payloads.
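One way either option could avoid building a single large payload is to emit outputs as newline-delimited JSON, one record per line. The sketch below is only an illustration of that idea: the record shape (category, path, plus the entry fields) and the function name are assumptions, and WES does not currently define a streaming format.

```python
import json
from typing import Iterable, Iterator, Tuple


def stream_outputs_ndjson(
    entries: Iterable[Tuple[str, str, dict]],
) -> Iterator[str]:
    """Yield one JSON document per line for each (category, path, entry).

    Because this is a generator, records are serialized lazily and the
    full list of outputs never has to be held in memory at once.
    """
    for category, rel_path, entry in entries:
        record = {"category": category, "path": rel_path, **entry}
        yield json.dumps(record, sort_keys=True) + "\n"
```

A server could hand such a generator to whatever chunked or streaming response mechanism its web framework provides, and a client could parse each line as it arrives.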
Suggestion
I propose that the WES spec:
- Recommends (or defines) a minimal schema for categorizing outputs (meta, stage, results).
- Defines how clients should consume outputs (e.g., URLs, checksums).
- Considers large-output workflows by supporting either a separate or streaming endpoint.
This would reduce ambiguity, improve client interoperability, and help implementers balance usability with performance.
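On the client side, a shared convention for checksums would let consumers validate downloads uniformly. The following is a minimal sketch, assuming an entry shaped like the example structure above and that the file bytes have already been fetched; the SUPPORTED set and the function name are hypothetical.

```python
import hashlib

# Assumption: only these checksum types are accepted by this client.
SUPPORTED = {"sha256", "sha512"}


def verify_entry(entry: dict, data: bytes) -> bool:
    """Return True if `data` matches the checksum declared in `entry`."""
    checksum = entry["checksum"]
    algorithm = checksum["type"]
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported checksum type: {algorithm}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Hex digests are compared case-insensitively.
    return actual == checksum["value"].lower()
```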