Definition for outputs #228

@JaeAeich

Problem

The current WES response schema for outputs:

outputs:
  type: object
  description: The outputs from the workflow run.

defines outputs simply as an object. While this is flexible, it’s also very broad, which risks divergence among client implementations.

I understand the challenge of keeping WES general enough to support all workflow engines, but I believe the specification could benefit from stronger recommendations, or even minimal conventions, for how outputs are represented.


Proposed Output Classification

In practice, I’ve found it useful to categorize outputs into three broad types:

  • meta: Files produced by the engine itself, such as cache files, logs, or system information.
  • stage: Intermediate artifacts needed by the engine or backend to run the workflow (e.g., .command.run, .command.sh, or cache files that downstream steps depend on).
  • results: The actual outputs of the workflow that are meaningful to the user.

From a user’s perspective, all of these categories matter, but implementations should be smart about filtering out noise (e.g., cache files in meta likely don’t need to be returned to clients).
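As a rough illustration of the classification above, here is a minimal Python sketch. The matching rules (the `STAGE_NAMES` set, the `.log` suffix, a `cache` directory) and the function names are hypothetical; a real engine would supply its own rules.

```python
from pathlib import PurePosixPath

# Hypothetical rules for illustration only; each engine would define its own.
STAGE_NAMES = {".command.run", ".command.sh"}
META_SUFFIXES = {".log"}
CACHE_DIR = "cache"


def classify(path: str) -> str:
    """Return 'stage', 'meta', or 'results' for a relative output path."""
    p = PurePosixPath(path)
    if p.name in STAGE_NAMES:
        return "stage"
    if p.suffix in META_SUFFIXES or CACHE_DIR in p.parts:
        return "meta"
    return "results"


def categorize(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by category, dropping cache files that land in meta."""
    outputs: dict[str, list[str]] = {"meta": [], "stage": [], "results": []}
    for path in paths:
        category = classify(path)
        # Cache files in meta are treated as noise and not returned to clients.
        if category == "meta" and CACHE_DIR in PurePosixPath(path).parts:
            continue
        outputs[category].append(path)
    return outputs
```

Calling `categorize` on a mixed list of paths yields one list per category, with cache files under meta left out.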


Example Structure

Here’s a simplified version of how I’ve modeled outputs in my own use case:

outputs:
  results:
    relative_path_of_my_results_1:
      url: "presigned S3 URL, file path (e.g. file://results/result_1), or other access method"
      checksum:
        type: sha256
        value: "<checksum_value>"
    relative_path_of_my_results_2:
      url: "presigned S3 URL, file path (e.g. file://results/result_2), or other access method"
      checksum:
        type: sha256
        value: "<checksum_value>"
  meta:
    relative_path_of_my_meta_1:
      url: "presigned S3 URL, file path (e.g. file://meta/meta_1), or other access method"
      checksum:
        type: sha256
        value: "<checksum_value>"

This isn’t the exact implementation, but it conveys the idea.
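To make the shape concrete, a server could assemble one entry roughly like the following Python sketch. The helper names `describe_output` and `build_outputs` are made up for this example, and the URL is passed in by the caller since how it is produced (presigned URL, file path, etc.) depends on the deployment.

```python
import hashlib
from pathlib import Path


def describe_output(local_path: Path, url: str) -> dict:
    """Build one output entry: an access URL plus a sha256 checksum."""
    digest = hashlib.sha256()
    with local_path.open("rb") as fh:
        # Read in chunks so large result files are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "url": url,
        "checksum": {"type": "sha256", "value": digest.hexdigest()},
    }


def build_outputs(files: dict[str, dict[str, tuple[Path, str]]]) -> dict:
    """Map {category: {relative_path: (local_path, url)}} to the outputs shape."""
    return {
        category: {
            rel_path: describe_output(local_path, url)
            for rel_path, (local_path, url) in entries.items()
        }
        for category, entries in files.items()
    }
```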


Performance Considerations

This approach worked well for my use case, but running larger workflows exposed a new problem: returning a large number of results in a single API call can degrade performance.

Possible solutions:

  1. Separate endpoint – Keep outputs lightweight, but add a new endpoint that returns detailed outputs (potentially with streaming support).
  2. Streaming response – Make the existing outputs field itself streamable for large payloads.
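Both options can be sketched in a few lines of Python. This is only an illustration: the token-based paging scheme, the newline-delimited JSON framing, and the function names are assumptions made for the example, not something the WES spec defines for outputs.

```python
import json
from typing import Iterator


def paginate(entries: dict, page_size: int, page_token: str = "") -> tuple[dict, str]:
    """Option 1: return one page of entries plus a token for the next page.

    The token is simply the next start offset; an empty token means no more pages.
    """
    keys = sorted(entries)
    start = int(page_token) if page_token else 0
    end = start + page_size
    page = {key: entries[key] for key in keys[start:end]}
    next_token = str(end) if end < len(keys) else ""
    return page, next_token


def stream_ndjson(entries: dict) -> Iterator[str]:
    """Option 2: yield one JSON document per line so clients can parse incrementally."""
    for path in sorted(entries):
        yield json.dumps({"path": path, **entries[path]}) + "\n"
```

With paging, each response stays bounded at `page_size` entries; with streaming, the server never has to hold the fully serialized payload in memory.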

Suggestion

I propose that the WES spec:

  • Recommends (or defines) a minimal schema for categorizing outputs (meta, stage, results).
  • Defines how clients should consume outputs (e.g., URLs, checksums).
  • Considers large-output workflows by supporting either a separate or streaming endpoint.

This would reduce ambiguity, improve client interoperability, and help implementers balance usability with performance.
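On the client side, a defined schema would make consumption mechanical. A minimal Python sketch, assuming entries follow the url/checksum shape from the example structure above and that the client has already opened the URL as a binary stream (the function name `verify_output` is hypothetical):

```python
import hashlib
from typing import BinaryIO


def verify_output(entry: dict, stream: BinaryIO, chunk_size: int = 1 << 20) -> bool:
    """Check downloaded bytes against the checksum declared in an output entry."""
    checksum = entry["checksum"]
    # hashlib.new raises ValueError if the declared algorithm is unsupported.
    digest = hashlib.new(checksum["type"])
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == checksum["value"].lower()
```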
