Workflow engines fail to run workflows referenced by certain TRS URLs #247

@svonworl

As detailed in dockstore/dockstore#5594, some workflow engines, including miniwdl and Cromwell, fail to run a valid Dockstore workflow when both of the following are true:

  • the workflow is referenced via the TRS URL of its primary descriptor, and
  • the workflow imports files with paths that contain parent directory references (..).

For example, the following invocation fails:

miniwdl run 'https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FWholeGenomeGermlineSingleSample/versions/tw_GL-2036_create_rtools_docker/PLAIN_WDL/descriptor'
(https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FWholeGenomeGermlineSingleSample/versions/tw_GL-2036_create_rtools_docker/PLAIN_WDL/descriptor Ln 31 Col 1) Failed to import ../../../../../../tasks/broad/UnmappedBamToAlignedBam.wdl
HTTP Error 404: Not Found

It turns out the problem is broader: the workflow engines also fail to run workflows that import files with any relative path, not only paths that contain parent directory references.

The root cause is that, when the engines calculate the URL of an import, they interpret the specified TRS URL as a file path. However, a TRS URL doesn't represent a file path, so the engines miscalculate the import URLs and fail when they attempt to load them.

For example, given the TRS URL

https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FWholeGenomeGermlineSingleSample/versions/tw_GL-2036_create_rtools_docker/PLAIN_WDL/descriptor

and an import referenced by a relative path

../../../../../../tasks/broad/UnmappedBamToAlignedBam.wdl

the engines, applying typical file resolution semantics, calculate the import URL as:

https://dockstore.org/api/ga4gh/tasks/broad/UnmappedBamToAlignedBam.wdl

This is a corrupted TRS URL: parts of the original TRS URL have been deleted. First, during the import URL calculation, the engine drops the trailing descriptor segment of the TRS URL because it looks like a filename. Then, when the engine normalizes the URL prior to the request, it collapses the parent directory references, which deletes still more of the original TRS URL.
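The miscalculation described above can be reproduced with Python's standard `urllib.parse.urljoin`, which applies the usual relative-reference resolution. This is only an illustrative sketch and is not taken from miniwdl or Cromwell; an engine's own resolver may collapse a slightly different number of segments.

```python
from urllib.parse import urljoin

# TRS primary descriptor URL and import path from the example above.
trs_url = (
    "https://dockstore.org/api/ga4gh/trs/v2/tools/"
    "%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FWholeGenomeGermlineSingleSample"
    "/versions/tw_GL-2036_create_rtools_docker/PLAIN_WDL/descriptor"
)
import_path = "../../../../../../tasks/broad/UnmappedBamToAlignedBam.wdl"

# File-style resolution: the last segment ("descriptor") is dropped as if it
# were a filename, then each ".." consumes one more segment of the TRS URL.
resolved = urljoin(trs_url, import_path)
print(resolved)
```

The printed URL no longer contains the tool ID or the version, so the registry has no way to tell which workflow the import belongs to.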

Per the TRS spec, a relative path can be appended to the TRS primary descriptor URL; the TRS server then resolves the file relative to the primary descriptor and returns its contents. So, the correct URL is:

https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FWholeGenomeGermlineSingleSample/versions/tw_GL-2036_create_rtools_docker/PLAIN_WDL/descriptor/../../../../../../tasks/broad/UnmappedBamToAlignedBam.wdl
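As a minimal sketch, the correct URL can be built by plain concatenation, leaving the TRS URL untouched. The helper name `trs_import_url` is hypothetical and does not come from any engine or from the TRS spec.

```python
def trs_import_url(primary_descriptor_url: str, import_path: str) -> str:
    """Append a relative import path to a TRS primary descriptor URL.

    No segment of the TRS URL is dropped and the ".." references are
    left in place for the TRS server to interpret.
    """
    return primary_descriptor_url.rstrip("/") + "/" + import_path


trs_url = (
    "https://dockstore.org/api/ga4gh/trs/v2/tools/"
    "%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FWholeGenomeGermlineSingleSample"
    "/versions/tw_GL-2036_create_rtools_docker/PLAIN_WDL/descriptor"
)
import_url = trs_import_url(
    trs_url, "../../../../../../tasks/broad/UnmappedBamToAlignedBam.wdl"
)
print(import_url)
```

An HTTP client that normalizes `..` segments before sending the request would undo this, so the path would also need to be sent as written; how to do that depends on the client library.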

Note that when miniwdl is run with a URL that references the raw GitHub files, it works as expected:

miniwdl run 'https://raw.githubusercontent.com/broadinstitute/warp/tw_GL-2036_create_rtools_docker/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.wdl'

Why does it work? The GitHub URL ends with the absolute path of the workflow file, allowing the import URLs to be resolved using typical file resolution semantics.
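For comparison, the same standard-library resolution applied to the raw GitHub URL lands inside the repository, because every `..` consumes a real directory of the workflow's path. This sketch uses Python's `urllib.parse.urljoin` as a stand-in for an engine's resolver.

```python
from urllib.parse import urljoin

raw_url = (
    "https://raw.githubusercontent.com/broadinstitute/warp/"
    "tw_GL-2036_create_rtools_docker/pipelines/broad/dna_seq/germline/"
    "single_sample/wgs/WholeGenomeGermlineSingleSample.wdl"
)
import_path = "../../../../../../tasks/broad/UnmappedBamToAlignedBam.wdl"

# Six ".." segments climb from .../single_sample/wgs/ up to the branch root.
resolved = urljoin(raw_url, import_path)
print(resolved)
# https://raw.githubusercontent.com/broadinstitute/warp/tw_GL-2036_create_rtools_docker/tasks/broad/UnmappedBamToAlignedBam.wdl
```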

This issue should probably be addressed in the next major TRS revision.

In the meantime, here are some possible solutions that would help the engines to correctly run a workflow referenced by a "bare" TRS primary descriptor URL:

  • Change the engines' import resolution code so it doesn't modify the specified workflow URL during file resolution calculations, but instead appends the calculated path to it. This could be something the engine does conditionally, perhaps either when it detects a TRS URL or when instructed via a flag.
  • Modify the TRS endpoint to redirect to a URL that ends with the absolute path of the primary descriptor (and responds with the raw file content like the original TRS endpoint). At startup, an engine can check if the specified workflow URL redirects, and if so, use the "target" URL for all subsequent import URL calculations.
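The first option above might look roughly like the following sketch. The function names, the `force_trs` flag, and the regular expression used to detect a TRS primary descriptor URL are all assumptions made for illustration; none of them come from miniwdl, Cromwell, or the TRS spec.

```python
import re
from urllib.parse import urljoin, urlsplit

# Assumed shape of a TRS v2 primary descriptor path:
#   .../ga4gh/trs/v2/tools/{id}/versions/{version}/{type}/descriptor
_TRS_DESCRIPTOR_PATH = re.compile(
    r"/ga4gh/trs/v2/tools/[^/]+/versions/[^/]+/[^/]+/descriptor/?$"
)


def looks_like_trs_descriptor(url: str) -> bool:
    """Heuristically detect a TRS primary descriptor URL."""
    return _TRS_DESCRIPTOR_PATH.search(urlsplit(url).path) is not None


def resolve_import(workflow_url: str, import_path: str, force_trs: bool = False) -> str:
    """Compute the URL of an import relative to the workflow URL."""
    if urlsplit(import_path).scheme:
        # Absolute import URLs are used unchanged.
        return import_path
    if force_trs or looks_like_trs_descriptor(workflow_url):
        # TRS mode: keep the workflow URL intact and append the path.
        return workflow_url.rstrip("/") + "/" + import_path
    # Default: typical file resolution semantics.
    return urljoin(workflow_url, import_path)
```

Detection by URL shape keeps existing invocations working without a new flag, while `force_trs` covers registries whose URLs do not match the assumed pattern.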

┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Issue Number: TRS-70
