Enum with colon symbols causes URI parsing error  #2071

Open
@fmigneault

Expected Behavior

The part of interest, shown below, should not cause any error. The same thing happens whether the document is encoded as JSON or YAML.

  - type:
      type: enum
      symbols:
        - 00:00
        - 01:00
        - 02:00
        - 03:00
        - 04:00
        - 05:00
        - 06:00
        - 07:00
        - 08:00
        - 09:00
        - 10:00
        - 11:00
        - 12:00
        - 13:00
        - 14:00
        - 15:00
        - 16:00
        - 17:00
        - 18:00
        - 19:00
        - 20:00
        - 21:00
        - 22:00
        - 23:00
    id: time

Actual Behavior

The following warnings and error are raised.

[...]
URI prefix '20' of '20:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '20' of '20:00' not recognized, are you missing a $namespaces section?
URI prefix '21' of '21:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '21' of '21:00' not recognized, are you missing a $namespaces section?
URI prefix '22' of '22:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '22' of '22:00' not recognized, are you missing a $namespaces section?
URI prefix '23' of '23:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '23' of '23:00' not recognized, are you missing a $namespaces section?
INFO package.yml:1:1: Unknown hint https://schemas.crim.ca/cwl/weaver#OGCAPIRequirement
ERROR Tool definition failed validation:
package.yml:6:1: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00'], 'name': 'timece7ef3bf-8a50-4818-9473-0f36fe26fcfa'}
                 not a valid Avro schema: Duplicate symbol: ['00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00']

Somehow, when the loading operation reaches this step:

workflowobj = cast(
    CommentedMap,
    loadingContext.loader.fetch(fileuri, content_types=CWL_CONTENT_TYPES),
)

Schema-Salad performs an in-memory resolution that parses each symbol in order to inject the relevant input-id prefix URI. In doing so, it wrongly assumes that the : in each HH:MM value marks a namespace reference (as if cwl:something were used) and converts the values accordingly. The effect is visible in the error output above, where all 24 symbols end up as '00'.

Literal strings under the enum containing : should not be mishandled this way. Users should not have to work around the tool to inject some escape mechanism. cwltool should handle this transparently.
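
To make the suspected mechanism concrete, here is a minimal sketch of how treating the first : as a namespace separator would collapse the HH:MM symbols into duplicates. It is an illustration only, not Schema-Salad's actual code: `naive_resolve` and `NAMESPACES` are hypothetical names, and the real resolution logic may differ in its details. The sketch only reproduces the two visible symptoms, namely the "URI prefix ... not recognized" warnings and the list of identical '00' symbols.

```python
# Hypothetical illustration only: this is NOT Schema-Salad's implementation.
# It mimics the observed symptoms under the assumption that everything
# before the first ':' is treated as a namespace prefix.

# Prefixes declared in the $namespaces section of the tool definition.
NAMESPACES = {
    "iana": "https://www.iana.org/assignments/media-types/",
    "weaver": "https://schemas.crim.ca/cwl/weaver#",
}

warnings = []


def naive_resolve(symbol):
    """Split on the first ':' and keep only the part after it."""
    if ":" not in symbol:
        return symbol
    prefix, name = symbol.split(":", 1)
    if prefix not in NAMESPACES:
        warnings.append(f"URI prefix '{prefix}' of '{symbol}' not recognized")
    return name


# The 24 enum symbols from the 'time' input: 00:00, 01:00, ..., 23:00.
symbols = [f"{hour:02d}:00" for hour in range(24)]
resolved = [naive_resolve(symbol) for symbol in symbols]

# An enum schema requires unique symbols, so this collapse is fatal.
has_duplicates = len(set(resolved)) != len(resolved)

print(resolved[:3], has_duplicates)
print(warnings[20])
```

Under this assumption, symbols without a colon (such as netcdf or "1950") pass through unchanged, which would be consistent with only the time input failing validation.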

Workflow Code

cwlVersion: v1.0
class: CommandLineTool
hints:
  weaver:OGCAPIRequirement:
    process: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/reanalysis-era5-land-monthly-means
inputs:
  - type:
      type: enum
      symbols:
        - monthly_averaged_reanalysis
        - monthly_averaged_reanalysis_by_hour_of_day
    id: product_type
  - type:
      type: enum
      symbols:
        - 10m_u_component_of_wind
        - 10m_v_component_of_wind
        - 2m_dewpoint_temperature
        - 2m_temperature
        - evaporation_from_bare_soil
        - evaporation_from_open_water_surfaces_excluding_oceans
        - evaporation_from_the_top_of_canopy
        - evaporation_from_vegetation_transpiration
        - forecast_albedo
        - lake_bottom_temperature
        - lake_ice_depth
        - lake_ice_temperature
        - lake_mix_layer_depth
        - lake_mix_layer_temperature
        - lake_shape_factor
        - lake_total_layer_temperature
        - leaf_area_index_high_vegetation
        - leaf_area_index_low_vegetation
        - potential_evaporation
        - runoff
        - skin_reservoir_content
        - skin_temperature
        - snow_albedo
        - snow_cover
        - snow_density
        - snow_depth
        - snow_depth_water_equivalent
        - snow_evaporation
        - snowfall
        - snowmelt
        - soil_temperature_level_1
        - soil_temperature_level_2
        - soil_temperature_level_3
        - soil_temperature_level_4
        - sub_surface_runoff
        - surface_latent_heat_flux
        - surface_net_solar_radiation
        - surface_net_thermal_radiation
        - surface_pressure
        - surface_runoff
        - surface_sensible_heat_flux
        - surface_solar_radiation_downwards
        - surface_thermal_radiation_downwards
        - temperature_of_snow_layer
        - total_evaporation
        - total_precipitation
        - volumetric_soil_water_layer_1
        - volumetric_soil_water_layer_2
        - volumetric_soil_water_layer_3
        - volumetric_soil_water_layer_4
    id: variable
  - type:
      type: enum
      symbols:
        - "1950"
        - "1951"
        - "1952"
        - "1953"
        - "1954"
        - "1955"
        - "1956"
        - "1957"
        - "1958"
        - "1959"
        - "1960"
        - "1961"
        - "1962"
        - "1963"
        - "1964"
        - "1965"
        - "1966"
        - "1967"
        - "1968"
        - "1969"
        - "1970"
        - "1971"
        - "1972"
        - "1973"
        - "1974"
        - "1975"
        - "1976"
        - "1977"
        - "1978"
        - "1979"
        - "1980"
        - "1981"
        - "1982"
        - "1983"
        - "1984"
        - "1985"
        - "1986"
        - "1987"
        - "1988"
        - "1989"
        - "1990"
        - "1991"
        - "1992"
        - "1993"
        - "1994"
        - "1995"
        - "1996"
        - "1997"
        - "1998"
        - "1999"
        - "2000"
        - "2001"
        - "2002"
        - "2003"
        - "2004"
        - "2005"
        - "2006"
        - "2007"
        - "2008"
        - "2009"
        - "2010"
        - "2011"
        - "2012"
        - "2013"
        - "2014"
        - "2015"
        - "2016"
        - "2017"
        - "2018"
        - "2019"
        - "2020"
        - "2021"
        - "2022"
        - "2023"
        - "2024"
    id: year
  - type:
      type: enum
      symbols:
        - "01"
        - "02"
        - "03"
        - "04"
        - "05"
        - "06"
        - "07"
        - "08"
        - "09"
        - "10"
        - "11"
        - "12"
    id: month
  - type:
      type: enum
      symbols:
        - 00:00
        - 01:00
        - 02:00
        - 03:00
        - 04:00
        - 05:00
        - 06:00
        - 07:00
        - 08:00
        - 09:00
        - 10:00
        - 11:00
        - 12:00
        - 13:00
        - 14:00
        - 15:00
        - 16:00
        - 17:00
        - 18:00
        - 19:00
        - 20:00
        - 21:00
        - 22:00
        - 23:00
    id: time
  - type:
      type: array
      items: float
    id: area
  - type:
      type: enum
      symbols:
        - grib
        - netcdf
    default:
      - grib
    id: data_format
  - type:
      type: enum
      symbols:
        - zip
        - unarchived
    default:
      - unarchived
    id: download_format
outputs:
  - type: File
    format: iana:application/json
    outputBinding:
      glob: '*.json'
    id: asset
$namespaces:
  iana: https://www.iana.org/assignments/media-types/
  weaver: https://schemas.crim.ca/cwl/weaver#

Full Traceback

[2024-11-20 19:25:58,696] ERROR    [MainThread][weaver.processes.wps_package] ../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']
Traceback (most recent call last):
  File "schema_salad/avro/schema.py", line 307, in __init__
  File "schema_salad/avro/schema.py", line 727, in make_avsc_object
  File "schema_salad/avro/schema.py", line 375, in __init__
schema_salad.avro.schema.AvroException: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 667, in __init__
    make_avsc_object(convert_to_dict(self.inputs_record_schema), self.names)
  File "schema_salad/avro/schema.py", line 735, in make_avsc_object
  File "schema_salad/avro/schema.py", line 656, in __init__
  File "schema_salad/avro/schema.py", line 627, in make_field_objects
  File "schema_salad/avro/schema.py", line 309, in __init__
schema_salad.avro.schema.SchemaParseException: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00'], 'name': 'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid Avro schema: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1240, in try_or_raise_package_error
    return call()
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1268, in <lambda>
    lambda: _load_package_content(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 696, in _load_package_content
    package = factory.make(tmp_json_cwl)  # type: CWLFactoryCallable
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/factory.py", line 67, in make
    load = load_tool.load_tool(cwl, self.loading_context)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 621, in load_tool
    return make_tool(uri, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 598, in make_tool
    tool = loadingContext.construct_tool_object(processobj, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/workflow.py", line 48, in default_make_tool
    return command_line_tool.CommandLineTool(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 412, in __init__
    super().__init__(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 662, in __init__
    with SourceLine(toolpath_object, "inputs", ValidationException, debug):
  File "schema_salad/sourceline.py", line 249, in __exit__
schema_salad.exceptions.ValidationException: ../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']
[2024-11-20 19:25:58,698] ERROR    [MainThread][weaver.processes.utils] Invalid package/reference definition. Loading generated error: [Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']].]
Traceback (most recent call last):
  File "schema_salad/avro/schema.py", line 307, in __init__
  File "schema_salad/avro/schema.py", line 727, in make_avsc_object
  File "schema_salad/avro/schema.py", line 375, in __init__
schema_salad.avro.schema.AvroException: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 667, in __init__
    make_avsc_object(convert_to_dict(self.inputs_record_schema), self.names)
  File "schema_salad/avro/schema.py", line 735, in make_avsc_object
  File "schema_salad/avro/schema.py", line 656, in __init__
  File "schema_salad/avro/schema.py", line 627, in make_field_objects
  File "schema_salad/avro/schema.py", line 309, in __init__
schema_salad.avro.schema.SchemaParseException: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00'], 'name': 'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid Avro schema: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1240, in try_or_raise_package_error
    return call()
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1268, in <lambda>
    lambda: _load_package_content(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 696, in _load_package_content
    package = factory.make(tmp_json_cwl)  # type: CWLFactoryCallable
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/factory.py", line 67, in make
    load = load_tool.load_tool(cwl, self.loading_context)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 621, in load_tool
    return make_tool(uri, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 598, in make_tool
    tool = loadingContext.construct_tool_object(processobj, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/workflow.py", line 48, in default_make_tool
    return command_line_tool.CommandLineTool(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 412, in __init__
    super().__init__(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 662, in __init__
    with SourceLine(toolpath_object, "inputs", ValidationException, debug):
  File "schema_salad/sourceline.py", line 249, in __exit__
schema_salad.exceptions.ValidationException: ../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/weaver/weaver/processes/utils.py", line 286, in _validate_deploy_process_info
    info = get_process_definition(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1267, in get_process_definition
    package_factory, process_type, _ = try_or_raise_package_error(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1247, in try_or_raise_package_error
    raise exc_type(f"Invalid package/reference definition. {reason} generated error: [{exc!s}].")
weaver.exceptions.PackageRegistrationError: Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']].
[2024-11-20 19:25:58,699] DEBUG    [MainThread][weaver.tweens] http exception -> ows exception response.
[2024-11-20 19:25:58,699] WARNING  [MainThread][weaver.tweens] Handled request exception:
  Cause: [POST http://localhost:4002/processes]
  Error: [(HTTPUnprocessableEntity) <422> Invalid package/reference definition. Loading generated error: [Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00']].]]
[2024-11-20 19:25:58,700] DEBUG    [MainThread][weaver.tweens] Handled request details:
(HTTPUnprocessableEntity) <422> Invalid package/reference definition. Loading generated error: [Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00']].]

Your Environment

  • cwltool version: main branch
