Payu generating invalid default values for metadata #581

@jo-basevi

When payu creates metadata for a new experiment (that is, when there is a new uuid), it adds a key for every field in the Experiment metadata schema that is not already defined in the metadata, with an end-of-line comment describing the field. I think the initial idea was to show which fields can be added to the metadata.

E.g. Generated metadata.yaml:

experiment_uuid: 0fddbbc5-5496-4a80-aebb-1e0c8f582b87
created: '2025-05-29'
name: mom6_double_gyre-0fddbbc5
model: MOM6
url: [email protected]:TestRepo/mom6_double_gyre.git
contact: Test Name
email: [email protected]
schema_version:  # The version of the schema (string)
description: # Short description of the experiment (string, < 150 char)
long_description: # Long description of the experiment (string)
realm: # The realm(s) included in the experiment (string)
frequency: # The frequency(/ies) included in the experiment (string)
variable: # The variable(s) included in the experiment (string)
nominal_resolution: # The nominal resolution(s) of model(s) used in the experiment (string)
version: # The version of the experiment (number, string)
reference: # Citation or reference information (string)
license: # License of the experiment (string)
parent_experiment: # experiment_uuid for parent experiment if appropriate (string)
related_experiments: # experiment_uuids for any related experiment(s) (string)
notes: # Additional notes (string)
keywords: # Keywords to associated with experiment (string)

The issue is that the above metadata.yaml is invalid according to the schema, because several fields cannot be null:

  • realm, frequency, variable, related_experiments and keywords must be of array type.
  • description and long_description must be strings (both are also required fields, so leaving them out fails schema validation too).
  • schema_version can only be "1-0-3" for that particular schema version.
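
The three failure modes listed above can be reproduced without the full schema. The following is a minimal sketch, not payu code: it assumes the metadata.yaml has already been parsed into a dict (a key followed only by a comment loads as None), and it uses a hand-written subset of constraints copied from the list above rather than the real schema file. The helper name `check_metadata` is hypothetical, and treating the array fields as optional is an assumption.

```python
# Hypothetical subset of the schema constraints, copied from the list above.
ARRAY_FIELDS = ("realm", "frequency", "variable", "related_experiments", "keywords")
REQUIRED_STRING_FIELDS = ("description", "long_description")
SCHEMA_VERSION = "1-0-3"


def check_metadata(metadata):
    """Return a list of problems found in a parsed metadata dict."""
    problems = []
    for field in ARRAY_FIELDS:
        # Assumption: array fields are optional, so only check them if present.
        if field in metadata and not isinstance(metadata[field], list):
            problems.append(f"{field}: expected an array, got {metadata[field]!r}")
    for field in REQUIRED_STRING_FIELDS:
        # A missing key and a null value both fail a required string field.
        if not isinstance(metadata.get(field), str):
            problems.append(f"{field}: required string is missing or null")
    if "schema_version" in metadata and metadata["schema_version"] != SCHEMA_VERSION:
        problems.append(f"schema_version: expected {SCHEMA_VERSION!r}")
    return problems


# Mimics part of the generated file: comment-only keys parse as None.
generated = {
    "model": "MOM6",
    "schema_version": None,
    "description": None,
    "long_description": None,
    "realm": None,
}
for problem in check_metadata(generated):
    print(problem)
```

This only illustrates why null values are rejected; a real check would validate against the schema file itself.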

I want to know whether the descriptions of the fields are that useful and, if not, whether they should be removed entirely. Should the metadata.yaml from the above example instead be:

experiment_uuid: 0fddbbc5-5496-4a80-aebb-1e0c8f582b87
created: '2025-05-29'
name: mom6_double_gyre-0fddbbc5
model: MOM6
url: [email protected]:TestRepo/mom6_double_gyre.git
contact: Test Name
email: [email protected]

The above will still fail schema validation because description and long_description aren't present, so we could add placeholder values such as description: Add short description of the experiment.
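
One way the proposal could look in code is sketched below, under stated assumptions: the metadata is held as a plain dict before being written out, `finalise_metadata` is a hypothetical helper rather than an existing payu function, and the long_description placeholder text is invented for illustration (only the description placeholder comes from this issue).

```python
# Placeholder text for the required fields. The long_description wording
# is an assumption made for this sketch.
PLACEHOLDERS = {
    "description": "Add short description of the experiment",
    "long_description": "Add long description of the experiment",
}


def finalise_metadata(metadata):
    """Drop null-valued keys and fill required fields with placeholders."""
    # Remove keys that would otherwise be written out as null.
    cleaned = {key: value for key, value in metadata.items() if value is not None}
    # Add a placeholder only where the required field is still missing.
    for field, text in PLACEHOLDERS.items():
        cleaned.setdefault(field, text)
    return cleaned


example = finalise_metadata({"model": "MOM6", "realm": None, "description": None})
print(example)
```

Values the user has already set are left untouched, since `setdefault` only fills keys that are absent after the null entries are dropped.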

If we want the generated metadata to pass schema validation checks, schema validation should be added at least to the tests.
