EOAP-GEN

A CLI tool for generating Earth Observation Application Packages, including CWL workflows and Dockerfiles, from user-supplied Python scripts.

Usage

Requirements

  • Python scripts using argparse or click. Parameters are mapped to CWL CommandLineTool inputs
  • Pip requirements file for each script
  • eoap-gen configuration file
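
As a rough illustration of the first requirement, a step script could look like the sketch below. It is loosely modelled on the get_urls step from the example configuration in this README, but its contents are assumed: the real playground/get_urls.py is not shown here, and find_items is a hypothetical placeholder that returns made-up values instead of querying a catalog. The point is only that each argparse parameter name (catalog, collection) lines up with a step input ID, and that the script writes the files the step outputs later glob for.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Parameter names are what the step inputs in the configuration refer to.
    parser = argparse.ArgumentParser(description="Collect asset URLs for a collection")
    parser.add_argument("--catalog", type=str, required=True, help="full catalog path")
    parser.add_argument("--collection", type=str, required=True, help="collection id")
    return parser


def find_items(catalog: str, collection: str) -> list[tuple[str, str]]:
    # Hypothetical placeholder: a real script would query the catalog here.
    return [
        (f"item-{i}", f"https://example.invalid/{catalog}/{collection}/item-{i}.tif")
        for i in range(2)
    ]


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    items = find_items(args.catalog, args.collection)
    # One value per line, no trailing newline, so that splitting on "\n"
    # yields exactly one list entry per item.
    Path("urls.txt").write_text("\n".join(url for _, url in items))
    Path("ids.txt").write_text("\n".join(item_id for item_id, _ in items))


# A real script would end with:  if __name__ == "__main__": main()
```

Calling main(["--catalog", "my-catalog", "--collection", "sentinel2_ard"]) writes urls.txt and ids.txt into the current working directory.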

Configuration

EOAP-GEN requires a YAML configuration file. It contains settings for CWL and, most importantly, describes the flow of data between user inputs, workflow steps and workflow outputs. For more EODH ADES specific guidance see this tutorial.

Example of a single step configuration

id: resize-collection
doc: Resize collection cogs
label: Resize collection cogs
inputs:
  - id: catalog
    label: catalog
    doc: full catalog path
    type: string
    default: supported-datasets/ceda-stac-fastapi
  - id: collection
    label: collection id
    doc: collection id
    type: string
    default: sentinel2_ard
outputs:
  - id: stac_output
    type: Directory
    source: step3/stac_catalog
steps:
  - id: get_urls
    script: playground/get_urls.py
    requirements: playground/get_urls_reqs.txt
    inputs:
      - id: catalog
        source: resize-collection/catalog
      - id: collection
        source: resize-collection/collection
    outputs:
      - id: urls
        type: string[]
        outputBinding:
          loadContents: true
          glob: urls.txt
          outputEval: $(self[0].contents.split('\n'))
      - id: ids
        type: string[]
        outputBinding:
          loadContents: true
          glob: ids.txt
          outputEval: $(self[0].contents.split('\n'))
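
The outputEval expressions above are JavaScript evaluated by the CWL runner: they take the contents of the globbed file and split them on newline characters. Python's str.split behaves the same way for this case, so the short sketch below (split_lines is a name made up for illustration) shows what the resulting string[] looks like, including the empty last entry that appears when the file ends with a newline.

```python
def split_lines(contents: str) -> list[str]:
    # Same rule as the outputEval expression: cut on every "\n".
    return contents.split("\n")


print(split_lines("a\nb"))    # ['a', 'b']
print(split_lines("a\nb\n"))  # ['a', 'b', '']
```

A script that feeds such an output may therefore want to write its file without a trailing newline, or the downstream step will receive an extra empty string.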

Configuration Yaml Documentation

| Configuration key | Description |
| --- | --- |
| id | ID of the generated workflow. |
| doc | Workflow documentation string. |
| label | Short human readable label. |
| inputs | List of input definitions for the workflow. Values for these are provided by the user when executing. |
| inputs[n].id | Unique input ID. It cannot match the ID of another workflow input, output or step. A step input may share an ID with a workflow input (as in the example above) because step inputs are referenced as <step id>/<step input id>, but this is best avoided where possible as it can cause confusion. |
| inputs[n].label | Short human readable label. |
| inputs[n].doc | Documentation string for this input. |
| inputs[n].type | Input type, one of the CWL types. |
| inputs[n].default | Default value used when the user does not provide one. Setting it makes the input optional. |
| outputs | List of outputs created by the workflow. |
| outputs[n].id | Unique output ID. |
| outputs[n].type | Output type, one of the CWL types. |
| outputs[n].source | Source of the output data, in the format <step id>/<step output id>. |
| steps | List of steps to be performed by the workflow. |
| steps[n].id | Unique step ID. |
| steps[n].script | Path (relative to the repository root) to the Python script performing this step. |
| steps[n].requirements | Path (relative to the repository root) to a requirements.txt style file containing the Python dependencies for the script. |
| steps[n].inputs | List of inputs required by the script. |
| steps[n].inputs[m].id | Unique ID within the step. Must match the parameter name in the script's CLI. |
| steps[n].inputs[m].source | Source of the input data. Steps can consume either workflow inputs or outputs from other steps (the latter creates a dependency between steps). The format is either <workflow ID>/<wf input ID> or <step ID>/<step output ID>. |
| steps[n].outputs | List of step outputs. eoap-gen cannot determine outputs from the provided Python scripts automatically, so provide a valid CWL CommandLineTool outputs section here. |
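
The source rules above can be checked before running the generator. The sketch below is not part of eoap-gen; it is a reader-side helper written under the assumption that the configuration has already been loaded into a plain dict (for example with a YAML parser). The function names and the second step, resize, are hypothetical. It reports every source that does not resolve to a workflow input or a step output, and derives which steps depend on which.

```python
def unresolved_sources(config: dict) -> list[str]:
    """Return every `source` that is neither a workflow input nor a step output."""
    known = {f"{config['id']}/{inp['id']}" for inp in config.get("inputs", [])}
    for step in config.get("steps", []):
        known.update(f"{step['id']}/{out['id']}" for out in step.get("outputs", []))

    missing = []
    for step in config.get("steps", []):
        for inp in step.get("inputs", []):
            if inp["source"] not in known:
                missing.append(inp["source"])
    for out in config.get("outputs", []):
        if out["source"] not in known:
            missing.append(out["source"])
    return missing


def step_dependencies(config: dict) -> dict[str, set[str]]:
    """Map each step ID to the IDs of the steps whose outputs it consumes."""
    step_ids = {step["id"] for step in config.get("steps", [])}
    deps = {}
    for step in config.get("steps", []):
        producers = {inp["source"].split("/", 1)[0] for inp in step.get("inputs", [])}
        deps[step["id"]] = (producers & step_ids) - {step["id"]}
    return deps


# Trimmed-down configuration; the "resize" step is invented for this example.
example = {
    "id": "resize-collection",
    "inputs": [{"id": "catalog"}, {"id": "collection"}],
    "outputs": [{"id": "stac_output", "source": "resize/stac_catalog"}],
    "steps": [
        {
            "id": "get_urls",
            "inputs": [
                {"id": "catalog", "source": "resize-collection/catalog"},
                {"id": "collection", "source": "resize-collection/collection"},
            ],
            "outputs": [{"id": "urls"}, {"id": "ids"}],
        },
        {
            "id": "resize",
            "inputs": [{"id": "urls", "source": "get_urls/urls"}],
            "outputs": [{"id": "stac_catalog"}],
        },
    ],
}

print(unresolved_sources(example))  # []
print(step_dependencies(example))   # {'get_urls': set(), 'resize': {'get_urls'}}
```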

Generator

Using GitHub Actions

The recommended way to use this tool is through a GitHub reusable workflow, which generates the application package, then builds the Docker images and pushes them to ghcr. See the workflow file (.github/workflows/generate-reusable.yml) for the available inputs. The workflow needs write permissions because it pushes the generated files to the repository and the container images to the registry.

Example usage:

name: Generate EOAP
on:
  push:
  workflow_dispatch:
jobs:
  build-push:
    uses: EO-DataHub/eoap-gen/.github/workflows/generate-reusable.yml@main
    with:
      config: eoap-gen-config.yml
    permissions:
      contents: write
      packages: write
      attestations: write
      id-token: write

A GitHub composite action is also available in this repository. It only runs eoap-gen to generate the application package; the rest is up to the user. See the action file (action.yml) for the available inputs and default values.

Example usage:

- name: Run eoap-gen
  id: gen
  uses: EO-DataHub/eoap-gen@main
  with:
    config: path/to/config.yml
    output: output-dir

Running locally

Requirements

Running without installation:

pipx run --spec git+https://github.com/EO-DataHub/eoap-gen.git eoap-gen --help

Installing with pipx:

pipx install git+https://github.com/EO-DataHub/eoap-gen
eoap-gen --help

Example usage

eoap-gen generate \
  --config=eoap-gen-config.yml \
  --output=eoap-gen-out \
  --docker-url-base=ghcr.io/user/repo \
  --docker-tag=main

Development

Install poetry

Install package deps:

make install

Run QA checks:

make check
