Create a pipeline
This guide describes the process, reference software architecture and coding requirements for developing an IBF System Data Pipeline (IBF Pipeline).
The goal of this guide is to bring consistency across all IBF Pipelines, establish an aligned development process, and make sure that the pipelines adhere to quality requirements such as scalability, modularity, reusability and maintainability.
IBF Pipelines need to adhere to the following general requirements:
- Main programming language should be Python
- One data pipeline per hazard for all countries
- No country- or hazard-specific exceptions in the pipeline code
- All country- and hazard-specific settings should be in a configuration file (see the sketch after this list)
- Clear documentation and a simple approach for adding a new country to a pipeline via the configuration file
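To illustrate, a minimal sketch of such a configuration-driven setup, assuming a hypothetical `config.yaml` with one entry per country; the file name, keys and country code are illustrative, not prescribed by this guide:

```python
# Hypothetical configuration-driven pipeline entry point.
# All country-specific values live in config.yaml; the code contains no country logic.
import yaml  # requires PyYAML


def load_country_settings(config_path: str, country_code: str) -> dict:
    """Read the pipeline configuration and return the settings for one country."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    try:
        return config["countries"][country_code]
    except KeyError:
        raise ValueError(f"Country {country_code} is not configured") from None


def run_pipeline(country_code: str, config_path: str = "config.yaml") -> None:
    """Run extract/transform/load for a single country, driven only by configuration."""
    settings = load_country_settings(config_path, country_code)
    # Thresholds, admin levels, data sources, etc. all come from `settings`.
    print(f"Running pipeline for {country_code} with settings: {settings}")


if __name__ == "__main__":
    run_pipeline("UGA")  # illustrative country code
```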
For IBF Pipeline packaging, adhere to the following:
- Use a packaging tool such as Poetry for dependency management
- Dockerise the entire package for easy deployment
The following resources support the operations of a pipeline: a data factory and data storage. They connect with the pipeline in different ways:
The data factory downloads raw, unstructured hazard indication data from its sources and puts it in the storage account.
The pipeline interacts with these resources by calling the relevant Load functions within the pipeline (see the sketch after this list).
- The storage account stores raw unstructured hazard indication data (GloFAS rasters, etc.), flood extents, and exposure and vulnerability data (population, etc.).
- CosmosDB stores intermediate and final (transformed) forecast data.
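A minimal sketch of what such Load functions could look like, assuming the Azure Python SDKs (`azure-storage-blob` and `azure-cosmos`); the environment variables, container and database names are hypothetical:

```python
import os

from azure.storage.blob import BlobServiceClient
from azure.cosmos import CosmosClient


def load_raw_raster(blob_name: str, local_path: str) -> None:
    """Download a raw hazard raster (e.g. GloFAS) from the storage account."""
    service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION_STRING"])
    blob = service.get_blob_client(container="raw-hazard-data", blob=blob_name)
    with open(local_path, "wb") as f:
        f.write(blob.download_blob().readall())


def save_forecast(record: dict) -> None:
    """Upsert an intermediate or final forecast record into CosmosDB."""
    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("ibf").get_container_client("forecasts")
    container.upsert_item(record)  # the record must contain an "id" field
```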
Ideally, the pipeline repository includes automated tests, as is the case for e.g. riverine floods.
In addition, or alternatively, take note of the following manual integration testing guidelines.
Important
Testing happens on the ibf-test API. Make sure that any API code needed to facilitate the pipeline has been deployed there.
Before releasing a new pipeline, or a new feature of an existing pipeline, it is essential that the integration between the pipeline and the API is tested well. In the absence of automated tests for this, it is mostly a manual exercise at the moment.
- Before testing a complete pipeline run, make sure that you follow the outline as described on the API for pipelines completely, both in terms of structure and in terms of details per endpoint. The outline is meant to be complete. If you feel something is missing, contact the API maintainers.
- When asserting the IBF Portal visually, make sure that all dashboard components are coherent with respect to each other:
  - The chat section, timeline section, and map section should all align.
  - Toggle on specific layers in the map where needed, to test e.g. raster layers.
  - If an upload consists of multiple events, check all events, and their map layers, separately.
- When done, check with an IBF Portal developer, who can verify the result in the IBF Portal. In case of issues, developers can check the database and/or API logs to find the cause. Iterate back and forth based on this until resolved.
Note
You can first test individual endpoints to make sure that the API path, parameters and body format are all correct, by asserting a 201 status code. However, do not expect sensible output in the IBF Portal based on a single endpoint call: the IBF Portal needs a complete pipeline run, as outlined in detail above per disaster-type.
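For example, such a single-endpoint check could look like the sketch below; the base URL, endpoint path, request body and token handling are hypothetical and should be taken from the API documentation for pipelines:

```python
import requests

API_URL = "https://<ibf-test-api>/api"  # placeholder for the ibf-test API base URL
TOKEN = "..."  # obtain via the API's login endpoint

response = requests.post(
    f"{API_URL}/event/process",  # hypothetical endpoint path
    json={"countryCodeISO3": "UGA", "disasterType": "floods"},  # hypothetical body
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
# A 201 confirms that path, parameters and body format are accepted;
# it does not mean the IBF Portal will show sensible output yet.
assert response.status_code == 201, response.text
```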
- Repeat the above steps per relevant scenario, as defined on the API for pipelines per disaster-type.
- Important: Also test sequences of scenarios, thereby mimicking consecutive pipeline runs (adjust the "date" parameter to simulate consecutive months/days/hours; see the sketch after this list). Typical sequences of scenarios include:
  - Event > no event (this can come with a specific event-will-not-occur notification)
  - Multi-event > single/fewer events
  - Warning > Trigger (and/or vice versa)
  - Ongoing event > no event (this can come with a specific event-has-ended notification)
  - And more, depending on the specific disaster-type.
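A hedged sketch of how such a sequence could be scripted, assuming a hypothetical scenario endpoint that accepts country, scenario and date; the endpoint path, parameters and scenario names are illustrative, not the actual API:

```python
from datetime import date, timedelta

import requests

API_URL = "https://<ibf-test-api>/api"  # placeholder for the ibf-test API base URL
TOKEN = "..."  # obtain via the API's login endpoint

# Hypothetical sequence: an event run followed by a run without an event.
scenarios = ["trigger", "no-event"]
start = date.today()

for offset, scenario in enumerate(scenarios):
    response = requests.post(
        f"{API_URL}/scripts/mock",  # hypothetical scenario endpoint
        json={
            "countryCodeISO3": "UGA",
            "disasterType": "floods",
            "scenario": scenario,
            "date": (start + timedelta(days=offset)).isoformat(),  # consecutive runs
        },
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=60,
    )
    response.raise_for_status()
```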
- In addition to testing locally, it is also necessary to set up a dedicated logic app for (scenario) testing. This makes it possible to initiate a pipeline run via an API call with certain parameters, such as country, scenario and date. See more:
  - Drought: GitHub pipeline documentation
  - River-flood: GitHub pipeline documentation