Create a pipeline
This guide describes the process, reference software architecture and coding requirements for developing an IBF System Data Pipeline (IBF Pipeline).
The goal of this guide is to bring consistency across all IBF Pipelines, establish an aligned development process, and make sure that the pipelines adhere to quality requirements such as scalability, modularity, reusability and maintainability.
IBF Pipelines need to adhere to the following general requirements:
- Main programming language should be Python
- One data pipeline per hazard for all countries
- No country- or hazard-specific exceptions in the pipeline code
- All country- and hazard-specific settings should be in a configuration file (see the sketch after this list)
- Clear documentation and a simple approach for adding a new country to a pipeline via the configuration file
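To illustrate, a minimal sketch of such a configuration-driven setup, assuming a hypothetical `config.yaml` with one entry per country; the file name, keys and country code are illustrative, not prescribed by this guide:

```python
# Hypothetical configuration-driven pipeline entry point.
# All country-specific values live in config.yaml; the code contains no country logic.
import yaml  # requires PyYAML


def load_country_settings(config_path: str, country_code: str) -> dict:
    """Read the pipeline configuration and return the settings for one country."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    try:
        return config["countries"][country_code]
    except KeyError:
        raise ValueError(f"Country {country_code} is not configured") from None


def run_pipeline(country_code: str, config_path: str = "config.yaml") -> None:
    """Run extract/transform/load for a single country, driven only by configuration."""
    settings = load_country_settings(config_path, country_code)
    # Thresholds, admin levels, data sources, etc. all come from `settings`.
    print(f"Running pipeline for {country_code} with settings: {settings}")


if __name__ == "__main__":
    run_pipeline("UGA")  # illustrative country code
```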
For IBF Pipeline packaging, adhere to the following:
- Use a packaging tool such as Poetry for dependency management
- Dockerise the entire package for easy deployment
The following resources support the operations of a pipeline: a data factory and data storage. They connect with the pipeline in different ways:
The data factory downloads raw, unstructured hazard indication data from its sources and puts it in the storage account.
The pipeline interacts with these resources by calling the relevant Load functions within the pipeline (see the sketch after this list).
- The storage account stores raw unstructured hazard indication data (GloFAS rasters, etc.), flood extents, and exposure and vulnerability data (population, etc.).
- CosmosDB stores intermediate and final (transformed) forecast data.
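A minimal sketch of what such Load functions could look like, assuming the Azure Python SDKs (`azure-storage-blob` and `azure-cosmos`); the environment variables, container and database names are hypothetical:

```python
import os

from azure.storage.blob import BlobServiceClient
from azure.cosmos import CosmosClient


def load_raw_raster(blob_name: str, local_path: str) -> None:
    """Download a raw hazard raster (e.g. GloFAS) from the storage account."""
    service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION_STRING"])
    blob = service.get_blob_client(container="raw-hazard-data", blob=blob_name)
    with open(local_path, "wb") as f:
        f.write(blob.download_blob().readall())


def save_forecast(record: dict) -> None:
    """Upsert an intermediate or final forecast record into CosmosDB."""
    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("ibf").get_container_client("forecasts")
    container.upsert_item(record)  # the record must contain an "id" field
```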
Ideally, the pipeline repository includes automated tests, as is the case for e.g. riverine floods.
In addition, or alternatively, take note of the following manual integration testing guidelines.
Important
Testing happens on the ibf-test API. Make sure that any API code needed to facilitate the pipeline has been deployed there.
Before releasing a new pipeline, or a new feature of an existing pipeline, it is essential that the integration between the pipeline and the API is tested well. In the absence of automated tests for this, it is mostly a manual exercise at the moment.
- Before testing a complete pipeline run, make sure that you follow the outline as described on the API for pipelines completely, both in terms of structure and in terms of details per endpoint. The outline is meant to be complete. If you feel something is missing, contact the API maintainers.
- When asserting the IBF Portal visually, make sure that all dashboard components are coherent with respect to each other:
  - The chat section, timeline section, and map section should all align.
  - Toggle on specific layers in the map where needed, to test e.g. raster layers.
  - If an upload consists of multiple events, check all events, and their map layers, separately.
- When done, check with an IBF Portal developer, who can verify the result in the IBF Portal. In case of issues, developers can check the database and/or API logs to find the cause. Iterate back and forth based on this until resolved.
Note
You can first test individual endpoints to make sure that the API path, parameters and body format are all correct, by asserting a 201 status code. However, do not expect sensible output in the IBF Portal based on a single endpoint call: the IBF Portal needs a complete pipeline run, as outlined in detail above per disaster-type.
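For example, such a single-endpoint check could look like the sketch below; the base URL, endpoint path, request body and token handling are hypothetical and should be taken from the API documentation for pipelines:

```python
import requests

API_URL = "https://<ibf-test-api>/api"  # placeholder for the ibf-test API base URL
TOKEN = "..."  # obtain via the API's login endpoint

response = requests.post(
    f"{API_URL}/event/process",  # hypothetical endpoint path
    json={"countryCodeISO3": "UGA", "disasterType": "floods"},  # hypothetical body
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
# A 201 confirms that path, parameters and body format are accepted;
# it does not mean the IBF Portal will show sensible output yet.
assert response.status_code == 201, response.text
```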
- Repeat the above steps per relevant scenario, as defined on the API for pipelines per disaster-type.
- Important: Also test sequences of scenarios, thereby mimicking consecutive pipeline runs (adjust the "date" parameter to simulate consecutive months/days/hours; see the sketch after this list). Typical sequences of scenarios include:
  - Event > no event (this can come with a specific event-will-not-occur notification)
  - Multi-event > single/fewer events
  - Warning > Trigger (and/or vice versa)
  - Ongoing event > no event (this can come with a specific event-has-ended notification)
  - And more, depending on the specific disaster-type.
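A hedged sketch of how such a sequence could be scripted, assuming a hypothetical scenario endpoint that accepts country, scenario and date; the endpoint path, parameters and scenario names are illustrative, not the actual API:

```python
from datetime import date, timedelta

import requests

API_URL = "https://<ibf-test-api>/api"  # placeholder for the ibf-test API base URL
TOKEN = "..."  # obtain via the API's login endpoint

# Hypothetical sequence: an event run followed by a run without an event.
scenarios = ["trigger", "no-event"]
start = date.today()

for offset, scenario in enumerate(scenarios):
    response = requests.post(
        f"{API_URL}/scripts/mock",  # hypothetical scenario endpoint
        json={
            "countryCodeISO3": "UGA",
            "disasterType": "floods",
            "scenario": scenario,
            "date": (start + timedelta(days=offset)).isoformat(),  # consecutive runs
        },
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=60,
    )
    response.raise_for_status()
```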
- In addition to testing locally, it is also necessary to set up a dedicated logic app for (scenario) testing. This makes it possible to initiate a pipeline run via an API call with certain parameters, such as country, scenario and date. See more:
  - Drought: GitHub pipeline documentation
  - River-flood: GitHub pipeline documentation