
Create centralised way to publish production versions of data to Azure #123

Open · @penelopeysm
Right now, our production data lives in the Azure blob storage container https://popgetter.blob.core.windows.net/popgetter-dagster-test/test_2, and one of us populates it by setting ENV=prod and running all the Dagster pipelines locally :)
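For reference, the manual process looks roughly like the sketch below (the `popgetter` module name and the `--select '*'` asset selection are assumptions; the exact invocation may differ):

```python
import os
import subprocess

# Point the pipelines at the production blob storage container.
os.environ["ENV"] = "prod"

# Materialise every asset in the Dagster code location (module name assumed).
subprocess.run(
    ["dagster", "asset", "materialize", "--select", "*", "-m", "popgetter"],
    check=True,
)
```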

I think it would be useful to have a single, centralised way to generate all production data and upload it to another Azure blob storage container (one with a less test-y name :-)). There are several benefits:

  1. Reproducibility — It is clear which data is being uploaded and how it is being generated.
  2. Handles the top-level countries.txt file cleanly — The CLI uses this file to determine which countries are present, as it cannot traverse the Azure directory structure. Right now the file is generated manually, which can easily lead to inconsistencies between what it says and the data that is actually there (see the sketch after this list).
  3. Statelessness — The pipeline should wipe the entire blob storage container before re-uploading everything. That way we don't end up with some datasets updated and others not (which would be bad if, e.g., the metadata schema changes).
  4. Continuous deployment — The pipeline can be automatically triggered by new versions/releases on GitHub.
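As a rough sketch of points 2 and 3, the publish step could wipe the container first and then regenerate countries.txt from the top-level prefixes that were actually uploaded, so the file can never drift from the data. (The container name, connection string env var, and use of azure-storage-blob here are all assumptions.)

```python
import os

from azure.storage.blob import ContainerClient

# Hypothetical production container; the real name is still to be decided.
container = ContainerClient.from_connection_string(
    conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="popgetter-prod",
)

# 3. Statelessness: delete everything before re-uploading.
for blob in container.list_blobs():
    container.delete_blob(blob.name)

# ... run the Dagster pipelines here to upload fresh data ...

# 2. Regenerate countries.txt from the top-level "directories" that now exist.
countries = sorted({b.name.split("/")[0] for b in container.list_blobs() if "/" in b.name})
container.upload_blob("countries.txt", "\n".join(countries) + "\n", overwrite=True)
```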

I can throw together a quick Dockerfile for this and maybe investigate running it on GitHub Actions / Azure!

GHA has usage limits (https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration); in particular, "each job in a workflow can run for up to 6 hours of execution time", so it is not a deployment method that will scale well if we have many countries to run. For what we have now (BE + NI), I think it is still workable.
