This project processes geospatial data from remote sources into the UNDP STAC GOC repo.
The following image shows the steps to process the data:
The data is available at the following link: https://eogdata.mines.edu/nighttime_light/
To use this API, you'll need to have:
- GDAL binary installed on your system
- An Azure account with access to UNDP Azure Blob storage container
- Docker installed on your local machine
- Rights to spin up Azure Container Instances and the az command-line tool, in case you wish to deploy the pipeline
The pipeline stores its output in Azure Blob storage, so env vars are used to configure it. Additionally, the last two env vars are required when deploying to Azure Container Instances:
AZURE_STORAGE_CONNECTION_STRING
AZURE_REGISTRY_USERNAME
AZURE_REGISTRY_PASSWORD
cp .env_example .env
Set the correct values for the env vars in .env
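Before running anything, it can help to confirm that the variables are actually visible to the process. The following is a minimal sketch, not part of the pipeline: the helper name `missing_env_vars` is hypothetical, and the split between "always required" and "deployment only" follows the description above rather than the pipeline's source code.

```python
# Variable names come from the list above. Treating the registry credentials
# as "deployment only" is an assumption based on the text, not on the code.
STORAGE_VARS = ("AZURE_STORAGE_CONNECTION_STRING",)
DEPLOY_VARS = ("AZURE_REGISTRY_USERNAME", "AZURE_REGISTRY_PASSWORD")


def missing_env_vars(env, deploying=False):
    """Return the names of required variables that are unset or empty."""
    required = STORAGE_VARS + (DEPLOY_VARS if deploying else ())
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars(os.environ, deploying=True)` in a shell where the .env values have been exported returns an empty list when everything is set.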
This is the easiest and preferred way
docker-compose run pipeline python3 -m undpstac_pipeline.cli --help
docker-compose run pipeline python3 -m undpstac_pipeline.cli daily --help
# run the pipeline for a given day
docker-compose run pipeline python3 -m undpstac_pipeline.cli daily -y 2024 -m 2 -d 20 -f
# fetch a message (yyyyMMdd string text) from service bus queue to process a day
docker-compose run pipeline python3 -m undpstac_pipeline.cli queue -h
- Clone this repository to your local machine
- Navigate to the project directory in your terminal or command prompt.
- Create a virtual env using pipenv
- Install GDAL manually
- Install all other deps
git clone https://github.com/UNDP-Data/geo-undpstac-pipeline.git
cd geo-undpstac-pipeline
pipenv --python 3 # this will create a virt env using default python interpreter
pipenv run pip install GDAL==`gdalinfo --version | cut -c 6-10` # pin the Python bindings to the installed GDAL binary version
pipenv run pip install -r requirements.txt
pipenv run python -m undpstac_pipeline.cli --help
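The GDAL install step takes a fixed column range from the `gdalinfo --version` banner, which only works while the version is exactly five characters long (for example 3.4.1). A version such as 3.10.0 would be truncated. As a hedged alternative, the sketch below extracts the version with a regular expression. The function names are hypothetical, and the banner format `GDAL X.Y.Z, released ...` is assumed.

```python
import re
import subprocess


def parse_gdal_version(banner):
    """Extract 'X.Y.Z' from a banner such as 'GDAL 3.4.1, released 2021/12/27'."""
    match = re.search(r"GDAL (\d+\.\d+\.\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognised gdalinfo output: {banner!r}")
    return match.group(1)


def gdal_pip_requirement():
    """Run gdalinfo and return a pip requirement string of the form GDAL==X.Y.Z."""
    banner = subprocess.run(
        ["gdalinfo", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return f"GDAL=={parse_gdal_version(banner)}"
```

On a machine with the GDAL binary installed, the string returned by `gdal_pip_requirement()` can be passed to `pipenv run pip install`.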
The pipeline can be deployed to ACI using the az tool. For this purpose, the deploy folder contains a bash script and a template file.
You need to have the az command line set up and configured to be able to deploy!
apiVersion: "2023-05-01"
location: westcentralus
properties:
  osType: Linux
  restartPolicy: OnFailure
  containers:
    - name: nighttimelights-test
      properties:
        environmentVariables:
          - name: AZURE_STORAGE_CONNECTION_STRING
            value: $AZURE_STORAGE_CONNECTION_STRING
        image: undpgeohub.azurecr.io/undp-data/geo-undpstac-pipeline:main
        command: ["python", "-m", "undpstac_pipeline.cli", "archive", "-s=2023-09-09", "-e=2023-09-15", "-f"]
        #command: ["python", "-m", "undpstac_pipeline.cli", "daily", "-y 2024", "-m 1", "-d 24", "-f"]
        #command: ["python", "-m", "undpstac_pipeline.cli", "daily", "-y 2024", "-m 1", "-d 24", "--lonmin=0", "--latmin=-10", "--lonmax=10", "--latmax=10", "-f"]
        resources:
          requests:
            cpu: 4
            memoryInGB: 16
  imageRegistryCredentials:
    - server: undpgeohub.azurecr.io
      username: $AZURE_REGISTRY_USERNAME
      password: $AZURE_REGISTRY_PASSWORD
tags: {}
type: Microsoft.ContainerInstance/containerGroups
cd deploy
sh deploy_acr.sh
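The template refers to the env vars through `$NAME` placeholders. How the deploy script fills them in is not shown here. One plausible approach, sketched below with Python's standard `string.Template`, is to substitute the values before handing the file to az. The `TEMPLATE` string is a shortened stand-in for the real template file, and `render_template` is a hypothetical helper.

```python
from string import Template

# Shortened, illustrative stand-in for the template file in the deploy folder.
TEMPLATE = """\
imageRegistryCredentials:
  - server: undpgeohub.azurecr.io
    username: $AZURE_REGISTRY_USERNAME
    password: $AZURE_REGISTRY_PASSWORD
"""


def render_template(text, env):
    """Replace $NAME placeholders with values from env.

    Raises KeyError if a placeholder has no value, so a half-filled
    template is never produced silently.
    """
    return Template(text).substitute(env)
```

Passing `os.environ` as `env` renders the template from the current shell environment. Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing credential fails loudly instead of leaving a literal `$AZURE_REGISTRY_PASSWORD` in the output.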
docker-compose run pipeline python3 -m queue_register.cli -h
usage: cli.py [-h] {daily,archive,yesterday,recent} ...

Register queue message to service bus queue

positional arguments:
  {daily,archive,yesterday,recent}
                        main modes of operation
    daily               Register a day of message into the queue
    archive             Register a range of days into the queue
    yesterday           Register yesterday of message into the queue
    recent              Register recent N days of message into the queue

options:
  -h, --help            show this help message and exit
- register a day
python -m queue_register.cli daily -t=nighttime -d=2024-01-25
- register a range of days
python -m queue_register.cli archive -t=nighttime -s=2023-01-01 -e=2023-03-31
- register yesterday
python -m queue_register.cli yesterday -t=nighttime
- recent N days
The recent N days until yesterday will be added into the queue
python -m queue_register.cli recent -t=nighttime -n=5
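To make the "recent N days until yesterday" rule concrete, here is a small sketch of the date arithmetic. It is an illustration only: `recent_days` is a hypothetical helper, and the oldest-first ordering is an assumption, not something taken from queue_register.cli.

```python
from datetime import date, timedelta


def recent_days(n, today):
    """Return the n days ending yesterday (relative to today), oldest first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # offset runs n, n-1, ..., 1 so the last entry is always yesterday.
    return [today - timedelta(days=offset) for offset in range(n, 0, -1)]
```

For example, with `today = date(2024, 1, 26)` and `n = 5`, the list runs from 2024-01-21 to 2024-01-25 and does not include today.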
If you wish to overwrite existing data in the storage account, use the -f option. queue_register.cli will then push a message with the force processing option, like nighttime,20240101,force. By default, force processing is disabled.
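The message layout can be illustrated with a short sketch that builds and parses such strings. Only the forced form `nighttime,20240101,force` appears in this document. The unforced form `nighttime,20240101` is an assumption, and both function names are hypothetical rather than taken from the repository.

```python
from datetime import date, datetime


def build_message(data_type, day, force=False):
    """Build a queue message such as 'nighttime,20240101,force'."""
    parts = [data_type, day.strftime("%Y%m%d")]
    if force:
        parts.append("force")
    return ",".join(parts)


def parse_message(text):
    """Split a queue message into (data_type, day, force)."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected message format: {text!r}")
    if len(parts) == 3 and parts[2] != "force":
        raise ValueError(f"unknown option in message: {parts[2]!r}")
    # strptime raises ValueError if the date is not a valid yyyyMMdd string.
    day = datetime.strptime(parts[1], "%Y%m%d").date()
    return parts[0], day, len(parts) == 3
```

Round-tripping a message through `build_message` and `parse_message` returns the original values, which makes the pair easy to check in isolation.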