This repository automates the collection and deployment of KBO (Korea Baseball Organization) data using Apache Airflow. It manages the flow of data from collection through to visualization in the Data Portal.
The pipeline runs on Apache Airflow and is deployed with Docker Compose. Before setting it up, make sure Docker and Docker Compose are installed and working.
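A quick way to confirm both prerequisites from a terminal:

```bash
docker --version            # Docker engine is installed
docker-compose --version    # Docker Compose is available
```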
To run the pipeline locally using Docker Compose:

- Clone this repository and initialize submodules:

  ```bash
  git clone --recurse-submodules https://github.com/leewr9/kbo-data-pipeline.git
  cd kbo-data-pipeline
  ```

  Ensure that your GCP service account key is placed in the `config` folder and renamed to `key.json`.
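  If the bundled Compose file does not already wire the key up for you, the fragment below sketches a typical arrangement; the `airflow-webserver` service name, the `/opt/airflow/config` mount path, and the use of `GOOGLE_APPLICATION_CREDENTIALS` are assumptions about a standard Airflow-on-GCP setup, not this repository's actual configuration.

  ```yaml
  # Hypothetical docker-compose fragment (illustrative only):
  # mount the config folder and point Google client libraries at the key.
  services:
    airflow-webserver:        # service name is an assumption
      volumes:
        - ./config:/opt/airflow/config
      environment:
        GOOGLE_APPLICATION_CREDENTIALS: /opt/airflow/config/key.json
  ```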
- Start the Airflow services using Docker Compose:

  ```bash
  docker-compose up -d
  ```
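  To confirm the stack came up cleanly, you can check container status and tail the startup logs; these are standard Docker Compose commands, not specific to this repository.

  ```bash
  docker-compose ps         # all services should show an Up/healthy state
  docker-compose logs -f    # follow startup logs; Ctrl+C to stop following
  ```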
- Access the Airflow web UI:
  - http://localhost:8080/
  - Login with Username: `admin`, Password: `admin`
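Once the webserver is up, you can also list the loaded DAGs from the command line. `airflow dags list` is a standard Airflow CLI command, but the `airflow-webserver` service name below is an assumption about how the Compose file names its services.

```bash
docker-compose exec airflow-webserver airflow dags list
```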
The parsing modules are managed through the kbo-data-collector repository, which is included as a Git submodule in this project.
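If you cloned without `--recurse-submodules`, or want to pull the collector's latest upstream changes later, the standard Git submodule commands apply:

```bash
git submodule update --init --recursive   # fetch the submodule after a plain clone
git submodule update --remote             # pull the submodule's latest upstream commit
```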
This project is licensed under the MIT License. See the LICENSE file for details.