Automates KBO data collection and deployment with Airflow.

leewr9/kbo-data-pipeline

KBO Data Pipeline

This repository automates the collection and deployment of KBO data using Apache Airflow. It manages the flow of data from collection to visualization in the Data Portal.

Deployment

This pipeline runs on Apache Airflow and is deployed with Docker Compose. Before setting it up, make sure both Docker and Docker Compose are installed and working.
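
As a quick way to confirm the prerequisites above, the following is a minimal sketch of a shell helper that checks whether a command is available on the PATH. The function name `require_cmd` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository).
# Succeeds if the given command is found on PATH; otherwise prints a
# message to stderr and returns 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "required command not found: $1" >&2
    return 1
}

# Usage:
#   require_cmd docker && require_cmd docker-compose
```

This only verifies that the binaries exist; it does not check that the Docker daemon is running.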

Usage

To run the pipeline locally using Docker Compose:

  1. Clone this repository and initialize submodules

    git clone --recurse-submodules https://github.com/leewr9/kbo-data-pipeline.git
    cd kbo-data-pipeline

    Before starting the services, place your GCP service account key in the config folder and rename it to key.json

  2. Start the Airflow services using Docker Compose

    docker-compose up -d

  3. Access the Airflow web UI
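
Because step 1 requires the service account key to be in place before the services start, a small pre-flight check can catch a missing file early. The sketch below assumes the key lives at `config/key.json` relative to the repository root, as described above; the function name `check_key` is hypothetical and not part of this repository.

```shell
# Hypothetical pre-flight check (not part of this repository).
# Succeeds if <repo_dir>/config/key.json exists and is non-empty.
check_key() {
    repo_dir="${1:-.}"
    key_path="$repo_dir/config/key.json"
    if [ -s "$key_path" ]; then
        echo "found $key_path"
        return 0
    fi
    echo "missing or empty: $key_path" >&2
    return 1
}

# Usage (run from the repository root):
#   check_key . && docker-compose up -d
```

The check only tests that the file exists and is non-empty; it does not validate the key's contents.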

Data Collection

The parsing modules are managed through the kbo-data-collector repository, which is included as a Git submodule in this project.
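
If the repository was cloned without `--recurse-submodules`, the submodule directory will be empty. The sketch below shows one way to detect and fix that. It relies on the documented behavior that `git submodule status` prefixes uninitialized submodules with `-`; the function name `has_uninitialized_submodules` is hypothetical.

```shell
# Hypothetical helper (not part of this repository).
# Reads `git submodule status` output on stdin and succeeds if any
# submodule is uninitialized (its status line begins with "-").
has_uninitialized_submodules() {
    grep -q '^-'
}

# Usage (run from the repository root):
#   if git submodule status --recursive | has_uninitialized_submodules; then
#       git submodule update --init --recursive
#   fi
```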

License

This project is licensed under the MIT License. See the LICENSE file for details.
