# KBO Data Pipeline

This repository automates the collection and deployment of KBO data using Apache Airflow. It manages the flow of data from collection to visualization in the Data Portal.

## Deployment

This pipeline runs on Apache Airflow and is deployed using Docker Compose. To set up and run the pipeline, ensure Docker is installed and configured properly.
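For orientation, wiring the GCP service account key into the Airflow containers might look like the fragment below. This is a hypothetical excerpt, not the actual compose file from this repository: the service name, mount path, and use of `GOOGLE_APPLICATION_CREDENTIALS` are all assumptions based on common Airflow-on-Compose setups.

```yaml
# Hypothetical docker-compose excerpt; service name and paths are assumptions
services:
  airflow-webserver:
    volumes:
      # Mount the key placed in config/ (read-only) into the container
      - ./config/key.json:/opt/airflow/config/key.json:ro
    environment:
      # Standard variable that Google client libraries read for credentials
      GOOGLE_APPLICATION_CREDENTIALS: /opt/airflow/config/key.json
```

The `GOOGLE_APPLICATION_CREDENTIALS` environment variable is the conventional way for Google Cloud client libraries to locate a service account key, which is why a mount like this is typical.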

## Usage

To run the pipeline locally using Docker Compose:

1. Clone this repository and initialize its submodules:

   ```bash
   git clone --recurse-submodules https://github.com/leewr9/kbo-data-pipeline.git
   cd kbo-data-pipeline
   ```

   Ensure that your GCP service account key is placed in the `config` folder and renamed to `key.json`.

2. Start the Airflow services using Docker Compose:

   ```bash
   docker-compose up -d
   ```

3. Access the Airflow web UI (by default, Airflow's webserver listens on port 8080, i.e. `http://localhost:8080`).
## Data Collection

The parsing modules are managed through the kbo-data-collector repository, which is included as a Git submodule in this project.
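If the repository was cloned without `--recurse-submodules`, the collector submodule can still be fetched afterwards with standard Git commands:

```shell
# Fetch and check out the kbo-data-collector submodule after a plain clone
git submodule update --init --recursive

# Later, pull the submodule's latest upstream commits
git submodule update --remote --recursive
```

`--remote` moves the submodule to its upstream branch tip rather than the commit pinned by this repository, so use it only when you intend to update the pinned version.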

## License

This project is licensed under the MIT License. See the LICENSE file for details.