
Airflow + dbt Project

End-to-end data pipeline using Apache Airflow, dbt, and PostgreSQL for data transformation and orchestration.

Tech Stack

  • Apache Airflow (orchestration)
  • dbt (data transformation)
  • PostgreSQL (database)
  • Docker / Docker Compose (local services)
  • Python with pytest (tooling and tests)

Project Structure

airflow/
├── dags/                   # Airflow DAGs
│   ├── cosmos/             # dbt project
│   │   ├── models/         # dbt models (trusted/gold layers)
│   │   ├── seeds/          # Static CSV data
│   │   ├── macros/         # Reusable dbt functions
│   │   └── tests/          # Data quality tests
│   ├── modules/            # DAG utility modules
│   │   └── *.py            # Utility files
│   └── *.py                # DAG files
├── includes/               # Additional modules
├── plugins/                # Airflow plugins
├── tests/                  # Project tests
├── docker-compose.yaml     # Docker services
├── Dockerfile              # Custom Airflow image
└── pyproject.toml          # Python dependencies

Quick Start

  1. Clone the repository and start the services:

git clone https://github.com/headrockz/airflow.git
cd airflow
docker-compose up -d

  2. Access the Airflow UI (by default at http://localhost:8080).
  3. Configure the PostgreSQL connection in Airflow:
  • Connection ID: postgres
  • Host: postgres
  • Database: airflow
  • User: airflow
  • Password: airflow
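
Equivalently, the connection can be created from the CLI instead of the UI; a minimal sketch using Airflow's `airflow connections add` command (the service name `airflow-webserver` and port 5432 are assumptions, adjust them to match docker-compose.yaml):

```bash
# Hypothetical service name; check docker-compose.yaml for the real one.
docker-compose exec airflow-webserver \
  airflow connections add postgres \
    --conn-type postgres \
    --conn-host postgres \
    --conn-schema airflow \
    --conn-login airflow \
    --conn-password airflow \
    --conn-port 5432
```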

Data Pipeline

The project processes Pokémon and financial data through three dbt layers:

  • Seeds: raw CSV data (pokemon, moves, dollar/ibov)
  • Trusted: cleaned and standardized data
  • Gold: aggregated metrics and analyses
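
To illustrate the layering, a trusted-layer model might look like the sketch below (the model and column names are hypothetical, not taken from this repository):

```sql
-- Hypothetical trusted-layer model, e.g. models/trusted/trusted_pokemon.sql.
-- It reads the raw seed and standardizes it for downstream gold models.
select
    id,
    lower(trim(name)) as name,
    type
from {{ ref('pokemon') }}  -- dbt resolves the `pokemon` seed
where id is not null
```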

Testing

The project includes automated tests for data quality and YAML validation:

# Run all tests
pytest
# Run with coverage
pytest --cov=tests --cov-report=html
# Run specific test
pytest tests/test_yaml.py -v
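
A YAML validation test can be as simple as walking the configuration files and asserting that each one parses; a minimal sketch, assuming the configs live under dags/ (see tests/test_yaml.py for the actual implementation):

```python
# Sketch: assert every YAML file under dags/ is syntactically valid.
from pathlib import Path

import pytest
import yaml

# Assumed location of the YAML DAG configs; adjust to the repo layout.
YAML_FILES = sorted(Path("dags").rglob("*.y*ml"))

@pytest.mark.parametrize("path", YAML_FILES, ids=str)
def test_yaml_parses(path):
    # yaml.safe_load raises yaml.YAMLError on malformed documents,
    # which pytest reports as a test failure.
    with path.open() as f:
        yaml.safe_load(f)
```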

Dynamic DAG Generation

The project includes a dynamic DAG generator that creates Airflow DAGs from YAML configuration files, eliminating repetitive Python code.
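
The generator follows the standard Airflow pattern for dynamic DAGs: parse each YAML file, build a DAG object from it, and expose that object in the module's globals so the scheduler discovers it. A minimal sketch, assuming a hypothetical config schema (dag_id, schedule, tasks); the project's actual schema is described in the Dynamic Generator README:

```python
# Sketch of YAML-driven DAG generation; the config keys are assumptions.
from datetime import datetime
from pathlib import Path

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

CONFIG_DIR = Path(__file__).parent / "configs"  # assumed location

for config_path in sorted(CONFIG_DIR.glob("*.yaml")):
    config = yaml.safe_load(config_path.read_text())

    with DAG(
        dag_id=config["dag_id"],
        schedule=config.get("schedule"),  # `schedule` is Airflow 2.4+
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        for task in config["tasks"]:
            BashOperator(task_id=task["task_id"], bash_command=task["command"])

    # Exposing the DAG in module globals is what makes the scheduler pick it up.
    globals()[dag.dag_id] = dag
```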

For detailed documentation, see the Dynamic Generator README.
