End-to-end data pipeline using Apache Airflow, dbt, and PostgreSQL for data transformation and orchestration.
- Apache Airflow - Workflow orchestration
- dbt - Data transformation
- Astronomer Cosmos - dbt integration for Airflow
- PostgreSQL - Database
airflow/
├── dags/ # Airflow DAGs
│ ├── cosmos/ # dbt project
│ │ ├── models/ # dbt models (trusted/gold layers)
│ │ ├── seeds/ # Static CSV data
│ │ ├── macros/ # Reusable dbt functions
│ │ └── tests/ # Data quality tests
│ ├── modules/ # DAG utility modules
│ │ └── *.py # Utility files
│ └── *.py # DAG files
├── includes/ # Additional modules
├── plugins/ # Airflow plugins
├── tests/ # Project tests
├── docker-compose.yaml # Docker services
├── Dockerfile # Custom Airflow image
└── pyproject.toml # Python dependencies
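The dbt project under `dags/cosmos/` is typically exposed to Airflow through Astronomer Cosmos, which renders each dbt model as its own Airflow task. The sketch below is a minimal illustration assuming the Cosmos 1.x API; the `dag_id`, schedule, profile name, schema, and container path are placeholders rather than the project's actual configuration.

```python
# dags/dbt_cosmos_dag.py -- illustrative sketch, not the project's actual DAG
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

# Reuse the "postgres" Airflow connection configured in the setup steps below.
profile_config = ProfileConfig(
    profile_name="cosmos",  # placeholder profile name
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="postgres",
        profile_args={"schema": "public"},  # assumed target schema
    ),
)

# Renders every model in dags/cosmos/ as an Airflow task, preserving dbt's dependency graph.
dbt_pipeline = DbtDag(
    dag_id="dbt_cosmos_pipeline",  # placeholder dag_id
    project_config=ProjectConfig("/usr/local/airflow/dags/cosmos"),  # path inside the container (assumed)
    profile_config=profile_config,
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```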
- Clone and start services:
  docker-compose up -d
- Access Airflow UI:
  - URL: http://localhost:8080
  - User: airflow
  - Password: airflow
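To confirm the services are up without opening a browser, you can probe the webserver from Python. This is an optional sketch that assumes the stock docker-compose setup: UI on localhost:8080 and basic auth with the default airflow/airflow credentials enabled for the REST API.

```python
# check_airflow.py -- quick sanity check, assumes the default docker-compose setup
import requests

BASE = "http://localhost:8080"

# Unauthenticated health endpoint: reports metadatabase and scheduler status.
print(requests.get(f"{BASE}/health", timeout=10).json())

# Stable REST API call using the default airflow/airflow credentials.
resp = requests.get(f"{BASE}/api/v1/dags", auth=("airflow", "airflow"), timeout=10)
resp.raise_for_status()
print([d["dag_id"] for d in resp.json()["dags"]])
```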
- Configure PostgreSQL connection in Airflow:
  - Connection ID: postgres
  - Host: postgres
  - Database: airflow
  - User: airflow
  - Password: airflow
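DAG and task code can then reference this connection by its Connection ID. A small illustrative sketch using the Postgres provider hook (the query is just a connectivity probe):

```python
# Illustrative connectivity check using the "postgres" connection defined above.
from airflow.providers.postgres.hooks.postgres import PostgresHook

def check_postgres() -> None:
    hook = PostgresHook(postgres_conn_id="postgres")
    # get_first returns the first row of the result set.
    version = hook.get_first("SELECT version();")
    print(version)
```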
The project processes Pokemon and financial data through dbt layers:
- Seeds: Raw CSV data (pokemon, moves, dollar/ibov)
- Trusted: Clean and standardized data
- Gold: Aggregated metrics and analysis
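Outside Airflow, the same layered flow can be exercised with the dbt CLI. The sketch below is purely illustrative: it assumes the dbt project lives at `airflow/dags/cosmos`, that the layer folders are named `trusted/` and `gold/`, and that a local `profiles.yml` points at the Postgres service.

```python
# run_dbt_layers.py -- illustrative only; paths and selectors are assumptions
import subprocess

PROJECT_DIR = "airflow/dags/cosmos"  # assumed dbt project path

for args in (
    ["dbt", "seed"],                        # load the raw CSVs (pokemon, moves, dollar/ibov)
    ["dbt", "run", "--select", "trusted"],  # build the cleaned, standardized layer
    ["dbt", "run", "--select", "gold"],     # build the aggregated metrics
    ["dbt", "test"],                        # run the data quality tests
):
    subprocess.run(args, cwd=PROJECT_DIR, check=True)
```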
The project includes automated tests for data quality and YAML validation:
# Run all tests
pytest
# Run with coverage
pytest --cov=tests --cov-report=html
# Run specific test
pytest tests/test_yaml.py -v
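As an illustration of what a YAML validation test can look like, here is a minimal sketch; the config directory, file name, and test body are hypothetical, not the project's actual tests.

```python
# tests/test_yaml_example.py -- hypothetical sketch of a YAML validation test
from pathlib import Path

import pytest
import yaml

CONFIG_DIR = Path("airflow/dags/configs")  # assumed location of the YAML DAG configs

@pytest.mark.parametrize("path", sorted(CONFIG_DIR.glob("*.yaml")), ids=str)
def test_yaml_is_valid(path: Path) -> None:
    # Fails if the file is empty or not parseable YAML.
    config = yaml.safe_load(path.read_text())
    assert config, f"{path} is empty or invalid YAML"
```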
The project includes a dynamic DAG generator that creates Airflow DAGs from YAML configuration files, eliminating repetitive Python code.
See detailed documentation: Dynamic Generator README
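The general pattern behind such a generator is sketched below: read each YAML file, build a DAG from its fields, and register it in the module's globals so the scheduler discovers it. This is a simplified illustration of the technique, not the project's actual generator; the config location and field names (`dag_id`, `schedule`, `tasks`) are assumptions.

```python
# dags/dynamic_dags_sketch.py -- simplified illustration of YAML-driven DAG generation
from datetime import datetime
from pathlib import Path

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

CONFIG_DIR = Path(__file__).parent / "configs"  # assumed location of the YAML configs

for config_file in sorted(CONFIG_DIR.glob("*.yaml")):
    cfg = yaml.safe_load(config_file.read_text())

    dag = DAG(
        dag_id=cfg["dag_id"],
        schedule=cfg.get("schedule", "@daily"),  # "schedule" requires Airflow 2.4+
        start_date=datetime.fromisoformat(cfg.get("start_date", "2024-01-01")),
        catchup=False,
    )

    with dag:
        # One task per entry in the config; assumed fields: task_id and bash_command.
        for task_cfg in cfg.get("tasks", []):
            BashOperator(task_id=task_cfg["task_id"], bash_command=task_cfg["bash_command"])

    # Registering the DAG object in the module globals makes it visible to the scheduler.
    globals()[cfg["dag_id"]] = dag
```

A matching YAML file then only needs to declare the DAG ID, schedule, and its tasks, which is what removes the repetitive Python code.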