This project demonstrates how to build an ELT (Extract, Load, Transform) data pipeline to process 1 million records using Google Cloud Platform (GCP) and Apache Airflow. The pipeline extracts data from Google Cloud Storage (GCS), loads it into BigQuery, and transforms it to create country-specific tables and views for analysis.
Key features:

- Extract data from GCS in CSV format.
- Load raw data into a staging table in BigQuery.
- Transform data into country-specific tables and reporting views.
- Use Apache Airflow to orchestrate the pipeline.
- Generate clean and structured datasets for analysis.
The pipeline runs in three stages, orchestrated as Airflow tasks (a minimal DAG sketch follows this list):

- Extract: Check that the source file exists in GCS.
- Load: Load the raw CSV data into a BigQuery staging table.
- Transform:
  - Create country-specific tables in the transform layer.
  - Generate reporting views for each country with filtered insights.
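
As a concrete illustration, here is a minimal sketch of the extract and load stages as an Airflow DAG. It assumes a recent Airflow 2.x with the `apache-airflow-providers-google` package installed; the project, bucket, dataset, and object names are placeholders rather than this project's actual configuration.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Placeholder names; substitute your own project, bucket, and dataset.
PROJECT_ID = "my-gcp-project"
BUCKET = "my-data-bucket"
SOURCE_OBJECT = "raw/records.csv"
STAGING_TABLE = f"{PROJECT_ID}.staging.raw_records"

with DAG(
    dag_id="gcs_to_bigquery_elt",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually; set a cron string to run on a schedule
    catchup=False,
) as dag:
    # Extract: wait until the source CSV is present in the bucket.
    check_file = GCSObjectExistenceSensor(
        task_id="check_file_exists",
        bucket=BUCKET,
        object=SOURCE_OBJECT,
    )

    # Load: copy the raw CSV into the BigQuery staging table as-is.
    load_to_staging = GCSToBigQueryOperator(
        task_id="load_csv_to_staging",
        bucket=BUCKET,
        source_objects=[SOURCE_OBJECT],
        destination_project_dataset_table=STAGING_TABLE,
        source_format="CSV",
        skip_leading_rows=1,      # skip the header row
        autodetect=True,          # infer the schema from the file
        write_disposition="WRITE_TRUNCATE",
    )

    check_file >> load_to_staging
```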
Data flows through three layers in BigQuery (the transform and reporting steps are sketched after this list):

- Staging Layer: Raw data loaded directly from the CSV file.
- Transform Layer: Cleaned and transformed country-specific tables.
- Reporting Layer: Views optimized for analysis and reporting.
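
The transform and reporting layers can then be built with SQL jobs issued from Airflow. Below is a hedged sketch for a single country; the dataset names (`staging`, `transform`, `reporting`), table names, and columns are illustrative assumptions, since the actual schema is not described here.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

COUNTRY = "germany"  # in practice, one pair of tasks per country

with DAG(
    dag_id="transform_and_reporting_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Transform layer: materialize a cleaned, country-specific table.
    create_country_table = BigQueryInsertJobOperator(
        task_id=f"create_{COUNTRY}_table",
        configuration={
            "query": {
                "query": f"""
                    CREATE OR REPLACE TABLE transform.{COUNTRY}_data AS
                    SELECT *
                    FROM staging.raw_records
                    WHERE LOWER(country) = '{COUNTRY}'
                """,
                "useLegacySql": False,
            }
        },
    )

    # Reporting layer: expose a filtered view for analysts.
    create_country_view = BigQueryInsertJobOperator(
        task_id=f"create_{COUNTRY}_view",
        configuration={
            "query": {
                "query": f"""
                    CREATE OR REPLACE VIEW reporting.{COUNTRY}_view AS
                    SELECT id, name, city, signup_date  -- assumed columns
                    FROM transform.{COUNTRY}_data
                """,
                "useLegacySql": False,
            }
        },
    )

    create_country_table >> create_country_view
```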
Technologies used:

- Google Cloud Platform (GCP):
  - Google Compute Engine (for hosting Airflow)
  - BigQuery
  - Cloud Storage
- Apache Airflow:
  - Airflow with the Google Cloud provider package
Prerequisites (a quick connectivity check is sketched after this list):

- A Google Cloud project with:
  - The BigQuery and Cloud Storage APIs enabled.
  - A service account with permissions to read from GCS and to create tables and run query jobs in BigQuery.
- Apache Airflow installed.
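
To confirm the service account is wired up correctly before running any DAGs, a small check like the following can help. It is a sketch assuming the `google-cloud-bigquery` and `google-cloud-storage` client libraries (installed alongside the Airflow Google provider) and placeholder project and bucket names.

```python
from google.cloud import bigquery, storage

# Placeholder identifiers; replace with your own values.
PROJECT_ID = "my-gcp-project"
BUCKET = "my-data-bucket"

# Both clients use Application Default Credentials, e.g. a service
# account key referenced by GOOGLE_APPLICATION_CREDENTIALS.
bq_client = bigquery.Client(project=PROJECT_ID)
gcs_client = storage.Client(project=PROJECT_ID)

# Cloud Storage check: listing objects needs storage.objects.list.
blobs = list(gcs_client.list_blobs(BUCKET, max_results=5))
print(f"GCS OK: saw {len(blobs)} object(s) in gs://{BUCKET}")

# BigQuery check: running a trivial query needs bigquery.jobs.create.
rows = list(bq_client.query("SELECT 1 AS ok").result())
print(f"BigQuery OK: SELECT 1 returned {rows[0].ok}")
```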


