Lascade-Co/connector

Ads & App Analytics ETL Pipeline

This project implements ETL (Extract, Transform, Load) pipelines for Facebook Ads, Google Ads, Google Analytics 4 (GA4), and Google Play Console data, storing it in ClickHouse for analytics and visualization with Metabase.

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Git

Setup

  1. Clone the repository:

    git clone <repository-url>
    cd connector
  2. Create required directories:

    mkdir -p volumes/{postgres,clickhouse,metabase}
  3. Start the core services (PostgreSQL + ClickHouse):

    docker compose up -d

πŸ—οΈ Architecture

  • PostgreSQL: Stores pipeline state and metadata
  • ClickHouse: High-performance data warehouse for ads data
  • Metabase: Business intelligence and data visualization (optional)

📊 Data Sources

Facebook Ads

  • Schedule: Daily at 1:10 UTC
  • Groups: d1, m4, d2
  • Data: Campaign metrics, ad performance, conversion data

Google Ads

  • Schedule: Daily at 3:10 UTC
  • Groups: d1, m4, d2
  • Data: Campaign performance, ad metrics, budget information

Google Analytics 4 (GA4)

  • Schedule: Daily at 1:30 UTC
  • Groups: d1 (configurable)
  • Data: Traffic sources, user engagement, device analytics, events

Google Play Console

  • Schedule: Daily at 5:10 UTC
  • Groups: Configurable per app
  • Data: Install statistics, crash reports, ratings, user acquisition
  • Source: Google Cloud Storage exports
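
The daily schedules above map directly onto five-field cron expressions of the kind GitHub Actions accepts under `on.schedule`. The following is a minimal sketch that assumes only the run times listed in this README; the `SCHEDULES` dict and the `to_cron` helper are illustrative names, not part of the repository.

```python
# Daily run times (UTC) as listed in this README; the dict itself is illustrative.
SCHEDULES = {
    "facebook": "1:10",
    "google_analytics": "1:30",
    "google": "3:10",
    "google_play": "5:10",
}


def to_cron(hhmm: str) -> str:
    """Convert an 'H:MM' UTC time into a five-field daily cron expression."""
    hour_text, minute_text = hhmm.split(":")
    hour, minute = int(hour_text), int(minute_text)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time: {hhmm!r}")
    # Field order: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * *"


for platform, run_time in SCHEDULES.items():
    print(f"{platform}: {to_cron(run_time)}")
```

The expressions actually used by the workflow files may differ; check `.github/workflows/` for the authoritative values.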

🛠️ Development

Running Pipelines

Facebook Ads

python main.py facebook d1  # Run group d1
python main.py facebook m4  # Run group m4
python main.py facebook d2  # Run group d2

Google Ads

python main.py google d1    # Run group d1
python main.py google m4    # Run group m4
python main.py google d2    # Run group d2

Google Analytics 4 (GA4)

python main.py google_analytics d1  # Run group d1

# With backfill (120 days)
GA4_BACKFILL_DAYS=120 python main.py google_analytics d1

Google Play Console

python main.py google_play d1  # Run group d1

# With backfill (6 months)
GOOGLE_PLAY_BACKFILL_MONTHS=6 python main.py google_play d1
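
This README does not show how the pipelines interpret `GA4_BACKFILL_DAYS` and `GOOGLE_PLAY_BACKFILL_MONTHS`. The sketch below illustrates one plausible reading, in which each variable sets how far back the load window starts. The function names and the default values are assumptions made for illustration only.

```python
import os
from datetime import date, timedelta
from typing import Tuple


def ga4_start_date(today: date, default_days: int = 1) -> date:
    """Start of the GA4 window: GA4_BACKFILL_DAYS days before `today`.

    ASSUMPTION: the default of 1 day is hypothetical, not taken from the repo.
    """
    days = int(os.environ.get("GA4_BACKFILL_DAYS", default_days))
    return today - timedelta(days=days)


def play_start_month(today: date, default_months: int = 1) -> Tuple[int, int]:
    """(year, month) lying GOOGLE_PLAY_BACKFILL_MONTHS months before `today`.

    ASSUMPTION: the default of 1 month is hypothetical, not taken from the repo.
    """
    months = int(os.environ.get("GOOGLE_PLAY_BACKFILL_MONTHS", default_months))
    # Count months from year 0 so the subtraction can cross year boundaries.
    index = today.year * 12 + (today.month - 1) - months
    return index // 12, index % 12 + 1
```

Working in whole months for Google Play fits a source that is exported per month, but that granularity is also an assumption here.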

Configuration

  1. Secrets Setup:

    • Copy .dlt/secrets.toml and configure your credentials
    • Set up facebook.json, google.json, google_analytics.json, and google_play.json with account configurations
    • GA4 requires a dedicated google_analytics.json file with property IDs and OAuth credentials
    • Configure GitHub Secrets for automated workflows
  2. Pipeline Groups:

    • Facebook Ads and Google Ads each define three groups (d1, m4, d2); GA4 uses d1 by default, and Google Play groups are configured per app
    • Groups contain different sets of ad accounts or apps
    • Configure in facebook.json / google.json / google_play.json / google_analytics.json
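
The layout of these JSON files is not documented in this README. As a rough sketch of how a group lookup could work, the helper below assumes a hypothetical schema in which each file maps a group name to a list of account or app identifiers. Both `load_group` and that schema are illustrative; the real files may be structured differently.

```python
import json
from pathlib import Path


def load_group(config_path: str, group: str) -> list:
    """Return the entries configured for `group` in a JSON config file.

    ASSUMPTION: the file maps group names to lists, e.g. {"d1": ["act_1"]}.
    The real facebook.json / google.json layout may differ.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    if group not in config:
        known = ", ".join(sorted(config)) or "none"
        raise KeyError(f"unknown group {group!r}; configured groups: {known}")
    return config[group]
```

Failing loudly on an unknown group name, rather than returning an empty list, makes a typo such as `d3` visible before any API calls are made.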

📈 Metabase Integration (Optional)

Starting Metabase

# Start all services including Metabase
docker compose --profile metabase up -d

# Or start only Metabase
docker compose up -d metabase

# Stop Metabase
docker compose stop metabase

Connecting to ClickHouse

  1. Access Metabase at http://localhost:3000
  2. Add ClickHouse as a data source:
    • Host: clickhouse (Docker service name)
    • Port: 8123 (HTTP port)
    • Database: travel
    • Username: traveler
    • Password: EAAJbOELsc3wBO5Rvi9lQyZCVTI
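
Before configuring Metabase, it can help to confirm that the same connection settings work against ClickHouse's HTTP interface. The sketch below only builds the request, using the database and user from this README. It assumes the password is exported in a hypothetical `CLICKHOUSE_PASSWORD` environment variable, and it uses `localhost` because it is meant to run on the host rather than inside the Docker network.

```python
import os
import urllib.parse
import urllib.request


def build_clickhouse_request(query: str,
                             host: str = "localhost",
                             port: int = 8123,
                             database: str = "travel",
                             user: str = "traveler") -> urllib.request.Request:
    """Build (but do not send) a request for ClickHouse's HTTP interface."""
    params = urllib.parse.urlencode({"database": database, "query": query})
    request = urllib.request.Request(f"http://{host}:{port}/?{params}")
    # ClickHouse accepts credentials in these two headers.
    request.add_header("X-ClickHouse-User", user)
    # CLICKHOUSE_PASSWORD is a hypothetical variable name used for this sketch.
    request.add_header("X-ClickHouse-Key",
                       os.environ.get("CLICKHOUSE_PASSWORD", ""))
    return request


request = build_clickhouse_request("SELECT 1")
print(request.full_url)
# To run the query against a live server:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.read().decode())
```

From another container on the same Compose network, pass `host="clickhouse"` instead, matching the host value used in Metabase.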

🔄 Automated Workflows

GitHub Actions automatically run the ETL pipelines:

  • Facebook ETL: Daily at 1:10 UTC across all groups
  • GA4 ETL: Daily at 1:30 UTC across all groups
  • Google Ads ETL: Daily at 3:10 UTC across all groups
  • Google Play ETL: Daily at 5:10 UTC (configure as needed)

Workflows can also be triggered manually from the GitHub Actions tab.
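
To approximate a scheduled run locally, the commands from "Running Pipelines" can be issued once per group. The sketch below assumes the group lists given in this README and runs groups one after another; whether the actual workflows run groups sequentially or in parallel is not stated here. `GROUPS`, `build_commands`, and `run_platform` are illustrative names.

```python
import subprocess
import sys

# Groups per platform as listed in this README; adjust to match your config files.
GROUPS = {
    "facebook": ["d1", "m4", "d2"],
    "google": ["d1", "m4", "d2"],
    "google_analytics": ["d1"],
}


def build_commands(platform: str) -> list:
    """Return one `python main.py <platform> <group>` command per group."""
    return [[sys.executable, "main.py", platform, group]
            for group in GROUPS[platform]]


def run_platform(platform: str) -> None:
    """Run every group in order, stopping at the first failure."""
    for command in build_commands(platform):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_platform("facebook")` from the repository root would then run d1, m4, and d2 in turn.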

Manual Backfill Workflows

  • GA4 Backfill: .github/workflows/ga4-backfill.yml - Pull historical GA4 data
  • Google Ads Backfill: .github/workflows/google-backfill.yml - Pull historical ads data

📁 Project Structure

connector/
├── .github/workflows/          # GitHub Actions workflows
│   ├── _reusable-etl.yml      # Shared job template
│   ├── backfill.yml           # Manual backfill entry point
│   ├── daily-facebook.yml     # Facebook Ads ETL schedule
│   ├── daily-ga4.yml          # GA4 ETL schedule
│   ├── daily-google-ads.yml   # Google Ads ETL schedule
│   ├── daily-google-play.yml  # Google Play ETL schedule
│   └── main.yml               # Default workflow (dispatch)
├── pipelines/                 # ETL pipeline definitions
│   ├── facebook/              # Facebook Ads pipeline
│   ├── google/                # Google Ads pipeline
│   ├── google_analytics/      # GA4 pipeline
│   ├── google_play/           # Google Play Console pipeline
│   └── pg/                    # PostgreSQL replication pipeline
├── google_analytics/          # GA4 dlt source (customized)
│   ├── README.md              # GA4 setup guide
│   ├── helpers/               # GA4 helper utilities
│   └── settings.py            # GA4 configuration defaults
├── facebook_ads/              # Facebook Ads dlt source (customized)
├── google_ads/                # Google Ads helpers and setup scripts
├── pg_replication/            # Logical replication helpers
├── docker-compose.yml         # Local development services
├── main.py                    # Pipeline runner
├── requirements.txt           # Python dependencies
└── utils.py                   # Shared helpers

🔧 Services

PostgreSQL (Port 5432)

  • Database: analytics
  • User: django

ClickHouse (Ports 8123, 9000)

  • Database: travel
  • User: traveler

Metabase (Port 3000) - Optional

🚨 Troubleshooting

Check Service Status

docker compose ps

View Logs

# All services
docker compose logs

# Specific service
docker compose logs clickhouse
docker compose logs metabase

Restart Services

# Restart all
docker compose restart

# Restart specific service
docker compose restart clickhouse

Database Connections

  • PostgreSQL: localhost:5432/analytics
  • ClickHouse HTTP: localhost:8123/travel
  • ClickHouse Native: localhost:9000/travel
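
A quick way to tell a stopped container from a credentials problem is to test whether each port accepts TCP connections at all. The sketch below uses only the standard library and the host ports listed above; `SERVICES`, `is_port_open`, and `check_services` are illustrative names.

```python
import socket

# Host ports listed in this README.
SERVICES = {
    "PostgreSQL": 5432,
    "ClickHouse HTTP": 8123,
    "ClickHouse Native": 9000,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port is reachable on `host`."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Running `print(check_services())` after `docker compose up -d` should report `True` for each service once the containers have finished starting. An open port only shows that something is listening; it does not verify credentials.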

🔐 Security Notes

  • Database passwords are stored in environment variables
  • Production deployments should use proper secret management
  • Metabase should be secured with authentication in production
  • Consider using Docker secrets for sensitive data
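
As a sketch of the Docker secrets suggestion: Docker mounts each secret as a file under `/run/secrets/`, so pipeline code can prefer that file and fall back to an environment variable during local development. The `read_secret` helper and the upper-cased fallback naming are assumptions for illustration, not something the repository is known to implement.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return a secret from a Docker secrets file, else from the environment.

    Docker mounts each secret as a file named after the secret under
    /run/secrets. The environment fallback uses the upper-cased name,
    which is a convention chosen for this sketch.
    """
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Secret files often end with a newline; strip surrounding whitespace.
        return secret_file.read_text(encoding="utf-8").strip()
    return os.environ.get(name.upper())
```

For example, `read_secret("clickhouse_password")` would read `/run/secrets/clickhouse_password` if it exists and otherwise look for `CLICKHOUSE_PASSWORD` in the environment; both names are hypothetical.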
