This project implements ETL (Extract, Transform, Load) pipelines for Facebook Ads, Google Ads, Google Analytics 4 (GA4), and Google Play Console data, storing it in ClickHouse for analytics and visualization with Metabase.
- Docker and Docker Compose
- Git
- Clone the repository:
  git clone <repository-url>
  cd connector
- Create the required directories:
  mkdir -p volumes/{postgres,clickhouse,metabase}
- Start the core services (PostgreSQL + ClickHouse):
  docker compose up -d
- PostgreSQL: Stores pipeline state and metadata
- ClickHouse: High-performance data warehouse for ads data
- Metabase: Business intelligence and data visualization (optional)
Facebook Ads
- Schedule: Daily at 1:10 UTC
- Groups: `d1`, `m4`, `d2`
- Data: Campaign metrics, ad performance, conversion data
Google Ads
- Schedule: Daily at 3:10 UTC
- Groups: `d1`, `m4`, `d2`
- Data: Campaign performance, ad metrics, budget information
Google Analytics 4 (GA4)
- Schedule: Daily at 1:30 UTC
- Groups: `d1` (configurable)
- Data: Traffic sources, user engagement, device analytics, events
Google Play Console
- Schedule: Daily at 5:10 UTC
- Groups: Configurable per app
- Data: Install statistics, crash reports, ratings, user acquisition
- Source: Google Cloud Storage exports
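Those exports live in the Cloud Storage bucket that the Play Console writes its reports to. Purely as an illustration (a sketch assuming the `google-cloud-storage` client and a placeholder bucket name; the pipeline's real access logic lives in `pipelines/google_play/`), listing the monthly install reports looks roughly like this:

```python
from google.cloud import storage

# Play Console report buckets are typically named like "pubsite_prod_rev_<developer_id>";
# the value below is a hypothetical placeholder, substitute your own bucket.
BUCKET = "pubsite_prod_rev_0123456789"

client = storage.Client()  # authenticates via GOOGLE_APPLICATION_CREDENTIALS
for blob in client.list_blobs(BUCKET, prefix="stats/installs/"):
    # e.g. stats/installs/installs_com.example.app_202401_overview.csv
    print(blob.name, blob.size)
```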
python main.py facebook d1 # Run group d1
python main.py facebook m4 # Run group m4
python main.py facebook d2 # Run group d2
python main.py google d1 # Run group d1
python main.py google m4 # Run group m4
python main.py google d2 # Run group d2
python main.py google_analytics d1 # Run group d1
# With backfill (120 days)
GA4_BACKFILL_DAYS=120 python main.py google_analytics d1
python main.py google_play d1 # Run group d1
# With backfill (6 months)
GOOGLE_PLAY_BACKFILL_MONTHS=6 python main.py google_play d1
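`main.py` takes the platform and group as positional arguments, and the backfill window comes from environment variables. The following is only a minimal sketch of that pattern, not the actual implementation; the default window and the wiring are assumptions for illustration:

```python
import argparse
import os
from datetime import date, timedelta

parser = argparse.ArgumentParser(description="Run an ETL pipeline for one platform/group")
parser.add_argument("platform", choices=["facebook", "google", "google_analytics", "google_play"])
parser.add_argument("group", help="account group, e.g. d1, m4 or d2")
args = parser.parse_args()

# Optional backfill override, e.g. GA4_BACKFILL_DAYS=120 (default of 1 day assumed here).
backfill_days = int(os.environ.get("GA4_BACKFILL_DAYS", "1"))
start_date = date.today() - timedelta(days=backfill_days)

print(f"Running {args.platform} pipeline for group {args.group} since {start_date}")
# ...the real runner would now load the group's accounts and run the dlt pipeline...
```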
- Secrets Setup:
  - Copy `.dlt/secrets.toml` and configure your credentials
  - Set up `facebook.json`, `google.json`, `google_analytics.json`, and `google_play.json` with account configurations
  - GA4 requires a dedicated `google_analytics.json` file with property IDs and OAuth credentials
  - Configure GitHub Secrets for automated workflows
- Pipeline Groups:
  - Each platform has multiple groups (`d1`, `m4`, `d2`)
  - Groups contain different sets of ad accounts or apps
  - Configure the groups in `facebook.json` / `google.json` / `google_play.json` / `google_analytics.json` (see the sketch below)
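The exact schema of these JSON files is not documented here; purely as an illustration, resolving a group name to its account IDs could be as simple as the following sketch (the `groups` key and file layout are hypothetical):

```python
import json

def accounts_for_group(config_path: str, group: str) -> list[str]:
    """Return the account IDs configured for one group (hypothetical file layout)."""
    with open(config_path) as f:
        config = json.load(f)
    return config["groups"].get(group, [])

# e.g. accounts_for_group("facebook.json", "d1")
```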
# Start all services including Metabase
docker compose --profile metabase up -d
# Or start only Metabase
docker compose up metabase -d
# Stop Metabase
docker compose stop metabase
- Access Metabase at http://localhost:3000
- Add ClickHouse as a data source:
  - Host: `clickhouse` (Docker service name)
  - Port: `8123` (HTTP port)
  - Database: `travel`
  - Username: `traveler`
  - Password: `EAAJbOELsc3wBO5Rvi9lQyZCVTI`
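Note that the hostname `clickhouse` only resolves inside the Docker network; from the host machine the same HTTP interface is exposed on `localhost:8123`. For a quick sanity check from the host, something like the following works (a sketch using the `requests` library; substitute the password listed above):

```python
import requests

# Query ClickHouse over its HTTP interface, exposed on the host as localhost:8123.
resp = requests.post(
    "http://localhost:8123",
    params={"database": "travel", "user": "traveler", "password": "<clickhouse-password>"},
    data="SELECT count() FROM system.tables",
    timeout=10,
)
resp.raise_for_status()
print(resp.text.strip())
```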
GitHub Actions automatically run the ETL pipelines:
- Facebook ETL: Daily at 1:10 UTC across all groups
- GA4 ETL: Daily at 1:30 UTC across all groups
- Google Ads ETL: Daily at 3:10 UTC across all groups
- Google Play ETL: Daily at 5:10 UTC (configure as needed)
Workflows can also be triggered manually from the GitHub Actions tab.
- GA4 Backfill (`.github/workflows/ga4-backfill.yml`): pulls historical GA4 data
- Google Ads Backfill (`.github/workflows/google-backfill.yml`): pulls historical ads data
connector/
├── .github/workflows/        # GitHub Actions workflows
│   ├── _reusable-etl.yml     # Shared job template
│   ├── backfill.yml          # Manual backfill entry point
│   ├── daily-facebook.yml    # Facebook Ads ETL schedule
│   ├── daily-ga4.yml         # GA4 ETL schedule
│   ├── daily-google-ads.yml  # Google Ads ETL schedule
│   ├── daily-google-play.yml # Google Play ETL schedule
│   └── main.yml              # Default workflow (dispatch)
├── pipelines/                # ETL pipeline definitions
│   ├── facebook/             # Facebook Ads pipeline
│   ├── google/               # Google Ads pipeline
│   ├── google_analytics/     # GA4 pipeline
│   ├── google_play/          # Google Play Console pipeline
│   └── pg/                   # PostgreSQL replication pipeline
├── google_analytics/         # GA4 dlt source (customized)
│   ├── README.md             # GA4 setup guide
│   ├── helpers/              # GA4 helper utilities
│   └── settings.py           # GA4 configuration defaults
├── facebook_ads/             # Facebook Ads dlt source (customized)
├── google_ads/               # Google Ads helpers and setup scripts
├── pg_replication/           # Logical replication helpers
├── docker-compose.yml        # Local development services
├── main.py                   # Pipeline runner
├── requirements.txt          # Python dependencies
└── utils.py                  # Shared helpers
PostgreSQL
- Database: `analytics`
- User: `django`

ClickHouse
- Database: `travel`
- User: `traveler`

Metabase
- Access: http://localhost:3000
- Data directory: `./volumes/metabase/`
# Check service status
docker compose ps
# All services
docker compose logs
# Specific service
docker compose logs clickhouse
docker compose logs metabase
# Restart all
docker compose restart
# Restart specific service
docker compose restart clickhouse
- PostgreSQL: `localhost:5432/analytics`
- ClickHouse HTTP: `localhost:8123/travel`
- ClickHouse Native: `localhost:9000/travel`
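To verify these endpoints from the host, a short check like the following can help (a sketch assuming `psycopg2` and `requests` are installed and that the PostgreSQL password is available in an environment variable; the variable name below is an assumption):

```python
import os
import psycopg2
import requests

# PostgreSQL (pipeline state and metadata)
pg = psycopg2.connect(
    host="localhost", port=5432, dbname="analytics",
    user="django", password=os.environ["POSTGRES_PASSWORD"],  # assumed env var name
)
print("postgres ok:", pg.closed == 0)

# ClickHouse HTTP interface health check
r = requests.get("http://localhost:8123/ping", timeout=5)
print("clickhouse ok:", r.text.strip() == "Ok.")
```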
- Database passwords are stored in environment variables
- Production deployments should use proper secret management
- Metabase should be secured with authentication in production
- Consider using Docker secrets for sensitive data