Proof-of-concept for using CockroachDB with Apache Airflow 3.x in two ways:
- Metadata Backend — Replace PostgreSQL as Airflow's internal database
- Data Source — Query/write CockroachDB from DAG workflows via a provider package
Status: Proof-of-concept, not a supported product. The integration works end-to-end on Airflow 3.2.1 + CockroachDB v25.4 LTS with the workarounds documented here. The MySQL-style
timestampdiff()workaround was upstreamed incockroachdb/sqlalchemy-cockroachdb#301and ships insqlalchemy-cockroachdb >= 2.0.4. For the remaining items see Airflow discussion #65453.
Apache Airflow officially supports PostgreSQL, MySQL, and SQLite as metadata backends. CockroachDB is PostgreSQL wire-compatible and offers significant advantages for Airflow deployments:
- Built-in high availability: No need for PostgreSQL replication/failover infrastructure
- Horizontal scalability: Scales writes across nodes for large Airflow deployments
- Multi-region support: CockroachDB's geo-partitioned data enables multi-region Airflow setups
- Simplified infrastructure: One database instead of managing separate PostgreSQL and application databases
This PoC validates both integration paths and provides the foundation for upstream contributions and a self-maintained compatibility layer.
cockroachdb-airflow-poc/
├── README.md
├── LICENSE
├── docker/
│ ├── docker-compose.yml # Full stack: CockroachDB + Airflow
│ └── Dockerfile.airflow # Custom Airflow image with CRDB dialect
├── src/
│ ├── compatibility/ # CockroachDB compatibility layer
│ │ ├── retry_middleware.py # Transaction retry on 40001 errors
│ │ └── migration_utils.py # Audit Airflow migrations for CRDB compat
│ ├── provider/ # Airflow provider package prototype
│ │ ├── hooks/cockroachdb.py # CockroachDB Hook (DbApiHook)
│ │ ├── dialects/cockroachdb.py # CockroachDB Dialect (upsert, introspection)
│ │ └── assets/cockroachdb.py # URI sanitizer
│ └── tests/
│ └── test_cockroachdb_hook.py
├── plugins/ # Airflow plugins dir (bind-mounted; empty by default)
├── scripts/
│ ├── validate-poc.sh # End-to-end PoC validation
│ └── audit-airflow-migrations.sh # Scan migrations for CRDB incompatibilities
└── examples/
└── dags/
├── example_cockroachdb_dag.py
└── example_cockroachdb_health_check.py
- Docker and Docker Compose
- At least 4GB RAM available for containers
- CockroachDB v24.1+ (v25.2+ recommended for LTS support)
Copy the environment template, then optionally override any of the version pins:
cp docker/.env.example docker/.envDefaults (work out of the box):
# docker/.env
COCKROACHDB_VERSION=v25.4.10
AIRFLOW_VERSION=3.2.1
AIRFLOW_PYTHON_VERSION=3.12
SQLALCHEMY_COCKROACHDB_VERSION=2.0.4cd docker
docker compose up -dThis starts:
- CockroachDB (single-node, insecure) on port 26257 (SQL) and 8081 (DB Console)
- Airflow 3.x (API server + scheduler + dag-processor + triggerer) on port 8080
Note on Airflow 3.x architecture: Airflow 3.x separates the webserver into an api-server, and requires a dedicated dag-processor for DAG file parsing. Both are included in this stack.
The init process automatically:
- Creates the
airflowdatabase in CockroachDB - Configures
serial_normalization = sql_sequencefor PostgreSQL compatibility - Enables READ COMMITTED isolation (avoids scheduler crashes on 40001 errors)
- Creates
uuid_generate_v7()compatibility function - Runs
airflow db migrateagainst CockroachDB (retries on DDL visibility race) - Configures admin user via SimpleAuthManager (Airflow 3.x)
timestampdiff() no longer needs an Airflow-side workaround — sqlalchemy-cockroachdb >= 2.0.4 compiles it natively.
| Service | URL | Credentials |
|---|---|---|
| Airflow Web UI | http://localhost:8080 | admin / (see api-server logs) |
| CockroachDB Console | http://localhost:8081 | N/A |
./scripts/validate-poc.shIn the Airflow UI, enable and trigger the cockroachdb_demo DAG.
| Setting | Value | Purpose |
|---|---|---|
serial_normalization |
sql_sequence |
Match PostgreSQL SERIAL behavior |
| Isolation level | READ COMMITTED |
Avoid 40001 scheduler crashes (SERIALIZABLE causes WriteTooOldError under contention) |
uuid_generate_v7() UDF |
gen_random_uuid() wrapper |
Airflow 3.x migration compat |
SQL_ALCHEMY_CONN_ASYNC |
cockroachdb+asyncpg:// |
Airflow 3.x requires async engine (auto-derivation doesn't support cockroachdb:// scheme) |
timestampdiff() |
Compiled natively by sqlalchemy-cockroachdb >= 2.0.4 |
Translates to TRUNC(CAST(EXTRACT(EPOCH FROM ...) AS NUMERIC) <factor>), matching MySQL's integer-truncation semantics |
| Setting | Value | Purpose |
|---|---|---|
| Connection URI | postgresql:// or cockroachdb:// |
CockroachDB accepts both via wire compatibility |
| Connection type | postgres in Airflow UI |
Use existing PostgreSQL connection type |
| Transaction retry | Built into Hook | retry_middleware.py handles 40001 with exponential backoff |
- ✅
airflow db migrate— 68 tables created successfully - ✅ DAG parsing and scheduling
- ✅ Task execution (LocalExecutor) with correct output
- ✅ Airflow UI (API server, DAG grid)
- ✅ Example DAGs with CockroachDB CRUD + distributed SQL
- ✅ MySQL-style
timestampdiff()(compiled natively bysqlalchemy-cockroachdb >= 2.0.4) - ✅ CockroachDB provider package (Hook, Dialect, Asset URI)
⚠️ Scheduler crashes under write contention (40001 errors) — READ COMMITTED reduces but doesn't eliminate. Proper fix requires transaction retry logic in Airflow's scheduler⚠️ Advisory locks (pg_advisory_lock) not available — affects multi-scheduler HA coordination efficiency. Single-scheduler deployments are unaffected
Scan Airflow's Alembic migrations for CockroachDB-incompatible patterns:
./scripts/audit-airflow-migrations.sh 3.2.1cd src
python -m pytest tests/ -v| Approach | Description | Effort | Risk |
|---|---|---|---|
| A: Upstream PR | CockroachDB as a supported Airflow metadata backend | High | Medium-High |
| B: Provider Package | CockroachDB as a data source in DAGs | Low | Low |
Recommended path: A + B in parallel. The narrowly-scoped Airflow changes for Path A are being discussed upstream (see Next Steps); the data-source provider for Path B is prototyped here.
Run the PoC✅ —airflow db migratesucceeds, DAGs execute correctlyAudit migrations✅ — 111 files audited, 12 findings, all handledSubmit sqlalchemy-cockroachdb PR✅ — PR #301 merged;@compiles(timestampdiff, "cockroachdb")ships insqlalchemy-cockroachdb >= 2.0.4Open Airflow GitHub Discussion✅ — #65453; maintainer guidance was to take it to the devlist firstSend✅ — sent 2026-04-18; thread at lists.apache.org/thread/t6jo4th3sn23jmr34m6gcxzw4k8mo4pc. Awaiting maintainer feedback before drafting the three PRs.[DISCUSS]todev@airflow.apache.org- Submit Airflow PRs after devlist signal — three narrow PRs framed as PostgreSQL-compatibility improvements
- Track CockroachDB advisory-lock support — would enable scheduler HA coordination without retry-loop overhead
- CockroachDB PostgreSQL Compatibility
- Airflow Database Backend Docs
- sqlalchemy-cockroachdb
- CockroachDB Issue #32537 — Airflow ORM Support
- Airflow Issue #14438 — CockroachDB DDL Issues
- Airflow Issue #46175 — New DB Backend Request (Rejected)
- Airflow Issue #40882 — Retry Logic Bugs
- Airflow Custom Provider Guide