Three small improvements to Airflow's PostgreSQL-compatible database handling — pre-PR sanity check #65453
Unanswered
viragtripathi asked this question in Q&A
Replies: 1 comment 1 reply
Yes. You should start this discussion on the Devlist. See https://airflow.apache.org/community/
Hi Airflow maintainers,
I'm working on improving Airflow 3.x's behavior on PostgreSQL-compatible databases (currently focused on CockroachDB, which is wire-protocol compatible with PG). Before opening individual PRs I wanted to surface the proposed changes here for a sanity check, since two of the three are dialect-name additions that touch generic code paths.
I'd appreciate maintainer guidance on the three proposals below before I draft the PRs.
I've read #46175 and want to be explicit up front: I am not asking Airflow to take on a new metadata backend. The dialect-specific work (SQLAlchemy compiler hooks, retry logic, migration audit) lives entirely outside Airflow; see context below. The three items in this post are improvements that stand on their own merits regardless of whether anyone uses CockroachDB.
Context (skip if uninterested)
Background work that's already done outside this repo:
Integration POC validating Airflow 3.2.0 against CockroachDB v25.4 LTS — airflow db migrate succeeds, DAG parsing/scheduling/task execution all work, example DAGs run end-to-end. 111 Alembic migrations audited, 12 findings, all handled.
A sqlalchemy-cockroachdb PR that registers @compiles(timestampdiff, "cockroachdb") so that the MySQL-style func.timestampdiff(unit, start, end) fallback (used in airflow-core/src/airflow/models/taskinstance.py and dagrun.py for non-PG/non-SQLite dialects) compiles to a PostgreSQL-style EXTRACT(EPOCH FROM …) expression: cockroachdb/sqlalchemy-cockroachdb#301. With that in place, Airflow needs no code change for the timestampdiff issue once the dialect ships 2.0.4+.
So the items below are the remaining ones that genuinely require small upstream changes.
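For reviewers unfamiliar with the timestampdiff rewrite mentioned above: the real hook lives in sqlalchemy-cockroachdb and uses SQLAlchemy's @compiles extension; the stand-in below is purely illustrative (the function name is hypothetical) and only shows the SQL shape the hook emits for the SECOND unit.

```python
def compile_timestampdiff_pg(unit: str, start: str, end: str) -> str:
    """Illustrative stand-in for the dialect-side compiler hook.

    MySQL-style input:        TIMESTAMPDIFF(SECOND, start, end)
    PG-style equivalent out:  EXTRACT(EPOCH FROM end - start)
    """
    if unit.upper() != "SECOND":
        raise NotImplementedError(unit)
    return f"EXTRACT(EPOCH FROM {end} - {start})"
```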
Proposed PR 1 — Generic 40001 serialization-failure retry in the scheduler
Files: airflow-core/src/airflow/utils/retries.py (decorator fix) and airflow-core/src/airflow/jobs/scheduler_job_runner.py (apply to _critical_section_enqueue_task_instances).
Why this is a generic improvement, not CRDB-specific: PostgreSQL also raises SerializationFailure (SQLSTATE 40001) under high concurrency on SELECT … FOR NO KEY UPDATE … SKIP LOCKED. Today Airflow's scheduler does not retry these — it either crashes the process or enters an error loop. #40882 (closed) noted that the existing retry_db_transaction decorator only issues session.rollback() for OperationalError, not all DBAPIError subclasses, leaving subsequent retries to fail with InFailedSqlTransaction.
Proposed change:
Fix retry_db_transaction to roll back on any DBAPIError, not just OperationalError.
Apply it (or an equivalent retry loop) to _critical_section_enqueue_task_instances.
Use exponential backoff and bound the number of retries.
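A minimal sketch of the combined fix, using stand-in classes so the behavior is testable without a database. In Airflow the real exception types are sqlalchemy.exc.DBAPIError/OperationalError and the session is an ORM session; the names, signatures, and defaults here are illustrative, not the actual patch.

```python
import functools
import time


class DBAPIError(Exception):
    """Stand-in for sqlalchemy.exc.DBAPIError, carrying the SQLSTATE."""

    def __init__(self, sqlstate: str):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def retry_db_transaction(retries: int = 3, base_delay: float = 0.0):
    """Retry the wrapped callable on SQLSTATE 40001 (serialization failure).

    Rolls back the session on *any* DBAPIError, not only OperationalError,
    so the next attempt does not fail with InFailedSqlTransaction.
    Backoff is exponential and the number of retries is bounded.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(session, *args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(session, *args, **kwargs)
                except DBAPIError as exc:
                    # The key part of the fix: always roll back first.
                    session.rollback()
                    if exc.sqlstate != "40001" or attempt == retries:
                        raise
                    time.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```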
Risk: Medium. Touches the scheduler hot path. Change should be guarded by tests that fault-inject 40001 and assert progress.
Benefit: Improves PG resilience under concurrent scheduler workloads. Side benefit: makes the scheduler usable on any PG-compatible DB whose isolation model surfaces 40001.
This is the largest of the three changes and the one I most want a steer on before drafting.
Proposed PR 2 — Add cockroachdb to the async-driver mapping
File: airflow-core/src/airflow/settings.py:240
def _get_async_conn_uri_from_sync(sync_uri):
    AIO_LIBS_MAPPING = {"sqlite": "aiosqlite", "postgresql": "asyncpg", "mysql": "aiomysql"}
    ...
Proposed change: Add "cockroachdb": "asyncpg" to the mapping.
Why standalone-justifiable: This function exists to derive an async URI when one isn't explicitly configured. Today, anyone using a cockroachdb:// SQLAlchemy URL (the canonical scheme registered by sqlalchemy-cockroachdb since 2017) hits "The asyncio extension requires an async driver. The loaded 'psycopg2' is not async." The workaround is setting AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_ASYNC explicitly. It's a one-line fix with no behavior change for postgres/mysql/sqlite users.
Risk: Trivial. Touches one dict literal.
Proposed PR 3 — Dialect-aware UUID generation in migration 0042
File: airflow-core/src/airflow/migrations/versions/0042_3_0_0_add_uuid_primary_key_to_task_instance_.py
The migration creates a custom uuid_generate_v7() SQL function via CREATE EXTENSION pgcrypto. CockroachDB doesn't support the pgcrypto extension but offers a native gen_random_uuid().
Proposed change: Add a dialect-name check at the top of the upgrade so non-postgres PG-compatible dialects can supply their own UUID generator (defaulting to gen_random_uuid() for cockroachdb). PG users see no change. The fallback uses v4 UUIDs (not v7) on the alternate dialect, which is documented as a tradeoff.
Risk: Low. Migration runs once, dialect-name-guarded, no impact on existing PG installs.
The CRDB-specific compatibility work (compiler hooks, alembic impl, retry middleware) lives in sqlalchemy-cockroachdb. The three items above are the only places where small generic changes in Airflow itself would close the gap meaningfully.
Disclosure
I work at Cockroach Labs. The integration POC and sqlalchemy-cockroachdb#301 are linked above for full transparency. Happy to share any additional context. Thanks for reading.