-
Notifications
You must be signed in to change notification settings - Fork 0
fix: resolve race condition in compound trigger evaluation #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: qodo_only-issues-20260113-qodo-grep-copilot_base_fix_resolve_race_condition_in_compound_trigger_evaluation_pr281
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,12 +7,46 @@ | |
|
|
||
| from prefect.server.database import PrefectDBInterface, db_injector | ||
| from prefect.server.events.schemas.automations import CompositeTrigger, Firing | ||
| from prefect.server.utilities.database import get_dialect | ||
| from prefect.types._datetime import DateTime, now | ||
|
|
||
| if TYPE_CHECKING: | ||
| from prefect.server.database.orm_models import ORMCompositeTriggerChildFiring | ||
|
|
||
|
|
||
| async def acquire_composite_trigger_lock( | ||
| session: AsyncSession, | ||
| trigger: CompositeTrigger, | ||
| ) -> None: | ||
| """ | ||
| Acquire a transaction-scoped advisory lock for the given composite trigger. | ||
|
|
||
| This serializes concurrent child trigger evaluations for the same compound | ||
| trigger, preventing a race condition where multiple transactions each see | ||
| only their own child firing and neither fires the parent. | ||
|
|
||
| The lock is automatically released when the transaction commits or rolls back. | ||
| """ | ||
| bind = session.get_bind() | ||
| if bind is None: | ||
| return | ||
|
|
||
| # Get the engine from either an Engine or Connection | ||
| engine: sa.Engine = bind if isinstance(bind, sa.Engine) else bind.engine # type: ignore[union-attr] | ||
| dialect = get_dialect(engine) | ||
|
|
||
| if dialect.name == "postgresql": | ||
| # Use the trigger's UUID as the lock key | ||
| # pg_advisory_xact_lock takes a bigint, so we use the UUID's int representation | ||
| # truncated to fit (collision is extremely unlikely and benign) | ||
| lock_key = int(trigger.id) % (2**31) | ||
| await session.execute( | ||
| sa.text("SELECT pg_advisory_xact_lock(:key)"), {"key": lock_key} | ||
| ) | ||
|
Comment on lines
+39
to
+45
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 4. Small advisory lock key • The lock key uses int(uuid) % 2**31 even though Postgres advisory locks accept a 64-bit key. • This increases collision probability, which can cause unnecessary contention between unrelated composite triggers (performance/head-of-line blocking). Agent Prompt
|
||
| # SQLite doesn't support advisory locks, but SQLite also serializes writes | ||
| # at the database level, so the race condition is less likely to occur | ||
|
|
||
|
|
||
| @db_injector | ||
| async def upsert_child_firing( | ||
| db: PrefectDBInterface, | ||
|
|
@@ -102,11 +136,22 @@ async def clear_child_firings( | |
| session: AsyncSession, | ||
| trigger: CompositeTrigger, | ||
| firing_ids: Sequence[UUID], | ||
| ) -> None: | ||
| await session.execute( | ||
| sa.delete(db.CompositeTriggerChildFiring).filter( | ||
| ) -> set[UUID]: | ||
| """ | ||
| Delete the specified child firings and return the IDs that were actually deleted. | ||
|
|
||
| Returns the set of child_firing_ids that were successfully deleted. Callers can | ||
| compare this to the expected firing_ids to detect races and avoid double-firing | ||
| composite triggers. | ||
| """ | ||
| result = await session.execute( | ||
| sa.delete(db.CompositeTriggerChildFiring) | ||
| .filter( | ||
| db.CompositeTriggerChildFiring.automation_id == trigger.automation.id, | ||
| db.CompositeTriggerChildFiring.parent_trigger_id == trigger.id, | ||
| db.CompositeTriggerChildFiring.child_firing_id.in_(firing_ids), | ||
| ) | ||
| .returning(db.CompositeTriggerChildFiring.child_firing_id) | ||
| ) | ||
|
Comment on lines
+147
to
155
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 2. Delete returning on sqlite • clear_child_firings now uses DELETE ... RETURNING unconditionally; Prefect supports SQLite and the codebase already has dialect-specific handling to avoid RETURNING in SQLite code paths. • This can break composite triggers on SQLite deployments/environments where RETURNING is unsupported, causing evaluation errors and preventing parent triggers from firing. Agent Prompt
|
||
|
|
||
| return set(result.scalars().all()) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,6 +35,7 @@ | |
| read_automation, | ||
| ) | ||
| from prefect.server.events.models.composite_trigger_child_firing import ( | ||
| acquire_composite_trigger_lock, | ||
| clear_child_firings, | ||
| clear_old_child_firings, | ||
| get_child_firings, | ||
|
|
@@ -346,6 +347,11 @@ async def evaluate_composite_trigger(session: AsyncSession, firing: Firing) -> N | |
| ) | ||
| return | ||
|
|
||
| # Acquire an advisory lock to serialize concurrent evaluations for this | ||
| # compound trigger. This prevents a race condition where multiple child | ||
| # triggers fire concurrently and neither transaction sees both firings. | ||
| await acquire_composite_trigger_lock(session, trigger.automation) | ||
|
|
||
|
Comment on lines
+350
to
+354
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 1. Wrong lock key scope • evaluate_composite_trigger passes trigger.automation into acquire_composite_trigger_lock, even though the helper is documented/typed to lock per composite trigger. • This changes the lock key from the composite trigger UUID to the automation UUID, potentially over-serializing unrelated work and making the helper’s intent/signature misleading. Agent Prompt
|
||
| # If we're only looking within a certain time horizon, remove any older firings that | ||
| # should no longer be considered as satisfying this trigger | ||
| if trigger.within is not None: | ||
|
|
@@ -382,8 +388,27 @@ async def evaluate_composite_trigger(session: AsyncSession, firing: Firing) -> N | |
| }, | ||
| ) | ||
|
|
||
| # clear by firing id | ||
| await clear_child_firings(session, trigger, firing_ids=list(firing_ids)) | ||
| # Clear by firing id, and only proceed if we won the race to claim them. | ||
| # This prevents double-firing when multiple workers evaluate concurrently. | ||
| deleted_ids = await clear_child_firings( | ||
| session, trigger, firing_ids=list(firing_ids) | ||
| ) | ||
|
|
||
| if len(deleted_ids) != len(firing_ids): | ||
| logger.debug( | ||
| "Composite trigger %s skipped fire; expected to delete %s firings, " | ||
| "actually deleted %s (another worker likely claimed them)", | ||
| trigger.id, | ||
| len(firing_ids), | ||
| len(deleted_ids), | ||
| extra={ | ||
| "automation": automation.id, | ||
| "trigger": trigger.id, | ||
| "expected_firing_ids": sorted(str(f) for f in firing_ids), | ||
| "deleted_firing_ids": sorted(str(f) for f in deleted_ids), | ||
| }, | ||
| ) | ||
| return | ||
|
|
||
| await fire( | ||
| session, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3. Lock dialect detection fragile
🐞 BugAgent Prompt
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools