fix: resolve race condition in compound trigger evaluation #148

Open
tomerqodo wants to merge 2 commits into
qodo_combined_20260121_qodo_grep_cursor_copilot_1_base_fix_resolve_race_condition_in_compound_trigger_evaluation_pr563 from
qodo_combined_20260121_qodo_grep_cursor_copilot_1_head_fix_resolve_race_condition_in_compound_trigger_evaluation_pr563

Conversation

@tomerqodo

Benchmark PR from qodo-benchmark#563

desertaxle and others added 2 commits January 21, 2026 15:46
Fixes two race conditions in compound trigger evaluation:

1. **Never-firing race** (transactional): When two child triggers fire
   concurrently in separate transactions, each only sees its own
   uncommitted insert due to READ COMMITTED isolation. Neither sees
   enough firings to trigger the parent.

   Fix: Use PostgreSQL advisory locks to serialize concurrent evaluations
   for the same compound trigger.

2. **Double-firing race** (autocommit): When both transactions see all
   firings, both delete and both fire the parent.

   Fix: Use DELETE ... RETURNING to make clearing a claim operation.
   Only the worker that successfully deletes the expected firings
   proceeds; others bail out.

Based on the fix in PrefectHQ/nebula#10716.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
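
The serialization idea in the commit message can be illustrated without a database. The following is a minimal in-process sketch, not the PR's implementation: a `threading.Lock` stands in for the per-trigger PostgreSQL advisory lock, and the class and function names are hypothetical. Holding the lock across "record the child firing" and "evaluate the parent" means the second worker always sees the first worker's firing, and clearing the firings inside the same critical section means the parent fires exactly once.

```python
import threading


class CompositeTriggerSketch:
    """In-process stand-in for a compound trigger that needs every child to fire."""

    def __init__(self, required_children: set[str]) -> None:
        self.required_children = required_children
        self.recorded: set[str] = set()
        self.parent_firings = 0
        # Plays the role of the per-trigger advisory lock (assumption: one lock per trigger).
        self._lock = threading.Lock()

    def record_child_firing(self, child: str) -> None:
        # Hold the lock across "record" and "evaluate" so evaluations run one at a time.
        with self._lock:
            self.recorded.add(child)
            if self.required_children <= self.recorded:
                # Consume the firings while still holding the lock, so they count once.
                self.recorded.clear()
                self.parent_firings += 1


def run_concurrent_children() -> int:
    """Fire both children from separate threads and report how often the parent fired."""
    trigger = CompositeTriggerSketch({"child-a", "child-b"})
    workers = [
        threading.Thread(target=trigger.record_child_firing, args=(name,))
        for name in ("child-a", "child-b")
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return trigger.parent_firings
```

Whichever thread enters the critical section second observes both firings and fires the parent; the first one observes only its own and does nothing.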
@qodo-code-review

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (5) 📎 Requirement gaps (0)


Action required

1. Nonstandard logger initialization 📘 Rule violation ✧ Quality
Description
• src/prefect/server/events/triggers.py initializes logger via logging.getLogger(__name__)
  without the standardized type-annotated get_logger("module_name") pattern.
• This breaks consistency and type safety expectations for logging setup across the codebase.
Code

src/prefect/server/events/triggers.py[R71-73]

+import logging

-logger: "logging.Logger" = get_logger(__name__)
+logger = logging.getLogger(__name__)
Evidence
Compliance ID 9 requires the standardized logger initialization pattern. The updated code instead
uses logging.getLogger(__name__) and omits the required type annotation and
get_logger("module_name") call.

AGENTS.md
src/prefect/server/events/triggers.py[71-74]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/prefect/server/events/triggers.py` initializes `logger` with `logging.getLogger(__name__)`, which violates the required standardized initialization pattern.

## Issue Context
Compliance requires all logger instances to follow `logger: "logging.Logger" = get_logger("module_name")` for consistent configuration and type safety.

## Fix Focus Areas
- src/prefect/server/events/triggers.py[27-74]



2. Missing future annotations import 📘 Rule violation ✓ Correctness
Description
• src/prefect/server/events/models/composite_trigger_child_firing.py contains type annotations but
  does not include from __future__ import annotations as the first import.
• This violates the required convention for forward references and consistent type-checking
  performance.
Code

src/prefect/server/events/models/composite_trigger_child_firing.py[R17-20]

+async def acquire_composite_trigger_lock(
+    session: AsyncSession,
+    trigger: CompositeTrigger,
+) -> None:
Evidence
Compliance ID 7 requires from __future__ import annotations as the first import in any src/
Python file that uses type annotations. The file includes annotated function signatures but begins
with other imports instead.

AGENTS.md
src/prefect/server/events/models/composite_trigger_child_firing.py[1-7]
src/prefect/server/events/models/composite_trigger_child_firing.py[17-21]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The file uses type annotations but is missing `from __future__ import annotations` as the first import.

## Issue Context
This import is required for all annotated Python files under `src/`.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[1-21]



3. triggers.py missing future annotations 📘 Rule violation ✓ Correctness
Description
• src/prefect/server/events/triggers.py contains type annotations but does not include `from
  __future__ import annotations` as the first import.
• This violates the required convention for annotated src/ Python files.
Code

src/prefect/server/events/triggers.py[R71-73]

+import logging

-logger: "logging.Logger" = get_logger(__name__)
+logger = logging.getLogger(__name__)
Evidence
Compliance ID 7 requires from __future__ import annotations as the first import in annotated
src/ Python files. This file has annotated function signatures but starts with a module docstring
and then other imports, with no future-annotations import.

AGENTS.md
src/prefect/server/events/triggers.py[1-18]
src/prefect/server/events/triggers.py[82-88]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The file contains type annotations but lacks `from __future__ import annotations` as required.

## Issue Context
All annotated Python files in `src/` must include this import as the first import statement.

## Fix Focus Areas
- src/prefect/server/events/triggers.py[1-30]



4. clear_child_firings returns wrong IDs 📘 Rule violation ✓ Correctness
Description
• clear_child_firings documents that it returns deleted child_firing_id values, but the SQL
  RETURNING clause returns child_trigger_id instead.
• This makes the function misleading and can cause downstream logic/logging to reason about the
  wrong identifiers.
Code

src/prefect/server/events/models/composite_trigger_child_firing.py[R139-155]

+) -> set[UUID]:
+    """
+    Delete the specified child firings and return the IDs that were actually deleted.
+
+    Returns the set of child_firing_ids that were successfully deleted. Callers can
+    compare this to the expected firing_ids to detect races and avoid double-firing
+    composite triggers.
+    """
+    result = await session.execute(
+        sa.delete(db.CompositeTriggerChildFiring)
+        .filter(
            db.CompositeTriggerChildFiring.automation_id == trigger.automation.id,
            db.CompositeTriggerChildFiring.parent_trigger_id == trigger.id,
            db.CompositeTriggerChildFiring.child_firing_id.in_(firing_ids),
        )
+        .returning(db.CompositeTriggerChildFiring.child_trigger_id)
    )
Evidence
Compliance ID 2 requires self-documenting code and forbids misleading identifiers/behavior. The
docstring promises deleted child_firing_ids, but the implementation returns child_trigger_id,
creating a mismatch between documented intent and actual behavior.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
src/prefect/server/events/models/composite_trigger_child_firing.py[139-155]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`clear_child_firings` claims to return deleted `child_firing_id` values, but the query returns `child_trigger_id`, making the function misleading.

## Issue Context
Callers compare the returned IDs to the expected `firing_ids` to detect races; returning a different identifier type undermines that intent.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[133-157]
- src/prefect/server/events/triggers.py[390-410]
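
To make the claim check concrete, here is a minimal sketch of the comparison the callers rely on. It uses an in-memory set in place of the child-firing table; `FiringStore`, `delete_returning`, and `try_claim` are hypothetical names for illustration, not functions from the PR. The point is that the claim only works when the returned identifiers are of the same kind as the expected `firing_ids`.

```python
from uuid import UUID, uuid4


class FiringStore:
    """In-memory stand-in for the child-firing table (hypothetical)."""

    def __init__(self, firing_ids: set[UUID]) -> None:
        self._rows = set(firing_ids)

    def delete_returning(self, firing_ids: set[UUID]) -> set[UUID]:
        """Delete the requested rows and return the firing IDs actually removed."""
        deleted = self._rows & firing_ids
        self._rows -= deleted
        return deleted


def try_claim(store: FiringStore, expected: set[UUID]) -> bool:
    """Return True only if this caller deleted every expected firing."""
    return store.delete_returning(expected) == expected


# Two workers race to claim the same firings: only the first one succeeds.
firing_ids = {uuid4(), uuid4()}
store = FiringStore(firing_ids)
first_worker_claimed = try_claim(store, firing_ids)
second_worker_claimed = try_claim(store, firing_ids)
```

If `delete_returning` handed back child trigger IDs instead, the equality test against `expected` would fail even for the worker that did delete every row, so no worker would ever proceed to fire the parent.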



5. Unstable lock key 🐞 Bug ✓ Correctness
Description
• The advisory lock key is computed with Python’s hash(str(trigger.id)), which can differ across
  worker processes.
• Since pg_advisory_xact_lock requires all contenders to use the same key, different keys mean no
  actual serialization and the “missed parent firing” race can still occur in multi-worker
  deployments.
• This undermines the stated purpose of the new lock in evaluate_composite_trigger.
Code

src/prefect/server/events/models/composite_trigger_child_firing.py[R38-45]

+    if dialect.name == "postgresql":
+        # Use the trigger's UUID as the lock key
+        # pg_advisory_xact_lock takes a bigint, so we use the UUID's int representation
+        # truncated to fit (collision is extremely unlikely and benign)
+        lock_key = hash(str(trigger.id)) % (2**63)
+        await session.execute(
+            sa.text("SELECT pg_advisory_xact_lock(:key)"), {"key": lock_key}
+        )
Evidence
The lock is intended to serialize concurrent evaluations of the same composite trigger, but the
implementation derives the lock key from hash(). In Python, hash() is not a stable mapping
across processes, so separate workers may not contend on the same advisory lock, negating the
serialization the caller relies on.

src/prefect/server/events/models/composite_trigger_child_firing.py[17-46]
src/prefect/server/events/triggers.py[349-353]
Best Practice: Python hashing semantics

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`acquire_composite_trigger_lock` computes the advisory lock key using Python’s `hash()`, which can vary across worker processes. This defeats the goal of serializing concurrent evaluations across multiple workers.

## Issue Context
The lock is intended to prevent a race where multiple transactions each see only their own child firing and neither fires the parent.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[17-46]

## Suggested approach
- Replace `hash(str(trigger.id))` with a deterministic value derived from the UUID, e.g.:
 - `uuid_int = trigger.id.int`
 - Convert to a signed 64-bit integer (Postgres bigint) deterministically, e.g. mask to 63 bits or map into `[-2**63, 2**63-1]`.
- Add a small unit test or update the concurrency regression test to validate behavior across processes if feasible (or at least validate the key derivation is deterministic).
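
One possible shape for the deterministic key derivation suggested above, as a sketch rather than the PR's final code. `advisory_lock_key` is a hypothetical helper name. It keeps the low 64 bits of the UUID's integer value and reinterprets them as a signed 64-bit integer, which is the range a PostgreSQL `bigint` accepts. Unlike `hash()` on a string, which is randomized per process unless `PYTHONHASHSEED` is fixed, this depends only on the UUID.

```python
import uuid


def advisory_lock_key(trigger_id: uuid.UUID) -> int:
    """Map a UUID to a deterministic signed 64-bit integer (PostgreSQL bigint range)."""
    # Keep the low 64 bits of the UUID's 128-bit integer value.
    low_64 = trigger_id.int & 0xFFFFFFFFFFFFFFFF
    # Reinterpret as two's complement so the result lies in [-2**63, 2**63 - 1].
    if low_64 >= 1 << 63:
        return low_64 - (1 << 64)
    return low_64
```

Two different UUIDs can still map to the same key. As the original code comment notes, such a collision only makes unrelated triggers wait on each other briefly; it does not break correctness.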



6. SQLite RETURNING usage 🐞 Bug ⛯ Reliability
Description
• clear_child_firings now uses DELETE ... RETURNING without checking the database dialect.
• The codebase documents that SQLite may not support RETURNING in supported configurations, and it
  already uses dialect-specific logic elsewhere to avoid RETURNING on SQLite.
• This can cause composite trigger evaluation to fail at runtime on SQLite deployments.
Code

src/prefect/server/events/models/composite_trigger_child_firing.py[R147-155]

+    result = await session.execute(
+        sa.delete(db.CompositeTriggerChildFiring)
+        .filter(
            db.CompositeTriggerChildFiring.automation_id == trigger.automation.id,
            db.CompositeTriggerChildFiring.parent_trigger_id == trigger.id,
            db.CompositeTriggerChildFiring.child_firing_id.in_(firing_ids),
        )
+        .returning(db.CompositeTriggerChildFiring.child_trigger_id)
    )
Evidence
Prefect supports SQLite (with a minimum version check) and has existing code that explicitly avoids
RETURNING on SQLite due to lack of support in some configurations. The new clear_child_firings
uses .returning(...) unconditionally, which is inconsistent with this established compatibility
handling and can break SQLite deployments.

src/prefect/server/events/models/composite_trigger_child_firing.py[133-157]
src/prefect/server/events/storage/database.py[216-225]
src/prefect/server/database/configurations.py[411-434]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`clear_child_firings` uses `DELETE ... RETURNING` unconditionally. The codebase already treats SQLite as potentially lacking `RETURNING` support, so this can break composite trigger evaluation on SQLite.

## Issue Context
Prefect supports SQLite with minimum version 3.24.0, and elsewhere explicitly avoids `RETURNING` on SQLite.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[133-157]

## Suggested approach
- Determine dialect (similar to other code paths) and:
 - For Postgres: keep `RETURNING` if you need deleted IDs.
 - For SQLite: avoid `RETURNING` and instead rely on `result.rowcount` to detect whether all expected rows were deleted.
- Alternatively, remove `RETURNING` entirely and use `rowcount` on all dialects if you only need the count for race-detection.
- Ensure callers in `evaluate_composite_trigger` continue to correctly detect partial deletes without needing returned IDs.
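
The `rowcount` alternative can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions: the table name `child_firing`, its single column, and the helpers `make_demo_db` and `claim_firings` are hypothetical, and the real code goes through a SQLAlchemy session rather than a raw connection. The sketch also assumes the expected firing IDs are distinct.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding two child firings (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE child_firing (child_firing_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO child_firing VALUES (?)", [("a",), ("b",)])
    return conn


def claim_firings(conn: sqlite3.Connection, firing_ids: list[str]) -> bool:
    """Delete the expected firings; succeed only if every one of them was deleted."""
    if not firing_ids:
        return False
    placeholders = ", ".join("?" for _ in firing_ids)
    cursor = conn.execute(
        f"DELETE FROM child_firing WHERE child_firing_id IN ({placeholders})",
        firing_ids,
    )
    # rowcount reports how many rows this DELETE removed, with no RETURNING needed.
    return cursor.rowcount == len(firing_ids)
```

Calling `claim_firings(conn, ["a", "b"])` twice on the same connection succeeds the first time and fails the second, which is the same claim behavior the `RETURNING` version is after.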




Remediation recommended

7. Lock acquisition silently skipped 📘 Rule violation ⛯ Reliability
Description
• acquire_composite_trigger_lock returns early when session.get_bind() is None, with no
  logging or error signaling.
• This can silently disable the race-condition protection and make concurrency issues harder to
  diagnose.
Code

src/prefect/server/events/models/composite_trigger_child_firing.py[R30-33]

+    bind = session.get_bind()
+    if bind is None:
+        return
+
Evidence
Compliance ID 3 requires identifying and handling failure points with actionable context rather than
silent failures. Early-returning when the bind is missing is a silent failure mode that prevents
lock acquisition without any diagnostic information.

Rule 3: Generic: Robust Error Handling and Edge Case Management
src/prefect/server/events/models/composite_trigger_child_firing.py[30-33]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Lock acquisition is skipped silently when `session.get_bind()` is `None`, which can undermine the intended race-condition mitigation.

## Issue Context
If this path occurs in production, concurrent evaluations may proceed without serialization and there will be no logs explaining why.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[17-48]
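
A minimal sketch of one way to make the skipped path visible. Everything here is a stand-in: `UnboundSession` and `bind_or_warn` are hypothetical names, the logger name is arbitrary, and the real function would use the project's own logger and session types. Whether to log a warning or raise an error is a design decision for the maintainers.

```python
import logging

# Arbitrary logger name for the sketch; the project has its own logger setup.
logger = logging.getLogger("composite_trigger_lock_sketch")


class UnboundSession:
    """Stand-in session whose get_bind() returns None (hypothetical)."""

    def get_bind(self):
        return None


def bind_or_warn(session, trigger_id: str):
    """Return the session's bind, logging a warning when there is none."""
    bind = session.get_bind()
    if bind is None:
        # Leave a trace so a missing lock can be diagnosed later.
        logger.warning(
            "No database bind on session; skipping advisory lock for composite trigger %s",
            trigger_id,
        )
    return bind
```

The caller can still return early when the result is `None`, but the log now records that serialization was not applied for that trigger.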





Comment on lines +71 to +73
+import logging

-logger: "logging.Logger" = get_logger(__name__)
+logger = logging.getLogger(__name__)

Action required

1. Nonstandard logger initialization 📘 Rule violation ✧ Quality

• src/prefect/server/events/triggers.py initializes logger via logging.getLogger(__name__)
  without the standardized type-annotated get_logger("module_name") pattern.
• This breaks consistency and type safety expectations for logging setup across the codebase.
Agent prompt
## Issue description
`src/prefect/server/events/triggers.py` initializes `logger` with `logging.getLogger(__name__)`, which violates the required standardized initialization pattern.

## Issue Context
Compliance requires all logger instances to follow `logger: "logging.Logger" = get_logger("module_name")` for consistent configuration and type safety.

## Fix Focus Areas
- src/prefect/server/events/triggers.py[27-74]


Comment on lines +17 to +20
+async def acquire_composite_trigger_lock(
+    session: AsyncSession,
+    trigger: CompositeTrigger,
+) -> None:

Action required

2. Missing future annotations import 📘 Rule violation ✓ Correctness

• src/prefect/server/events/models/composite_trigger_child_firing.py contains type annotations but
  does not include from __future__ import annotations as the first import.
• This violates the required convention for forward references and consistent type-checking
  performance.
Agent prompt
## Issue description
The file uses type annotations but is missing `from __future__ import annotations` as the first import.

## Issue Context
This import is required for all annotated Python files under `src/`.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[1-21]


Comment on lines +71 to +73
+import logging

-logger: "logging.Logger" = get_logger(__name__)
+logger = logging.getLogger(__name__)

Action required

3. triggers.py missing future annotations 📘 Rule violation ✓ Correctness

• src/prefect/server/events/triggers.py contains type annotations but does not include `from
  __future__ import annotations` as the first import.
• This violates the required convention for annotated src/ Python files.
Agent prompt
## Issue description
The file contains type annotations but lacks `from __future__ import annotations` as required.

## Issue Context
All annotated Python files in `src/` must include this import as the first import statement.

## Fix Focus Areas
- src/prefect/server/events/triggers.py[1-30]


Comment on lines +139 to 155
+) -> set[UUID]:
+    """
+    Delete the specified child firings and return the IDs that were actually deleted.
+
+    Returns the set of child_firing_ids that were successfully deleted. Callers can
+    compare this to the expected firing_ids to detect races and avoid double-firing
+    composite triggers.
+    """
+    result = await session.execute(
+        sa.delete(db.CompositeTriggerChildFiring)
+        .filter(
            db.CompositeTriggerChildFiring.automation_id == trigger.automation.id,
            db.CompositeTriggerChildFiring.parent_trigger_id == trigger.id,
            db.CompositeTriggerChildFiring.child_firing_id.in_(firing_ids),
        )
+        .returning(db.CompositeTriggerChildFiring.child_trigger_id)
    )

Action required

4. clear_child_firings returns wrong IDs 📘 Rule violation ✓ Correctness

• clear_child_firings documents that it returns deleted child_firing_id values, but the SQL
  RETURNING clause returns child_trigger_id instead.
• This makes the function misleading and can cause downstream logic/logging to reason about the
  wrong identifiers.
Agent prompt
## Issue description
`clear_child_firings` claims to return deleted `child_firing_id` values, but the query returns `child_trigger_id`, making the function misleading.

## Issue Context
Callers compare the returned IDs to the expected `firing_ids` to detect races; returning a different identifier type undermines that intent.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[133-157]
- src/prefect/server/events/triggers.py[390-410]


Comment on lines +30 to +33
+    bind = session.get_bind()
+    if bind is None:
+        return


Remediation recommended

5. Lock acquisition silently skipped 📘 Rule violation ⛯ Reliability

• acquire_composite_trigger_lock returns early when session.get_bind() is None, with no
  logging or error signaling.
• This can silently disable the race-condition protection and make concurrency issues harder to
  diagnose.
Agent prompt
## Issue description
Lock acquisition is skipped silently when `session.get_bind()` is `None`, which can undermine the intended race-condition mitigation.

## Issue Context
If this path occurs in production, concurrent evaluations may proceed without serialization and there will be no logs explaining why.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[17-48]


Comment on lines +38 to +45
+    if dialect.name == "postgresql":
+        # Use the trigger's UUID as the lock key
+        # pg_advisory_xact_lock takes a bigint, so we use the UUID's int representation
+        # truncated to fit (collision is extremely unlikely and benign)
+        lock_key = hash(str(trigger.id)) % (2**63)
+        await session.execute(
+            sa.text("SELECT pg_advisory_xact_lock(:key)"), {"key": lock_key}
+        )

Action required

6. Unstable lock key 🐞 Bug ✓ Correctness

• The advisory lock key is computed with Python’s hash(str(trigger.id)), which can differ across
  worker processes.
• Since pg_advisory_xact_lock requires all contenders to use the same key, different keys mean no
  actual serialization and the “missed parent firing” race can still occur in multi-worker
  deployments.
• This undermines the stated purpose of the new lock in evaluate_composite_trigger.
Agent prompt
## Issue description
`acquire_composite_trigger_lock` computes the advisory lock key using Python’s `hash()`, which can vary across worker processes. This defeats the goal of serializing concurrent evaluations across multiple workers.

## Issue Context
The lock is intended to prevent a race where multiple transactions each see only their own child firing and neither fires the parent.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[17-46]

## Suggested approach
- Replace `hash(str(trigger.id))` with a deterministic value derived from the UUID, e.g.:
  - `uuid_int = trigger.id.int`
  - Convert to a signed 64-bit integer (Postgres bigint) deterministically, e.g. mask to 63 bits or map into `[-2**63, 2**63-1]`.
- Add a small unit test or update the concurrency regression test to validate behavior across processes if feasible (or at least validate the key derivation is deterministic).


Comment on lines +147 to 155
+    result = await session.execute(
+        sa.delete(db.CompositeTriggerChildFiring)
+        .filter(
            db.CompositeTriggerChildFiring.automation_id == trigger.automation.id,
            db.CompositeTriggerChildFiring.parent_trigger_id == trigger.id,
            db.CompositeTriggerChildFiring.child_firing_id.in_(firing_ids),
        )
+        .returning(db.CompositeTriggerChildFiring.child_trigger_id)
    )

Action required

7. SQLite RETURNING usage 🐞 Bug ⛯ Reliability

• clear_child_firings now uses DELETE ... RETURNING without checking the database dialect.
• The codebase documents that SQLite may not support RETURNING in supported configurations, and it
  already uses dialect-specific logic elsewhere to avoid RETURNING on SQLite.
• This can cause composite trigger evaluation to fail at runtime on SQLite deployments.
Agent prompt
## Issue description
`clear_child_firings` uses `DELETE ... RETURNING` unconditionally. The codebase already treats SQLite as potentially lacking `RETURNING` support, so this can break composite trigger evaluation on SQLite.

## Issue Context
Prefect supports SQLite with minimum version 3.24.0, and elsewhere explicitly avoids `RETURNING` on SQLite.

## Fix Focus Areas
- src/prefect/server/events/models/composite_trigger_child_firing.py[133-157]

## Suggested approach
- Determine dialect (similar to other code paths) and:
  - For Postgres: keep `RETURNING` if you need deleted IDs.
  - For SQLite: avoid `RETURNING` and instead rely on `result.rowcount` to detect whether all expected rows were deleted.
- Alternatively, remove `RETURNING` entirely and use `rowcount` on all dialects if you only need the count for race-detection.
- Ensure callers in `evaluate_composite_trigger` continue to correctly detect partial deletes without needing returned IDs.


2 participants