fix(signals): evict stale db connections in temporal activities#58709

Draft
posthog[bot] wants to merge 1 commit into master from posthog-code/signals-stale-db-connection-fix

posthog Bot commented May 17, 2026

Problem

A new OperationalError: the connection is closed exception surfaced in the signals semantic-search Temporal activity. The traceback lands inside run_signal_semantic_search_activity at products/signals/backend/temporal/signal_queries.py, where await Team.objects.aget(pk=input.team_id) calls into psycopg's _check_connection_ok and finds the cached Django connection closed before it can create a cursor.

Long-running Temporal workers don't go through Django's request cycle, so the request_started / request_finished signals that normally invoke close_old_connections() never fire. Connections that exceed CONN_MAX_AGE or are killed by Postgres/middleboxes stay in the per-thread pool until the next ORM call fails. The same unguarded pattern exists across the rest of the signals Temporal activities, even though posthog/temporal/common/utils.py already exposes the canonical close_db_connections decorator and other products (tasks) already apply it.
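The eviction idea described above can be sketched as a decorator that runs a connection cleanup before and after the wrapped coroutine. This is a minimal illustration, not the code in posthog/temporal/common/utils.py: `evict_stale_connections` and `example_activity` are hypothetical names, and `close_old_connections` here is a stand-in for `django.db.close_old_connections` so the block runs without Django. In a real async activity the cleanup would likely also need to run on the thread that owns the connection.

```python
import asyncio
import functools

calls = []  # records the order of events, for illustration only


def close_old_connections():
    """Stand-in for django.db.close_old_connections (assumption: real code imports it from Django)."""
    calls.append("close_old_connections")


def evict_stale_connections(fn):
    """Hypothetical decorator: drop unusable connections before and after the wrapped coroutine."""

    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        # Evict connections that aged out or were closed while the worker sat idle.
        close_old_connections()
        try:
            return await fn(*args, **kwargs)
        finally:
            # Evict again so a connection broken during this call is not reused later.
            close_old_connections()

    return wrapper


@evict_stale_connections
async def example_activity(team_id):
    # Placeholder for an ORM call such as Team.objects.aget(pk=team_id).
    calls.append(f"orm query for team {team_id}")
    return team_id


result = asyncio.run(example_activity(42))
```

The `try`/`finally` keeps the second cleanup running even when the ORM call raises, which is the case that matters for a worker that stays alive across many activity invocations.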

Changes

Stack @close_db_connections from posthog.temporal.common.utils beneath @temporalio.activity.defn and @scoped_temporal() on every signals Temporal activity that touches the Django ORM:

  • signal_queries.py: fetch_signal_type_examples_activity, run_signal_semantic_search_activity, wait_for_signal_in_clickhouse_activity, fetch_signals_for_report_activity
  • summary.py: mark_report_in_progress_activity, mark_report_ready_activity, mark_report_failed_activity, mark_report_pending_input_activity, reset_report_to_potential_activity
  • reingestion.py: soft_delete_report_signals_activity, delete_report_activity, reingest_signals_activity, process_team_signals_batch_activity, delete_team_reports_activity
  • grouping.py: get_embedding_activity, fetch_report_contexts_activity, assign_and_emit_signal_activity
  • backfill_error_tracking.py: fetch_error_tracking_issues_activity, emit_backfill_signal_activity
  • emit_eval_signal.py: emit_eval_signal_activity
  • report_safety_judge.py: report_safety_judge_activity
  • agentic/select_repository.py: select_repository_activity
  • agentic/report.py: run_agentic_report_activity
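Why the decorator sits at the bottom of the stack can be shown with plain Python: decorators apply bottom-up, so the lowest one wraps the function body most tightly. The sketch below uses a hypothetical `tag` helper in place of the real decorators (the real `@temporalio.activity.defn` registers the function rather than wrapping it like this), purely to trace the nesting order.

```python
import functools

order = []  # trace of wrapper entry and exit, for illustration only


def tag(name):
    """Hypothetical stand-in decorator that records when its wrapper is entered and left."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            order.append(f"enter {name}")
            try:
                return fn(*args, **kwargs)
            finally:
                order.append(f"exit {name}")

        return wrapper

    return decorator


@tag("activity.defn")         # outermost: applied last
@tag("scoped_temporal")
@tag("close_db_connections")  # innermost: applied first, closest to the body
def run_signal_semantic_search_activity():
    order.append("body")
    return "ok"


result = run_signal_semantic_search_activity()
```

With this ordering the connection cleanup brackets only the activity body, inside whatever the outer decorators set up.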

Activities that only talk to ClickHouse, Kafka, object storage, or the Temporal client (publish_report_completed_activity, pause_grouping_until_activity, get_grouping_paused_state_activity, restore_grouping_pause_activity, read_signals_from_s3_activity) are left untouched.

The decorator no-ops under settings.TEST, so existing pytest fixtures that rely on transaction=True are unaffected.
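One way such a test-mode no-op could look is sketched below. It assumes the flag is checked at call time, which may not match the real decorator. `Settings`, `close_db_connections_sketch`, and `mark_report_ready` are hypothetical names, and `close_old_connections` is again a stand-in rather than the Django function.

```python
import functools


class Settings:
    """Hypothetical stand-in for django.conf.settings."""

    TEST = True


settings = Settings()
evictions = []  # one entry per cleanup call, for illustration only


def close_old_connections():
    """Stand-in for django.db.close_old_connections."""
    evictions.append("evicted")


def close_db_connections_sketch(fn):
    """Hypothetical decorator that skips connection cleanup while tests are running."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if settings.TEST:
            # Leave the test-managed connection alone so fixtures keep their transaction.
            return fn(*args, **kwargs)
        close_old_connections()
        try:
            return fn(*args, **kwargs)
        finally:
            close_old_connections()

    return wrapper


@close_db_connections_sketch
def mark_report_ready(report_id):
    return f"report {report_id} ready"


under_test = mark_report_ready(1)
evictions_under_test = len(evictions)

settings.TEST = False
outside_test = mark_report_ready(2)
evictions_outside_test = len(evictions)
```

Under the test flag the wrapped function runs with no cleanup calls; outside it, each invocation triggers one cleanup before and one after.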

How did you test this code?

Agent-authored change.

  • Verified all 9 modified Python files parse cleanly with ast.parse.
  • ruff check products/signals/backend/temporal/ — all checks passed.
  • ruff format --check products/signals/backend/temporal/ — all files already formatted.
  • No automated tests added; this matches how the equivalent decorator is applied in products/tasks/backend/temporal/process_task/activities/*.py, where the contract is documented in posthog/temporal/common/utils.py and exercised by posthog/temporal/tests/common/test_utils.py.

Publish to changelog?

no

🤖 Agent context

PostHog Code agent worked from a signal report flagging a single new OperationalError: the connection is closed issue in run_signal_semantic_search_activity. The report identified the canonical close_db_connections decorator as the existing fix pattern (already used in the tasks product) and recommended a sweep across the rest of the signals activities. The agent verified the decorator's stacking expectations (@activity.defn on top, @close_db_connections innermost), then applied it across every signals activity that calls the Django ORM, skipping ones that only touch ClickHouse, Kafka, object storage, or the Temporal client to avoid noise.


Created with PostHog Code

Long-running Temporal workers don't go through Django's request cycle, so
the request_started / request_finished signals that normally call
close_old_connections() never fire. Connections that exceed CONN_MAX_AGE
or get killed by the database stay in the per-thread pool until the next
ORM call fails — which is what surfaced as an
OperationalError: the connection is closed inside
run_signal_semantic_search_activity.

Apply the existing @close_db_connections decorator from
posthog.temporal.common.utils across every signals Temporal activity that
touches the Django ORM, so connections are evicted before and after each
invocation. Skip activities that only talk to ClickHouse, Kafka,
object storage, or the Temporal client.

Generated-By: PostHog Code
Task-Id: a526a5ec-c960-4d99-b513-3b0f0cbe87a7