
fix(bigquery): treat table-not-found at sync time as non-retryable #65039

Open
Gilbert09 wants to merge 1 commit into master from posthog-code/bigquery-table-not-found-non-retryable

Conversation

@Gilbert09

Member

Problem

Error tracking surfaced a BigQuery NotFound that is retried forever, spamming error tracking with every attempt:

GET https://bigquery.googleapis.com/.../tables/<table>?prettyPrint=false: Not found: Table <project>:<dataset>.<table>

Error tracking issue: https://us.posthog.com/project/2/error_tracking/019ee69e-92e4-7653-a0a1-4b2540e3ff53

The stack trace originates in this source's code:

build_pipeline (bigquery.py:871)
  → _build_source_response (bigquery.py:938)   # bq_client.get_table(fully_qualified_table_name)
    → google.cloud.bigquery.client.get_table → NotFound

A table that schema discovery selected was deleted or renamed in BigQuery before the sync ran (common with dbt-managed datasets, which drop/recreate tables). get_table() then 404s. This is an upstream/data condition: retrying within the sync's window can't make a missing table reappear, so the run just hammers BigQuery and refires the exception every attempt.
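To make the cost concrete, here is a minimal sketch of a generic retry loop that gives up as soon as an error is classified as non-retryable. It is a simplification: `run_with_retries`, `NonRetryableError`, and the classifier lambda are hypothetical and stand in for whatever retry machinery actually wraps the sync activity.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class NonRetryableError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


def run_with_retries(
    fn: Callable[[], T],
    is_non_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
) -> T:
    """Call fn up to max_attempts times, but stop at the first non-retryable error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if is_non_retryable(exc):
                # A deleted table will not reappear on the next attempt, so give up
                # now instead of hitting the API (and error tracking) again.
                raise NonRetryableError(str(exc)) from exc
            last_exc = exc  # transient: try again
    assert last_exc is not None
    raise last_exc


# Usage: a call that always fails with a table-level 404 (placeholder ids).
calls = []


def get_missing_table():
    calls.append(1)
    raise RuntimeError("Not found: Table my-project:my_dataset.orders")


try:
    run_with_retries(get_missing_table, lambda e: "Not found: Table" in str(e))
except NonRetryableError:
    pass
print(len(calls))  # 1
```

Without the classifier, the same call would run all `max_attempts` times and raise the identical error each time.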

The source already treats the dataset-region 404 ("... was not found in location US") as non-retryable, but that key doesn't match the table-level wording ("Not found: Table <id>"), so this case fell through.
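The gap comes down to plain substring matching. A minimal sketch with placeholder ids, assuming the dataset-region error reads like "Not found: Dataset <project>:<dataset> was not found in location US" (only its tail is quoted above):

```python
# Placeholder ids; the wording follows the two 404 shapes described above.
dataset_region_404 = (
    "Not found: Dataset my-project:my_dataset was not found in location US"
)
table_404 = "Not found: Table my-project:my_dataset.orders"

EXISTING_KEY = "was not found in location"
NEW_KEY = "Not found: Table"

# The existing key catches the dataset-region 404 but not the table-level one.
print(EXISTING_KEY in dataset_region_404)  # True
print(EXISTING_KEY in table_404)           # False

# A key on the stable "Not found: Table" prefix catches the table-level 404
# for any project/dataset/table id, without also matching the dataset 404.
print(NEW_KEY in table_404)                # True
print(NEW_KEY in dataset_region_404)       # False
```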

Changes

  • Add a "Not found: Table" entry to BigQuerySource.get_non_retryable_errors() with an actionable message telling the user the synced table was deleted/renamed and how to recover. Matched on the stable substring, not the volatile project/dataset/table id.

This is scoped strictly to this error — no change to global retry policy or any other code path. It mirrors the existing "was not found in location" entry's reasoning for the table-level 404.
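A minimal sketch of how a substring-keyed mapping like this can be consulted. Here `get_non_retryable_errors` is a standalone stand-in for the `BigQuerySource` method, and `non_retryable_reason` and both user-facing messages are hypothetical; the PR's actual message text and matching code may differ.

```python
from typing import Dict, Optional


# Hypothetical stand-in for BigQuerySource.get_non_retryable_errors(): keys are
# stable substrings of upstream error messages, values are user-facing hints.
# The message text here is illustrative, not the PR's wording.
def get_non_retryable_errors() -> Dict[str, str]:
    return {
        "was not found in location": (
            "The BigQuery dataset was not found in the configured location."
        ),
        "Not found: Table": (
            "A table being synced no longer exists in BigQuery (it may have been "
            "deleted or renamed). Restore it or update the schemas selected for sync."
        ),
    }


def non_retryable_reason(error_message: str) -> Optional[str]:
    """Return the hint for the first key found in error_message, or None if retryable."""
    for key, hint in get_non_retryable_errors().items():
        if key in error_message:
            return hint
    return None


# A table-level 404 is matched no matter which project/dataset/table it names.
print(non_retryable_reason("Not found: Table my-project:my_dataset.orders") is not None)  # True
# A transient server error is not matched, so it stays retryable.
print(non_retryable_reason("503 Service Unavailable"))  # None
```

Keying on a substring rather than the full message is what keeps the entry independent of the volatile project, dataset, and table ids.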

How did you test this code?

I'm an agent. Added two regression tests at the same layer as the fix:

  • test_bigquery_table_not_found_during_sync_is_non_retryable — asserts a representative get_table() 404 message is recognised as non-retryable (and confirms it doesn't contain "was not found in location", i.e. the existing key wouldn't have caught it).
  • test_bigquery_table_not_found_key_does_not_match_unrelated_errors — guards against the new key swallowing transient 5xx errors that must stay retryable.
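Roughly, the two tests could take the shape below. The test names come from this PR; the mapping, the `is_non_retryable` helper, and the sample messages are hypothetical stand-ins so the sketch runs on its own, not the contents of `test_source.py`.

```python
# Hypothetical stand-in for the source's non-retryable mapping so the tests
# below run on their own; the real keys/messages live in BigQuerySource.
NON_RETRYABLE_ERRORS = {
    "was not found in location": "Dataset not found in the configured location.",
    "Not found: Table": "The synced table no longer exists.",
}


def is_non_retryable(error_message: str) -> bool:
    return any(key in error_message for key in NON_RETRYABLE_ERRORS)


def test_bigquery_table_not_found_during_sync_is_non_retryable():
    # Representative get_table() 404; the ids are placeholders.
    message = "404 Not found: Table my-project:my_dataset.orders"
    assert is_non_retryable(message)
    # The pre-existing dataset-region key would not have caught it.
    assert "was not found in location" not in message


def test_bigquery_table_not_found_key_does_not_match_unrelated_errors():
    # Transient server-side failures must stay retryable.
    for message in (
        "500 Internal Server Error",
        "503 Service Unavailable: Backend error",
    ):
        assert not is_non_retryable(message)
```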

Ran:

uv run python -m pytest posthog/temporal/data_imports/sources/bigquery/tests/test_source.py
# 78 passed

ruff check + ruff format --check pass on the touched files.

🤖 Agent context

Autonomy: Fully autonomous

Part of a triage pass over data-warehouse import errors. Classified as an upstream/data condition (a synced table no longer exists), not a fixable bug in our code; there's no sensible graceful skip at this layer since the activity exists to sync that one table. Verified against open PRs: #64996 handles the schema-discovery path (get_columns → INFORMATION_SCHEMA, "was not found in location" wording) and explicitly leaves the sync-path 404 untouched; #64841 is about token_uri. So this error path was unhandled and no open PR addresses it.

A table deleted or renamed in BigQuery after schema discovery surfaces from
get_table() in _build_source_response as a google NotFound whose message is
"Not found: Table <project>:<dataset>.<table>". The existing "was not found in
location" key only covers dataset-region 404s, so this slips through and retries
forever. Match the stable "Not found: Table" wording and stop retrying.

Generated-By: PostHog Code
Task-Id: 8f4f3b01-0ee7-4a00-a421-8a70bac394a4
@Gilbert09 added the stamphog label (Request AI review from stamphog) Jun 20, 2026
@github-actions

Contributor

Hey @Gilbert09! 👋

It looks like your git author email on this PR isn't your @posthog.com address (owerstom@gmail.com). Since you're on the PostHog team, it's worth pointing your local git author email at your @posthog.com address. Why it matters:

  • Consistent work identity in git history — internal tooling that attributes commits to team members keys off your @posthog.com address.
  • Keeps team contributions easy to tell apart from external community ones when scanning history.

You can fix it for this repo with:

git config user.email "you@posthog.com"

Or set it globally with git config --global user.email "you@posthog.com". No need to redo this PR — just a nudge for next time. 🙂

@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team June 20, 2026 20:05

@github-actions github-actions Bot left a comment

Contributor


Additive, well-tested fix — adds a new non-retryable error pattern to prevent infinite retries when a BigQuery table is deleted after schema discovery. No data model, API, or dependency changes; author is on the owning team.

@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Contributor

Reviews (1): Last reviewed commit: "fix(bigquery): treat table-not-found at ..."

