fix(bigquery): treat table-not-found at sync time as non-retryable #65039
Open
Gilbert09 wants to merge 1 commit into
When a table is deleted or renamed in BigQuery after schema discovery, get_table() in _build_source_response raises a Google NotFound whose message is "Not found: Table <project>:<dataset>.<table>". The existing "was not found in location" key only covers dataset-region 404s, so this error slips through and is retried forever. Match the stable "Not found: Table" wording and stop retrying.
Generated-By: PostHog Code
Task-Id: 8f4f3b01-0ee7-4a00-a421-8a70bac394a4
Problem
Error tracking surfaced a BigQuery NotFound that is retried forever, spamming error tracking with the same exception:
Error tracking issue: https://us.posthog.com/project/2/error_tracking/019ee69e-92e4-7653-a0a1-4b2540e3ff53
The stack trace originates in this source's code:
A table that schema discovery selected was deleted or renamed in BigQuery before the sync ran (common with dbt-managed datasets, which drop and recreate tables), so get_table() then 404s. This is an upstream data condition: retrying within the sync's window cannot make a missing table reappear, so the run just hammers BigQuery and refires the exception on every attempt.
The source already treats the dataset-region 404 ("... was not found in location US") as non-retryable, but that key doesn't match the table-level wording ("Not found: Table <id>"), so this case fell through.
Changes
Add a "Not found: Table" entry to BigQuerySource.get_non_retryable_errors() with an actionable message telling the user that the synced table was deleted or renamed and how to recover. The match is on the stable substring, not the volatile project/dataset/table ID.
This is scoped strictly to this error: there is no change to the global retry policy or to any other code path. It applies the same reasoning as the existing "was not found in location" entry to the table-level 404.
How did you test this code?
I'm an agent. Added two regression tests at the same layer as the fix:
test_bigquery_table_not_found_during_sync_is_non_retryable: asserts that a representative get_table() 404 message is recognised as non-retryable, and confirms it doesn't contain "was not found in location" (i.e. the existing key wouldn't have caught it).
test_bigquery_table_not_found_key_does_not_match_unrelated_errors: guards against the new key swallowing transient 5xx errors, which must stay retryable.
Ran: ruff check and ruff format --check pass on the touched files.
🤖 Agent context
Autonomy: Fully autonomous
Part of a triage pass over data-warehouse import errors. Classified as an upstream data condition (a synced table no longer exists), not a fixable bug in our code; there's no sensible graceful skip at this layer, since the activity exists to sync that one table. Verified against open PRs: #64996 handles the schema-discovery path (get_columns → INFORMATION_SCHEMA, "was not found in location" wording) and explicitly leaves the sync-path 404 untouched; #64841 is about token_uri. So this error path was unhandled, and no open PR addresses it.