Skip to content

Do not duplicate BEGIN statement #944

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

rjspotter
Copy link

Because:

This commit:

  • Bypasses the dbt generated BEGIN statement in dbt-postgres

resolves #
dbt-labs/dbt-core#9434

Problem

Two begin statements are issued one by the default behaviour of psycopg2, the other by dbt via begin() calling add_begin_query()

[BEFORE] by modifying sql/connections.py you can see the BEGIN from psycopg2 and the modified BEGIN from dbt

DEBU[21092] Received message: {"Type":"Query","String":"BEGIN"}
TRAC[21092] Handling message: *pgproto3.Query
TRAC[21092] running statement {BEGIN BEGIN TRANSACTION BEGIN true false false <nil> <nil> <nil>}  connectionID=43 protocol=pg
TRAC[21092] doQuery                                       connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 protocol=postgres query=BEGIN
DEBU[21092] Starting query                                connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 query=BEGIN
TRAC[21092] beginning execution                           connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 query=BEGIN
DEBU[21092] Query finished in 1 ms                        connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 query=BEGIN
TRAC[21092] AtLeastOneBatch=false RowsInLastBatch=0       connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 query=BEGIN
TRAC[21092] spooling 0 rows for tag BEGIN (execute = false)
DEBU[21092] Received message: {"Type":"Query","String":"BEGIN -- from dbt"}
TRAC[21092] Handling message: *pgproto3.Query
TRAC[21092] running statement {BEGIN -- from dbt BEGIN TRANSACTION BEGIN true false false <nil> <nil> <nil>}  connectionID=43 protocol=pg
TRAC[21092] doQuery                                       connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 protocol=postgres query="BEGIN -- from dbt"
DEBU[21092] Starting query                                connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 query="BEGIN -- from dbt"
TRAC[21092] beginning execution                           connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 query="BEGIN -- from dbt"
WARN[21092] error running query                           connectTime="2025-03-25 21:29:14.695163651 +0000 UTC m=+21075.616878838" connectionDb=public connectionID=43 error="TransactionContext Error: cannot start a transaction within a transaction" query="BEGIN -- from dbt"
fallback statement execution failed: TransactionContext Error: cannot start a transaction within a transaction (errno 1105) (sqlstate HY000)

Solution

Make add_begin_query a passthrough for dbt-postgres

[AFTER] you get exactly one BEGIN and one COMMIT per query/session

INFO[29821] NewConnection                                 DisableClientMultiStatements=true connectionID=158
TRAC[29821] doQuery                                       connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionID=158 protocol=postgres query="USE pipeline.public;"
DEBU[29821] Starting query                                connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionID=158 query="USE pipeline.public;"
TRAC[29821] beginning execution                           connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionID=158 query="USE pipeline.public;"
DEBU[29821] Query finished in 1 ms                        connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query="USE pipeline.public;"
TRAC[29821] AtLeastOneBatch=false RowsInLastBatch=0       connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query="USE pipeline.public;"
DEBU[29821] Received message: {"Type":"Query","String":"BEGIN"}
TRAC[29821] Handling message: *pgproto3.Query
TRAC[29821] running statement {BEGIN BEGIN TRANSACTION BEGIN true false false <nil> <nil> <nil>}  connectionID=158 protocol=pg
TRAC[29821] doQuery                                       connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 protocol=postgres query=BEGIN
DEBU[29821] Starting query                                connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=BEGIN
TRAC[29821] beginning execution                           connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=BEGIN
DEBU[29821] Query finished in 2 ms                        connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=BEGIN
TRAC[29821] AtLeastOneBatch=false RowsInLastBatch=0       connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=BEGIN
TRAC[29821] spooling 0 rows for tag BEGIN (execute = false)
DEBU[29821] Received message: {"Type":"Query","String":"COMMIT"}
TRAC[29821] Handling message: *pgproto3.Query
TRAC[29821] running statement {COMMIT COMMIT TRANSACTION COMMIT true false false <nil> <nil> <nil>}  connectionID=158 protocol=pg
TRAC[29821] doQuery                                       connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 protocol=postgres query=COMMIT
DEBU[29821] Starting query                                connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=COMMIT
TRAC[29821] beginning execution                           connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=COMMIT
DEBU[29821] Query finished in 2 ms                        connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=COMMIT
TRAC[29821] AtLeastOneBatch=false RowsInLastBatch=0       connectTime="2025-03-25 23:55:00.361635596 +0000 UTC m=+29821.283350774" connectionDb=public connectionID=158 query=COMMIT
TRAC[29821] spooling 0 rows for tag COMMIT (execute = false)
DEBU[29821] Received message: {"Type":"Terminate"}
TRAC[29821] Handling message: *pgproto3.Terminate

Notice that add_commit_query retains the existing behaviour.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@rjspotter rjspotter requested a review from a team as a code owner March 26, 2025 00:38
@cla-bot cla-bot bot added the cla:yes The PR author has signed the CLA label Mar 26, 2025
@cla-bot cla-bot bot temporarily deployed to dbt-postgres March 26, 2025 00:38 Inactive
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change in the dbt-postgres package. For details on how to document a change, see the Contributing Guide.

Because:
- dbt-labs/dbt-core#9434
- Psychopg2 starts a transaction automatically but does not
  necessarily end it without intervention
  "By default, Psycopg opens a transaction before executing the first command:"
  https://www.psycopg.org/docs/connection.html
- Postgres allows the behaviour of two begin statements with only a
  warning. Other systems that use the Postgres protocols and syntax
  are not necessarily so forgiving.

This commit:
- Bypasses the dbt generated BEGIN statement in dbt-postgres
@rjspotter
Copy link
Author

How long does it normally take for a reviewer to be assigned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes The PR author has signed the CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant