fix: allow EXPLAIN on multi-statement SQL beginning with SET #20106

jasonmp85 · 2025-04-18T16:03:50Z

What does this PR do?

Problem

Some users prepend one or more SET statements when issuing SQL to PostgreSQL, usually for auditing, tracking, or tracing of some kind. The client sends these as a single string, which triggers PostgreSQL's multiple-statements-in-a-single-query behavior.

However, this entire string will appear in the pg_stat_activity view, which is where the agent grabs queries for periodic examination with EXPLAIN.

EXPLAIN's syntax does not support multiple statements, and as such users of the above pattern miss out on EXPLAIN plan capture.

Solution

Because SET syntax is fairly limited, we can detect leading SET statements with a regular expression. Certain other processes track queries with a leading "C-style" comment, so we trim the first of those as well.

For performance, we check the first word of the obfuscated SQL. If it is not SET, the regex is skipped entirely. If it is, we use the regex to trim a leading comment and one or more SET statements from the SQL.

Whatever remains is passed to EXPLAIN.
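The approach described above can be sketched roughly as follows. This is a minimal illustration, not the regex or code from the patch: the pattern, the helper `statement_to_explain`, and its parameters are assumptions made for the example. Only the name `trim_leading_set_stmts` and the three-character prefix check appear in the review discussion.

```python
import re

# Hypothetical pattern (NOT the one in the patch). A value is either a
# single-quoted string ('' is an escaped quote) or a bare word/number.
# Dollar-quoted literals get no special handling.
_VALUE = r"(?:'(?:[^']|'')*'|[^\s;',]+)"

_SET_STMT = (
    r"SET\s+(?:(?:SESSION|LOCAL)\s+)?"  # SET plus optional scope keyword
    + r"[A-Za-z_][\w.]*"                # parameter name
    + r"(?:\s*=\s*|\s+TO\s+)"           # '=' or 'TO'
    + _VALUE
    + r"(?:\s*,\s*" + _VALUE + r")*"    # optional comma-separated values
    + r"\s*;\s*"                        # terminating semicolon
)

# An optional leading C-style comment, then one or more SET statements.
_LEADING = re.compile(
    r"\s*(?:/\*.*?\*/\s*)?(?:" + _SET_STMT + r")+",
    re.IGNORECASE | re.DOTALL,
)


def trim_leading_set_stmts(sql: str) -> str:
    """Return sql without a leading comment and leading SET statements."""
    match = _LEADING.match(sql)
    return sql[match.end():] if match else sql


def statement_to_explain(statement: str, obfuscated_statement: str) -> str:
    """Hypothetical wrapper: skip the regex unless the first word is SET."""
    if obfuscated_statement[:3].lower() != "set":
        return statement
    return trim_leading_set_stmts(statement)


sql = "SET LOCAL blah = 'x';\nSELECT * from sometable;"
print(statement_to_explain(sql, sql))
```

If the leading text does not match the pattern (for example, an unsupported form of SET), the statement is returned unchanged, so the behavior falls back to what it was before trimming.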

Out of Scope

The remaining SQL may still have more than one statement. This is not supported now, and the patch does not change that.
Some SET statements may modify the behavior of PostgreSQL in a way that would be interesting to EXPLAIN (scan cost, for instance). Running them before the EXPLAIN is left as a later enhancement.
Though the regex supports much of SET, dollar-delimited string literals are not supported. Pushing the logic to the lexer might address these cases more robustly.

Motivation

DBMON-2626

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov · 2025-04-18T16:55:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.41%. Comparing base (85fca9e) to head (b7985e3).
Report is 7 commits behind head on master.

Additional details and impacted files

Flag	Coverage Δ
activemq	`?`
cassandra	`?`
hive	`?`
hivemq	`?`
hudi	`?`
ignite	`?`
jboss_wildfly	`?`
kafka	`?`
postgres	`93.29% <100.00%> (+18.26%)`	⬆️
presto	`?`
solr	`?`

Flags with carried forward coverage won't be shown.

postgres/datadog_checks/postgres/statement_samples.py

postgres/tests/test_statements.py

lu-zhengda

Trimming the leading SET statement would make it pass _can_explain_statement. Can you add an integration test to make sure EXPLAIN works with multiple statements?
If it's not feasible to do this with an integration test and you tested it manually, can you add a screenshot instead?

postgres/datadog_checks/postgres/statement_samples.py

jasonmp85 · 2025-04-23T14:42:25Z

> Trim leading SET statement would make it pass _can_explain_statement. Can you add an integration test to make sure explain will work with multi statements?

This doesn't modify the behavior of multi-statements that don't have SET prefixes, for instance, an INSERT followed by a SELECT. This behavior should be unmodified by my patch, no?

lu-zhengda · 2025-04-23T15:02:23Z

> > Trim leading SET statement would make it pass _can_explain_statement. Can you add an integration test to make sure explain will work with multi statements?
>
> This doesn't modify the behavior of multi-statements that don't have SET prefixes, for instance, an INSERT followed by a SELECT. This behavior should be unmodified by my patch, no?

What I mean is that I don't know whether the integration is able to explain a multi-statement query. Before this PR, it failed fast, before the integration tried to explain it. With the change, a statement like the one below will go through to EXPLAIN:

SET LOCAL blah = 'x';
SELECT * from sometable;

Do we know whether the EXPLAIN will succeed?

jasonmp85 · 2025-04-23T17:39:35Z

> Trim leading SET statement would make it pass _can_explain_statement. Can you add an integration test to make sure explain will work with multi statements?

Adding the integration test.

SQL strings containing multiple statements cause problems with the existing explain logic. This change adds a regex to strip an arbitrary number of `SET` commands from the head of all incoming SQL, such that the eventual `EXPLAIN` will be performed on just the trailing SQL.

This doesn't fully address all multi-statement SQL strings: in particular, clients might send e.g. both a SELECT and an UPDATE in the same string. As before, this would be passed as-is to the EXPLAIN machinery.

Refs: DBMON-2626

jasonmp85 · 2025-04-24T16:02:16Z

Ok, @lu-zhengda, added the failfast statement and an integration test to show the EXPLAIN plan working.

I realized my assumption that the obfuscated_statement was used for execution was incorrect (it was used in a test and passed around to various functions and I missed that it wasn't what was being run).

Because of the whitespace/comment issue you mentioned, I can't get around trimming the obfuscated SQL (for the "is this supported?" check), but then also need to trim the original statement.

Because parsing comments is going to be a lot more tedious than SET statements, this means this change won't support SET-prefixed SQL which contains comments. I think that still makes this a strict upgrade over the existing behavior, though.

lu-zhengda · 2025-04-24T16:13:42Z

@jasonmp85 regarding

> Because of the whitespace/comment issue you mentioned, I can't get around trimming the obfuscated SQL (for the "is this supported?" check), but then also need to trim the original statement.

Does this mean the integration is not able to explain `SET LOCAL datestyle TO postgres; SELECT * FROM pg_class`? It has to trim `SET LOCAL datestyle TO postgres;` from the original statement before running explain.

jasonmp85 · 2025-04-24T16:19:41Z

> Does this mean the integration is not able to explain SET LOCAL datestyle TO postgres; SELECT * FROM pg_class?

No, it can do that (that's exactly the test I added).

I just mean that if the original statement were something like `/* comment */ SET LOCAL datestyle TO postgres; SELECT * FROM pg_class`, it would not work. Comments won't be supported in this case.

jasonmp85 · 2025-04-24T19:59:54Z

I've updated the description of this PR to capture discussions and scope creep/scope creep pushback. Read it to see what this PR covers and what it won't.

lucaloncar-dd · 2025-04-24T20:56:26Z

postgres/datadog_checks/postgres/statement_samples.py

+        orig_statement = statement
+
+        # remove leading SET statements from our SQL
+        if obfuscated_statement[:3].lower() == "set":


nit-pick because I'm waiting for this rapid deploy:

feels like having trim_leading_set_stmts be a no-op if it doesn't start with set and then removing the conditional could be nice

also python strings have a startswith() method

(feel free to ignore this, this is just the most context switching I can do between builds)

This conditional was at the behest of @lu-zhengda to short-circuit for performance reasons (apparently agent environments vary a lot and performance is a concern). I suppose the cost of a function invocation isn't too bad but given the purpose of the conditional, ehhhhh idk.

Thanks for the startswith pointer, my scripting languages of choice have always been Ruby and Perl :-P. Won't help with the case issue here, unless I incur the cost of lower on the entire string…

> Won't help with the case issue here, unless I incur the cost of lower on the entire string…

💯
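To make the trade-off in this thread concrete, here is a small sketch comparing the slice-based check from the diff with a `startswith()` variant. The function names are hypothetical and exist only for this illustration. Both give the same answer, but the second lowercases the whole statement first, which is the cost mentioned above.

```python
def looks_like_set(obfuscated_statement: str) -> bool:
    """Lowercase only the first three characters (as in the diff)."""
    return obfuscated_statement[:3].lower() == "set"


def looks_like_set_startswith(obfuscated_statement: str) -> bool:
    """Same result via startswith(), but lowercases the entire string."""
    return obfuscated_statement.lower().startswith("set")


for sql in ("SET x = 1; SELECT 1", "set x = 1", "SELECT 1", ""):
    # Slicing never raises on short or empty strings, so both agree.
    assert looks_like_set(sql) == looks_like_set_startswith(sql)
```

Either check is only a cheap prefilter: it would also accept any statement whose first three characters happen to be "set", so the regex still makes the final decision.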

lu-zhengda · 2025-04-24T21:46:09Z

LGTM

* Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)
  This reverts commit 6486c7f.
* Address catastrophic backtracking in EXPLAIN regex
  Took a bit to pin down why this was happening, but with a fuzzer and some articles about exponential regex performance, I could address the underlying issue without changing the code much.
* Fix changelog
* Formatting fixes
* Pass query_signature down into explain_statement
  No need to recalculate this, and passing it down will allow trimmed SQL to reuse the original query's signature.
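The conversation does not show the regex before or after the backtracking fix, so the following is only a generic illustration of the failure mode, using textbook patterns rather than anything from the patch. `RISKY`, `SAFE`, and `matches_word_list` are names invented for this sketch.

```python
import re

# Ambiguous: because \s* may match nothing, a run of word characters can be
# split across iterations of the group in exponentially many ways. On input
# with no ';' the engine tries all of them before giving up.
RISKY = re.compile(r"(?:\w+\s*)+;")

# Unambiguous rewrite: words must be separated by at least one whitespace
# character, so there is only one way to divide the input into words.
SAFE = re.compile(r"\w+(?:\s+\w+)*\s*;")


def matches_word_list(pattern: re.Pattern, text: str) -> bool:
    """True if text starts with a ';'-terminated list of words."""
    return pattern.match(text) is not None


# Both patterns accept the same well-formed input.
assert matches_word_list(RISKY, "a b c;")
assert matches_word_list(SAFE, "a b c;")

# On a long input with no ';', SAFE fails after a linear amount of work.
# RISKY is intentionally not run here: its running time roughly doubles
# with each additional character.
assert not matches_word_list(SAFE, "a" * 5000)
```

The general remedy is the one shown: restructure nested quantifiers so that each piece of input can be matched in only one way.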

jasonmp85 requested review from a team as code owners April 18, 2025 16:03

temporal-github-worker-1 bot added agent/review-requested ecosystems/review-requested product/review-requested labels Apr 18, 2025

datadog-agent-integrations-bot bot added integration/postgres team/agent-integrations team/database-monitoring-agent labels Apr 18, 2025

jasonmp85 force-pushed the fix-explain-with-set branch 2 times, most recently from 431c4d2 to 4dabdf0 Compare April 21, 2025 15:30

eric-weaver reviewed Apr 21, 2025

postgres/datadog_checks/postgres/statement_samples.py

dujuku requested changes Apr 21, 2025

jasonmp85 force-pushed the fix-explain-with-set branch from 4dabdf0 to 77d7577 Compare April 22, 2025 19:33

jasonmp85 requested a review from dujuku April 22, 2025 20:08

jasonmp85 force-pushed the fix-explain-with-set branch from 77d7577 to f37b43b Compare April 22, 2025 20:27

lu-zhengda reviewed Apr 22, 2025

postgres/datadog_checks/postgres/statement_samples.py

jasonmp85 force-pushed the fix-explain-with-set branch from 0e89836 to 269f207 Compare April 23, 2025 20:36

jasonmp85 added 7 commits April 24, 2025 09:58

Add changelog entry for new EXPLAIN behavior

d6e7177

Simplified regex

d5aa241

Address code review feedback

d60c333

This is more of a fix than feature

c591b3e

Add integration tests, fix logic

02d7993

Refactor

a14adb8

jasonmp85 force-pushed the fix-explain-with-set branch from 022ce6a to a14adb8 Compare April 24, 2025 15:58

Formatting fix

0a2dccd

Handle leading comment

b7985e3

jasonmp85 changed the title ~~feat: allow EXPLAIN on multi-statement SQL beginning with SET~~ fix: allow EXPLAIN on multi-statement SQL beginning with SET Apr 24, 2025

lucaloncar-dd reviewed Apr 24, 2025

lu-zhengda approved these changes Apr 24, 2025

dujuku approved these changes Apr 25, 2025

jasonmp85 added this pull request to the merge queue Apr 25, 2025

Merged via the queue into master with commit 9b43811 Apr 25, 2025
35 checks passed

jasonmp85 deleted the fix-explain-with-set branch April 25, 2025 17:20

jasonmp85 added a commit that referenced this pull request May 13, 2025

Revert "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)"

5cbda3e

This reverts commit 9b43811.

jasonmp85 mentioned this pull request May 13, 2025

Revert "fix: allow EXPLAIN on multi-statement SQL beginning with SET … #20282

Merged

3 tasks

github-merge-queue bot pushed a commit that referenced this pull request May 13, 2025

Revert "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)

6486c7f

This reverts commit 9b43811.

jasonmp85 added a commit that referenced this pull request May 16, 2025

Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)

7bdbf1b

This reverts commit 6486c7f.

jasonmp85 mentioned this pull request May 16, 2025

Fix set explain regex #20319

Merged

3 tasks

jasonmp85 added a commit that referenced this pull request May 19, 2025

Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)

075e1c8

This reverts commit 6486c7f.

jasonmp85 added a commit that referenced this pull request May 21, 2025

Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)

9c7d8a0

This reverts commit 6486c7f.

jasonmp85 added a commit that referenced this pull request May 27, 2025

Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)

093266f

This reverts commit 6486c7f.
