fix: allow EXPLAIN on multi-statement SQL beginning with SET #20106

Merged
merged 9 commits into from
Apr 25, 2025

Conversation

jasonmp85
Contributor

@jasonmp85 jasonmp85 commented Apr 18, 2025

What does this PR do?

Problem

Some users prepend one or more SET statements when issuing SQL to PostgreSQL, usually for auditing, tracking, or tracing of some kind. The client sends these as a single string, which triggers PostgreSQL's "multiple statements in a single query" behavior.

However, this entire string will appear in the pg_stat_activity view, which is where the agent grabs queries for periodic examination with EXPLAIN.

EXPLAIN's syntax does not support multiple statements, and as such users of the above pattern miss out on EXPLAIN plan capture.

Solution

Because SET syntax is fairly limited, we can detect leading SET statements with a regular expression. Certain other processes track queries with a leading "C-style" comment, so we trim the first of those as well.

For performance, we check the first word of the obfuscated SQL. If it is not SET, the regex is skipped entirely. If it is, we use the regex to trim a leading comment and one or more SET statements from the SQL.

Whatever remains is passed to EXPLAIN.
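
As a rough illustration of the approach described above, here is a minimal Python sketch. The pattern is an assumption written for this example, not the regex merged in this PR; the function name `trim_leading_set_stmts` is borrowed from the review thread, but the body shown here is hypothetical. It deliberately ignores the leading-comment case, dollar-quoted literals, and backslash escapes.

```python
import re

# Hypothetical pattern (not the PR's regex). One SET statement is the keyword,
# then any run of plain characters or single-quoted literals, then a semicolon.
# A doubled quote ('') simply reads as two adjacent literals, which keeps the
# two alternatives mutually exclusive and avoids ambiguous backtracking.
_LEADING_SETS = re.compile(
    r"(?:SET\b(?:[^';]|'[^']*')*;\s*)+",
    re.IGNORECASE,
)


def trim_leading_set_stmts(sql: str) -> str:
    """Return sql with any leading run of SET statements removed."""
    # Cheap guard first: inspect only the first three characters.
    if sql[:3].lower() != "set":
        return sql
    match = _LEADING_SETS.match(sql)
    # If the prefix is not a complete SET statement, leave the SQL untouched.
    return sql[match.end():] if match else sql
```

For the example from the discussion, `trim_leading_set_stmts("SET LOCAL datestyle TO postgres; SELECT * FROM pg_class")` leaves `SELECT * FROM pg_class`, which is the part that would be handed to EXPLAIN.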

Out of Scope

  • The remaining SQL may still have more than one statement. This is not supported now, and the patch does not change that.
  • Some SET statements may modify the behavior of PostgreSQL in a way that would be interesting to EXPLAIN (scan cost, for instance). Running them before the EXPLAIN is left as a later enhancement.
  • Though the regex supports much of SET, dollar-delimited string literals are not supported. Pushing the logic into the lexer might address these problems more robustly.

Motivation

DBMON-2626

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov bot commented Apr 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.41%. Comparing base (85fca9e) to head (b7985e3).
Report is 7 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
postgres 93.29% <100.00%> (+18.26%) ⬆️
presto ?
solr ?

Flags with carried forward coverage won't be shown. Click here to find out more.

@jasonmp85 jasonmp85 force-pushed the fix-explain-with-set branch 2 times, most recently from 431c4d2 to 4dabdf0 Compare April 21, 2025 15:30
@jasonmp85 jasonmp85 force-pushed the fix-explain-with-set branch from 4dabdf0 to 77d7577 Compare April 22, 2025 19:33
@jasonmp85 jasonmp85 requested a review from dujuku April 22, 2025 20:08
@jasonmp85 jasonmp85 force-pushed the fix-explain-with-set branch from 77d7577 to f37b43b Compare April 22, 2025 20:27
Contributor

@lu-zhengda lu-zhengda left a comment

Trimming the leading SET statements would make the query pass _can_explain_statement. Can you add an integration test to make sure EXPLAIN works with multi-statement SQL?
If an integration test isn't feasible and you tested it manually, can you add a screenshot instead?

@jasonmp85
Contributor Author

Trimming the leading SET statements would make the query pass _can_explain_statement. Can you add an integration test to make sure EXPLAIN works with multi-statement SQL?

This doesn't modify the behavior of multi-statements that don't have SET prefixes, for instance, an INSERT followed by a SELECT. This behavior should be unmodified by my patch, no?

@lu-zhengda
Contributor

Trimming the leading SET statements would make the query pass _can_explain_statement. Can you add an integration test to make sure EXPLAIN works with multi-statement SQL?

This doesn't modify the behavior of multi-statements that don't have SET prefixes, for instance, an INSERT followed by a SELECT. This behavior should be unmodified by my patch, no?

What I mean is that I don't know whether the integration can explain a multi-statement query. Before this PR, such a query failed fast, before the integration tried to explain it. With this change, a statement like the one below will go on to be explained.

SET LOCAL blah = 'x';
SELECT * from sometable;

Do we know whether the EXPLAIN will succeed?

@jasonmp85
Contributor Author

Trimming the leading SET statements would make the query pass _can_explain_statement. Can you add an integration test to make sure EXPLAIN works with multi-statement SQL?

Adding the integration test.

@jasonmp85 jasonmp85 force-pushed the fix-explain-with-set branch from 0e89836 to 269f207 Compare April 23, 2025 20:36
SQL strings containing multiple statements cause problems with the
existing explain logic. This change adds a regex to strip an arbitrary
number of `SET` commands from the head of all incoming SQL, such that
the eventual `EXPLAIN` will be performed on just the trailing SQL.

This doesn't fully address all multi-statement SQL strings: in
particular, clients might send e.g. both a SELECT and an UPDATE in the
same string. As before, this would be passed as-is to the EXPLAIN
machinery.

Refs: DBMON-2626
@jasonmp85 jasonmp85 force-pushed the fix-explain-with-set branch from 022ce6a to a14adb8 Compare April 24, 2025 15:58
@jasonmp85
Contributor Author

Ok, @lu-zhengda, added the failfast statement and an integration test to show the EXPLAIN plan working.

I realized my assumption that the obfuscated_statement was used for execution was incorrect (it was used in a test and passed around to various functions and I missed that it wasn't what was being run).

Because of the whitespace/comment issue you mentioned, I can't get around trimming the obfuscated SQL (for the "is this supported?" check), but then also need to trim the original statement.

Because parsing comments is going to be a lot more tedious than SET statements, this means this change won't support SET-prefixed SQL which contains comments. I think that still makes this a strict upgrade over the existing behavior, though.
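
The two-step trimming described in this comment might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `trim_both` and `naive_trim` are hypothetical names, the trimming function is passed in as a parameter, and `naive_trim` is a deliberately crude stand-in used only to make the example runnable.

```python
from typing import Callable, Tuple


def trim_both(
    statement: str,
    obfuscated_statement: str,
    trim: Callable[[str], str],
) -> Tuple[str, str]:
    """Trim the leading SET prefix from both forms of the same query.

    The obfuscated form is what the "is this supported?" check looks at,
    while the original statement is what is actually run under EXPLAIN,
    so both need the same prefix removed.
    """
    # Cheap guard: only the first three characters are inspected.
    if obfuscated_statement[:3].lower() != "set":
        return statement, obfuscated_statement
    return trim(statement), trim(obfuscated_statement)


def naive_trim(sql: str) -> str:
    # Stand-in for demonstration only: keeps whatever follows the last "; ".
    # A real trimmer would need to parse SET statements properly.
    return sql.rpartition("; ")[2]
```

Passing the trimmer in as an argument keeps the guard and the dual trimming visible without tying the sketch to any particular regex.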

@lu-zhengda
Contributor

@jasonmp85 regarding

Because of the whitespace/comment issue you mentioned, I can't get around trimming the obfuscated SQL (for the "is this supported?" check), but then also need to trim the original statement.

Does this mean the integration is not able to explain SET LOCAL datestyle TO postgres; SELECT * FROM pg_class? It has to trim `SET LOCAL datestyle TO postgres;` from the original statement before running explain.

@jasonmp85
Contributor Author

Does this mean the integration is not able to explain SET LOCAL datestyle TO postgres; SELECT * FROM pg_class?

No, it can do that (see that's exactly the test I added).

I just mean that if the original statement were something like /* comment */ SET LOCAL datestyle TO postgres; SELECT * FROM pg_class, it would not work. Comments won't be supported in this case.

@jasonmp85
Contributor Author

I've updated the description of this PR to capture discussions and scope creep/scope creep pushback. Read it to see what this PR covers and what it won't.

@jasonmp85 jasonmp85 changed the title from "feat: allow EXPLAIN on multi-statement SQL beginning with SET" to "fix: allow EXPLAIN on multi-statement SQL beginning with SET" Apr 24, 2025
orig_statement = statement

# remove leading SET statements from our SQL
if obfuscated_statement[:3].lower() == "set":
Contributor

nit-pick because I'm waiting for this rapid deploy:

feels like having trim_leading_set_stmts be a no-op if it doesn't start with set and then removing the conditional could be nice

also python strings have a startswith() method

Contributor

(feel free to ignore this, this is just the most context switching I can do between builds)

Contributor Author

This conditional was at the behest of @lu-zhengda to short-circuit for performance reasons (apparently agent environments vary a lot and performance is a concern). I suppose the cost of a function invocation isn't too bad but given the purpose of the conditional, ehhhhh idk.

Thanks for the startswith pointer, my scripting languages of choice have always been Ruby and Perl :-P. Won't help with the case issue here, unless I incur the cost of lower on the entire string…

Contributor

@lu-zhengda lu-zhengda Apr 24, 2025

Won't help with the case issue here, unless I incur the cost of lower on the entire string…

💯
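
The trade-off discussed in this thread can be sketched as follows. Both helper names are hypothetical and exist only for this illustration; the first mirrors the slice-then-lower check in the diff, the second is the `startswith` variant that lower-cases the whole string first.

```python
def starts_with_set_sliced(sql: str) -> bool:
    """Case-insensitive prefix check that touches only three characters."""
    # The slice copies at most three characters, so lower() does a constant
    # amount of work no matter how long the query is.
    return sql[:3].lower() == "set"


def starts_with_set_full(sql: str) -> bool:
    """Same answer, but lower() copies the entire query string first."""
    return sql.lower().startswith("set")
```

Both functions return the same result for any input; they differ only in how much of the string is copied. Note that either one is a prefix test rather than a word test, so a query beginning with an identifier such as `settings` would also pass the guard and fall through to the regex.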

@lu-zhengda
Contributor

LGTM

@jasonmp85 jasonmp85 added this pull request to the merge queue Apr 25, 2025
Merged via the queue into master with commit 9b43811 Apr 25, 2025
35 checks passed
@jasonmp85 jasonmp85 deleted the fix-explain-with-set branch April 25, 2025 17:20
jasonmp85 added a commit that referenced this pull request May 13, 2025
@jasonmp85 jasonmp85 mentioned this pull request May 16, 2025
3 tasks
github-merge-queue bot pushed a commit that referenced this pull request May 28, 2025
* Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282)

This reverts commit 6486c7f.

* Address catastrophic backtracking in EXPLAIN regex

Took a bit to pin down why this was happening, but with a fuzzer and
some articles about exponential regex performance, I could address the
underlying issue without changing the code much.

* Fix changelog

* Formatting fixes

* Pass query_signature down into explain_statement

No need to recalculate this, and passing it down will allow trimmed SQL
to reuse the original query's signature.
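
For readers unfamiliar with the term, catastrophic backtracking can be illustrated with a small Python sketch. The patterns below are assumptions chosen for illustration; the regex actually changed in the commit above is not shown in this thread. Both patterns accept a single-quoted literal with no embedded quotes, but the first nests one quantifier inside another.

```python
import re

# Illustrative patterns only (not the regex from the commit).
# NESTED: a run of n characters can be divided between the inner "+" and the
# outer "*" in exponentially many ways. On an input that cannot match, such as
# an unterminated literal, a backtracking engine may try every division.
NESTED = re.compile(r"'(?:[^']+)*'")

# FLAT: accepts the same strings with a single quantifier, so there is only
# one way to consume the run and a failing match is rejected quickly.
FLAT = re.compile(r"'[^']*'")


def same_result(text: str) -> bool:
    """Check that both patterns agree on whether text is a full match."""
    return bool(NESTED.fullmatch(text)) == bool(FLAT.fullmatch(text))
```

When experimenting with `NESTED`, keep failing inputs short: each extra character in an unterminated literal can roughly double the work, whereas `FLAT` stays fast on long inputs.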
domalessi pushed a commit that referenced this pull request May 29, 2025
5 participants