-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: allow EXPLAIN on multi-statement SQL beginning with SET #20106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
431c4d2
to
4dabdf0
Compare
4dabdf0
to
77d7577
Compare
77d7577
to
f37b43b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trim leading SET
statement would make it pass _can_explain_statement
. Can you add an integration test to make sure explain will work with multi statements?
If it's not feasible to do it with integration test and you tested it manually, can you add a screenshot instead?
This doesn't modify the behavior of multi-statements that don't have |
what i mean is it is unknown to me if the integration is able to explain a multi-statement or not. Before the PR it fails quickly before the integration tries to explain it. With the change, statement like below will go through explaining. SET LOCAL blah = 'x';
SELECT * from sometable; Do we know if the explain will success or not? |
Adding the integration test. |
0e89836
to
269f207
Compare
SQL strings containing multiple statements cause problems with the ex- isting explain logic. This change adds a regex to strip an arbitrary number of `SET` commands from the head of all incoming SQL, such that the eventual `EXPLAIN` will be performed on just the trailing SQL. This doesn't fully address all multi-statement SQL strings: in part- icular clients might send e.g. both a SELECT and an UPDATE in the same string. As before, this would be passed as-is to the EXPLAIN machinery. Refs: DBMON-2626
022ce6a
to
a14adb8
Compare
Ok, @lu-zhengda, added the failfast statement and an integration test to show the I realized my assumption that the Because of the whitespace/comment issue you mentioned, I can't get around trimming the obfuscated SQL (for the "is this supported?" check), but then also need to trim the original statement. Because parsing comments is going to be a lot more tedious than |
@jasonmp85 regarding
Does this mean the integration is not able to explain |
No, it can do that (see that's exactly the test I added). I just mean if the original statement were something like |
I've updated the description of this PR to capture discussions and scope creep/scope creep pushback. Read it to see what this PR covers and what it won't. |
orig_statement = statement | ||
|
||
# remove leading SET statements from our SQL | ||
if obfuscated_statement[:3].lower() == "set": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-pick because I'm waiting for this rapid deploy:
feels like having trim_leading_set_stmts
be a no-op if it doesn't start with set
and then removing the conditional could be nice
also python strings have a startswith()
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(feel free to ignore this, this is just the most context switching I can do between builds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conditional was at the behest of @lu-zhengda to short-circuit for performance reasons (apparently agent environments vary a lot and performance is a concern). I suppose the cost of a function invocation isn't too bad but given the purpose of the conditional, ehhhhh idk.
Thanks for the startswith
pointer, my scripting languages of choice have always been Ruby and Perl :-P. Won't help with the case issue here, unless I incur the cost of lower
on the entire string…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't help with the case issue here, unless I incur the cost of lower on the entire string…
💯
LGTM |
* Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282) This reverts commit 6486c7f. * Address catastrophic backtracking in EXPLAIN regex Took a bit to pin down why this was happening, but with a fuzzer and some articles about exponential regex performance, I could address the underlying issue without changing the code much. * Fix changelog * Formatting fixes * Pass query_signature down into explain_statement No need to recalculate this, and passing it down will allow trimmed SQL to reuse the original query's signature.
* Reapply "fix: allow EXPLAIN on multi-statement SQL beginning with SET (#20106)" (#20282) This reverts commit 6486c7f. * Address catastrophic backtracking in EXPLAIN regex Took a bit to pin down why this was happening, but with a fuzzer and some articles about exponential regex performance, I could address the underlying issue without changing the code much. * Fix changelog * Formatting fixes * Pass query_signature down into explain_statement No need to recalculate this, and passing it down will allow trimmed SQL to reuse the original query's signature.
What does this PR do?
Problem
Some users prepend one or more
SET
statements when issuing SQL to PostgreSQL, usually for auditing, tracking, or tracing of some kind. These are issued by the client as a single string, triggering the multiple statements in a single query behavior.However, this entire string will appear in the
pg_stat_activity
view, which is where the agent grabs queries for periodic examination withEXPLAIN
.EXPLAIN
's syntax does not support multiple statements, and as such users of the above pattern miss out onEXPLAIN
plan capture.Solution
Because
SET
syntax is fairly limited, we can detect leadingSET
statements with a regular expression. Certain other processes track queries with a leading "C-style" comment, so we trim the first of those as well.For performance, we check the first word of the obfuscated SQL. If it is not
SET
, the regex is skipped entirely. If it is, we use the regex to trim a leading comment and one or moreSET
statements from the SQL.Whatever remains is passed to
EXPLAIN
.Out of Scope
SET
statements may modify the behavior of PostgreSQL in a way that would be interesting toEXPLAIN
(scan cost, for instance). Running them before theEXPLAIN
is left as a later enhancementSET
, dollar-delimited string literals are not supported. Pushing the logic to the lexer might help these problems in a more robust wayMotivation
DBMON-2626
Review checklist (to be filled by reviewers)
qa/skip-qa
label if the PR doesn't need to be tested during QA.backport/<branch-name>
label to the PR and it will automatically open a backport PR once this one is merged