Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: Fix bad join order in comparesFirstCharacter #18826

Merged
merged 2 commits into from
Feb 21, 2025

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Feb 20, 2025

Old join order (on uber/aresdb):

[2025-02-20 12:19:36] Evaluated non-recursive predicate StringOps::StringOps::HasPrefix::comparesFirstCharacter/3#24aa5857@5bcd38hd in 3ms (size: 1).
Evaluated relational algebra for predicate StringOps::StringOps::HasPrefix::comparesFirstCharacter/3#24aa5857@5bcd38hd with tuple counts:
           6528  ~0%    {3} r1 = JOIN project#DataFlowUtil::EqualityTestNode#9fb3a74a#2 WITH `DataFlowUtil::BinaryOperationNode.hasOperands/2#dispred#c32311b8` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Rhs.2
           6562  ~4%    {3}    | JOIN WITH `GlobalValueNumbering::globalValueNumber/1#39395ec1` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
        1766593  ~1%    {3}    | JOIN WITH `GlobalValueNumbering::globalValueNumber/1#39395ec1_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
             89  ~0%    {5}    | JOIN WITH `DataFlowUtil::ElementReadNode.getIndex/0#dispred#9e685477` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1, Lhs.2, Lhs.0
             89  ~0%    {5}    | REWRITE WITH Out.1 := 0
             17  ~0%    {3}    | JOIN WITH `DataFlowUtil::Node.getIntValue/0#dispred#95c22b0f#fb` ON FIRST 2 OUTPUT Lhs.4, Lhs.2, Lhs.3
             17  ~0%    {3}    | JOIN WITH `DataFlowUtil::ComponentReadNode.getBase/0#dispred#0757f222` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
             17  ~0%    {4}    | JOIN WITH `DataFlowUtil::InstructionNode.getType/0#dispred#98c46cd8` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.0
             17  ~0%    {4}    | JOIN WITH `Types::Type.getUnderlyingType/0#dispred#e61d6b4a` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3
              1  ~0%    {3}    | JOIN WITH @stringtype ON FIRST 1 OUTPUT Lhs.1, Lhs.3, Lhs.2
                        return r1

After the change:

[2025-02-20 14:46:40] Evaluated non-recursive predicate StringOps::StringOps::HasPrefix::comparesFirstCharacter/3#24aa5857@6a98c74d in 0ms (size: 1).
Evaluated relational algebra for predicate StringOps::StringOps::HasPrefix::comparesFirstCharacter/3#24aa5857@6a98c74d with tuple counts:
        1383  ~9%    {3} r1 = SCAN `DataFlowUtil::ElementReadNode.getIndex/0#dispred#9e685477` OUTPUT In.1, _, In.0
        1383  ~0%    {3}    | REWRITE WITH Out.1 := 0
         318  ~0%    {1}    | JOIN WITH `DataFlowUtil::Node.getIntValue/0#dispred#95c22b0f#fb` ON FIRST 2 OUTPUT Lhs.2
         318  ~0%    {2}    | JOIN WITH `DataFlowUtil::ComponentReadNode.getBase/0#dispred#0757f222` ON FIRST 1 OUTPUT Lhs.0, Rhs.1
         318  ~1%    {2}    | JOIN WITH `GlobalValueNumbering::globalValueNumber/1#39395ec1` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
         669  ~0%    {2}    | JOIN WITH `GlobalValueNumbering::globalValueNumber/1#39395ec1_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1
         669  ~0%    {3}    | JOIN WITH `DataFlowUtil::InstructionNode.getType/0#dispred#98c46cd8` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
         669  ~0%    {3}    | JOIN WITH `Types::Type.getUnderlyingType/0#dispred#e61d6b4a` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
           1  ~0%    {2}    | JOIN WITH @stringtype ON FIRST 1 OUTPUT Lhs.2, Lhs.1
           1  ~0%    {3}    | JOIN WITH `DataFlowUtil::BinaryOperationNode.hasOperands/2#dispred#c32311b8_102#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Rhs.2
           1  ~0%    {3}    | JOIN WITH project#DataFlowUtil::EqualityTestNode#9fb3a74a#2 ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2
                     return r1

@owen-mc owen-mc added acknowledged GitHub staff acknowledges this issue no-change-note-required This PR does not need a change note labels Feb 20, 2025
@owen-mc owen-mc requested review from smowton and Copilot February 20, 2025 14:52
@owen-mc owen-mc requested a review from a team as a code owner February 20, 2025 14:52

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Tip: If you use Visual Studio Code, you can request a review from Copilot before you push from the "Source Control" tab. Learn more

@github-actions github-actions bot added the Go label Feb 20, 2025
@owen-mc owen-mc removed the acknowledged GitHub staff acknowledges this issue label Feb 20, 2025
@owen-mc
Copy link
Contributor Author

owen-mc commented Feb 20, 2025

DCA shows it fixing the bad join order on all the repos 🎉 .

@smowton
Copy link
Contributor

smowton commented Feb 20, 2025

You may as well go one better: right now it's saying, "take all the e[0] statements, find all their gvn-aliases, then check if e is a string, and finally check this is an equality-test LHS". You might as well move the "check e is a string" bit before the inherently-quadratic map through GVN, which I think would be achieved by only tagging the read in the context of the GVN lookup, or perhaps it would be better to kick the "find e[0] such that e is a string" out into an inline-late predicate.

@owen-mc
Copy link
Contributor Author

owen-mc commented Feb 20, 2025

perhaps it would be better to kick the "find e[0] such that e is a string" out into an inline-late predicate.

Surely that's the part you want to do early.

I did try only putting the binding pragma in the GVN part. It had a slightly worse join order because it choose to do "reads of strings" first. But you may be right that it protects us better from the worst case.

@smowton
Copy link
Contributor

smowton commented Feb 20, 2025

Yeah I'd say that's better, in that it's linear in the size of the database -- ideally we should restrict the pool as small as possible before going through GVN, because that's where the quadratic term goes in.

@owen-mc
Copy link
Contributor Author

owen-mc commented Feb 21, 2025

The docs on join orders say that there isn't any way to use pragmas to make a predicate go last, so I pulled what we want out as a pragma[noinline] predicate and made that go first. Within that I made it start with all reads of an index rather than all string-typed nodes as that produced fewer tuples on the db I'm testing on, and I intuitively it feels better in general. DCA looks good.

@aschackmull
Copy link
Contributor

The docs on join orders say that there isn't any way to use pragmas to make a predicate go last

Amended: https://github.com/github/codeql-team/pull/3785

@owen-mc
Copy link
Contributor Author

owen-mc commented Feb 21, 2025

@aschackmull Would inline_late help here? There are three predicates that reference read and I thought that if I put the one I want to come last in an inline_late predicate then it could still be be ordered in the middle, because read has been bound.

@aschackmull
Copy link
Contributor

@aschackmull Would inline_late help here?

I haven't looked into this particular case - I was just triggered to amend the doc based on your statement.

There are three predicates that reference read and I thought that if I put the one I want to come last in an inline_late predicate then it could still be be ordered in the middle, because read has been bound.

True. What you did in this particular case looks reasonable to me. (Although the RA you posted as the "after" version above doesn't look right - you have two predicates instead of one now, so I'd expect two pipelines in the "after" version).

@owen-mc
Copy link
Contributor Author

owen-mc commented Feb 21, 2025

Yes, that RA is out of date. With the latest version of the code that uses a separate noinline predicate it is:

[2025-02-21 00:58:27] Evaluated non-recursive predicate StringOps::StringOps::HasPrefix::getReadOfFirstChar/1#c1b85cb9@66e653ef in 0ms (size: 1).
Evaluated relational algebra for predicate StringOps::StringOps::HasPrefix::getReadOfFirstChar/1#c1b85cb9@66e653ef with tuple counts:
        1383  ~0%    {3} r1 = SCAN `DataFlowUtil::ElementReadNode.getIndex/0#dispred#9e685477` OUTPUT In.1, _, In.0
        1383  ~0%    {3}    | REWRITE WITH Out.1 := 0
         318  ~0%    {1}    | JOIN WITH `DataFlowUtil::Node.getIntValue/0#dispred#95c22b0f#fb` ON FIRST 2 OUTPUT Lhs.2
         318  ~0%    {2}    | JOIN WITH `DataFlowUtil::ComponentReadNode.getBase/0#dispred#0757f222` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
         318  ~0%    {3}    | JOIN WITH `DataFlowUtil::InstructionNode.getType/0#dispred#98c46cd8` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0
         318  ~0%    {3}    | JOIN WITH `Types::Type.getUnderlyingType/0#dispred#e61d6b4a` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
           1  ~0%    {2}    | JOIN WITH @stringtype ON FIRST 1 OUTPUT Lhs.2, Lhs.1
                     return r1

[2025-02-21 00:58:28] Evaluated non-recursive predicate StringOps::StringOps::HasPrefix::comparesFirstCharacter/3#24aa5857@7a72d6t5 in 0ms (size: 1).
Evaluated relational algebra for predicate StringOps::StringOps::HasPrefix::comparesFirstCharacter/3#24aa5857@7a72d6t5 with tuple counts:
        1  ~0%    {2} r1 = SCAN `StringOps::StringOps::HasPrefix::getReadOfFirstChar/1#c1b85cb9` OUTPUT In.1, In.0
        1  ~0%    {2}    | JOIN WITH `GlobalValueNumbering::globalValueNumber/1#39395ec1` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        1  ~0%    {2}    | JOIN WITH `GlobalValueNumbering::globalValueNumber/1#39395ec1_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        1  ~0%    {3}    | JOIN WITH `DataFlowUtil::BinaryOperationNode.hasOperands/2#dispred#c32311b8_102#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Rhs.2
        1  ~0%    {3}    | JOIN WITH project#DataFlowUtil::EqualityTestNode#9fb3a74a#2 ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2
                  return r1

@owen-mc owen-mc merged commit 721b8c4 into github:main Feb 21, 2025
13 checks passed
@owen-mc owen-mc deleted the go/join-order-fixes branch February 21, 2025 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Go no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants