Skip to content

Fix OuterJoinRemover so it does not optimize too aggressively#27

Open
dallmair wants to merge 1 commit intoterrajobst:mainfrom
dallmair:fix-outerjoinremover
Open

Fix OuterJoinRemover so it does not optimize too aggressively#27
dallmair wants to merge 1 commit intoterrajobst:mainfrom
dallmair:fix-outerjoinremover

Conversation

@dallmair
Copy link
Copy Markdown
Contributor

@dallmair dallmair commented Feb 15, 2019

Here's a query that exhibits different behavior:

SELECT  o.OrderID
FROM    Orders o
WHERE   (
            SELECT  TOP 1 od2.ShipVia
            FROM    Orders od2
            WHERE   od2.OrderID > o.OrderID
        ) IS NOT NULL

Previously, this failed with a KeyNotFoundException when evaluated,
because the OuterJoinRemover erroneously changed the left semi join to
an inner join.

Reasoning: The OuterJoinRemover checked for any null-rejecting column
anywhere in the query, but it needs to check for only those columns
that are null-rejecting in the respective subtree, i.e. children of the
join node.

Here's a query that exhibits different behavior:

SELECT  o.OrderID
FROM    Orders o
WHERE   (
            SELECT  TOP 1 od2.ShipVia
            FROM    Orders od2
            WHERE   od2.OrderID > o.OrderID
        ) IS NOT NULL

Previously, this failed with a KeyNotFoundException when evaluated,
because the OuterJoinRemover erroneously changed the left semi join to
an inner join.

Reasoning: The OuterJoinRemover checked for any null-rejecting column
anywhere in the query, but it needs to check for only those columns
that are null-rejecting in the respective subtree, i.e. children of the
join node.
@terrajobst
Copy link
Copy Markdown
Owner

terrajobst commented Mar 17, 2019

@dallmair

Thanks! The outer join definitely shouldn't be replaced by an inner join but there is a larger issue here: the code shouldn't crash with a key not found exception.

I think the problem is the join orderer which seems to mess up cases where outer references are included.

Outer references are only allowed to appear on the right hand side of a join (i.e. right can reference left, but not the other way around). However, the orderer swaps left and right in an illegal fashion, causing the outer reference to become unresolvable later.

I'm not entirely sure what the fix for this should be yet. I could either disallow join ordering entirely for joins with outer references or try to handle them when reassembling the join graph. Skipping joins with outer references might be the easiest fix but my concern is that it might stop optimizing joins in certain queries entirely. Alternatively I might ignore the only-left-can-reference-right constraint during optimization entirely and introduce a late phase that enforces it by swapping left and right when necessary.

@dallmair dallmair mentioned this pull request Mar 18, 2019
@terrajobst terrajobst force-pushed the master branch 5 times, most recently from 0bcf6eb to 12ca856 Compare November 18, 2019 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants