Fix: Unexpected row deduplication using eliminate_full_outer_join #4178
… with union all. This should be more in line with the behaviour of FULL OUTER JOIN since it keeps duplicate rows that would be generated when the joined table has many entries for the same key(s). https://en.wikipedia.org/wiki/Join_%28SQL%29#Full_outer_join
Left a couple of comments. I'd prefer a simpler way to implement this, if possible.
I agree. However, I'm not sure the code would be any simpler: it would require something like this to identify the columns from the left-side table
and then
Let me know what you think.
@liaco I see, your solution is fine, we'll keep iterating on that.
@georgesittas I implemented your feedback. I don't think it's possible to simplify it further, due to the nature of full joins: to emulate them we need to use unions or intersections of the available joins (inner, left) to get a partial result and then use anti joins to fill in the gaps, and anti joins necessarily require identifying the join conditions first. Another option is to build a CTE that holds the union of the common join identifiers and simply left join the two tables to it (something like this), but that involves more complexity. Let me know what you think.
Thanks a lot for the PR @liaco, great work!
One final (minor) comment, thank you for the PR & for the quick iterations.
Thank you for your thoughtful reviews, happy to make my small contribution.
Check out 7ecd519
Appreciate the contribution and the rapid iterations! :-)
The UNION syntax causes deduplication of rows from the source tables. To achieve the intended result without that unintended deduplication, I changed the logic to a left join plus an anti join (right join + NOT EXISTS), combined with UNION ALL.
This should be more in line with the behaviour of FULL OUTER JOIN, since it keeps the duplicate rows that would be present if there are duplicate rows in the two joined tables.
https://en.wikipedia.org/wiki/Join_%28SQL%29#Full_outer_join
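To make the difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the code from this PR: the tables `a` and `b`, their columns, and the sample rows are made up for illustration, and the SQL is written by hand rather than generated by the transform. Because older SQLite builds lack RIGHT JOIN, the "old" right-hand side is written as `b LEFT JOIN a` with the columns reordered, and the anti join is written as a plain `NOT EXISTS` filter on `b`.

```python
import sqlite3

# Hypothetical sample data: both tables contain duplicate rows.
SETUP = """
CREATE TABLE a (id INTEGER, val TEXT);
CREATE TABLE b (id INTEGER, val TEXT);
INSERT INTO a VALUES (1, 'x'), (1, 'x'), (2, 'y');
INSERT INTO b VALUES (1, 'p'), (3, 'q'), (3, 'q');
"""

# Shared left-hand side: every row of a, with its matches from b (if any).
LEFT_PART = """
SELECT a.id AS a_id, a.val AS a_val, b.id AS b_id, b.val AS b_val
FROM a LEFT JOIN b ON a.id = b.id
"""

# Old style: left join UNION right join. UNION removes duplicate rows.
OLD_SQL = LEFT_PART + """
UNION
SELECT a.id, a.val, b.id, b.val
FROM b LEFT JOIN a ON a.id = b.id
"""

# New style: left join UNION ALL the rows of b that have no match in a.
NEW_SQL = LEFT_PART + """
UNION ALL
SELECT NULL, NULL, b.id, b.val
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id)
"""


def run(sql):
    """Run a query against a fresh in-memory database with the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


old_rows = run(OLD_SQL)
new_rows = run(NEW_SQL)

print(len(old_rows))  # 3: duplicate rows were collapsed by UNION
print(len(new_rows))  # 5: same row count a FULL OUTER JOIN would produce
```

With this data a FULL OUTER JOIN on `a.id = b.id` yields five rows: two matched rows for `id = 1`, one unmatched row from `a`, and two unmatched rows from `b`. The UNION version collapses these to three, while the UNION ALL plus anti join version keeps all five.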
Other possible ways to achieve this are: