partially implement FULL OUTER + LEFT and RIGHT hash joins #5096
Open
PThorpe92 wants to merge 33 commits into tursodatabase:main
Merging this PR will not alter performance
Description
Implements LEFT OUTER, RIGHT, and FULL OUTER hash joins. Previously, hash joins only supported inner join semantics; any outer join fell back to nested-loop. This enables hash join acceleration for the most common outer join patterns.
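To make the outer-join semantics concrete, here is a minimal, hypothetical sketch of a FULL OUTER hash join in plain Rust. It is not the code in this PR: the function name, the row types, and the use of `HashMap` are assumptions made purely for illustration. It keeps one matched flag per build row during probing, then scans those flags to emit the unmatched build rows padded with NULL (`None`).

```rust
use std::collections::HashMap;

/// An input row: (join key, payload). A `None` key stands in for SQL NULL.
type Input<'a> = (Option<i64>, &'a str);
/// An output row: (build payload, probe payload). `None` is NULL padding.
type Output<'a> = (Option<&'a str>, Option<&'a str>);

/// Hypothetical illustration of FULL OUTER hash join semantics.
fn full_outer_hash_join<'a>(build: &[Input<'a>], probe: &[Input<'a>]) -> Vec<Output<'a>> {
    // Build phase: key -> indices of build rows with that key.
    // NULL keys never compare equal, so they are left out of the map,
    // but they still get a matched flag and surface in the unmatched scan.
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, &(key, _)) in build.iter().enumerate() {
        if let Some(k) = key {
            table.entry(k).or_default().push(i);
        }
    }
    let mut matched = vec![false; build.len()];
    let mut out = Vec::new();

    // Probe phase: emit matches and mark the build rows that were hit.
    for &(key, payload) in probe {
        match key.and_then(|k| table.get(&k)) {
            Some(indices) => {
                for &i in indices {
                    matched[i] = true;
                    out.push((Some(build[i].1), Some(payload)));
                }
            }
            // Unmatched probe row: NULL-pad the build side.
            None => out.push((None, Some(payload))),
        }
    }

    // Unmatched scan: emit build rows that no probe row matched.
    for (i, &was_matched) in matched.iter().enumerate() {
        if !was_matched {
            out.push((Some(build[i].1), None));
        }
    }
    out
}
```

Dropping the final unmatched scan from this sketch yields the rows of an outer join that preserves only the probe side, which is why a single set of matched flags is enough to cover both the one-sided and the FULL OUTER cases.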
RIGHT JOIN is implemented as a table swap/rewrite followed by LEFT JOIN semantics. FULL OUTER sets both the `outer` and `full_outer` flags on `JoinInfo`.

New opcodes:
- `HashMarkMatched`, `HashScanUnmatched`, and `HashNextUnmatched` track which build-side entries matched during probing, then iterate over the unmatched entries for outer join NULL emission.
- The `track_matched` flag on `HashTableConfig` enables `matched_bits` vectors (one bool per entry per bucket). Entries with NULL keys are now kept when `track_matched` is set, so they appear in the unmatched scan. Matched bits survive partition eviction/reload for spilled tables.

Optimizer:
- The `HashJoinType` enum (`Inner`, `LeftOuter`, `FullOuter`) is propagated through access method selection.
- Build tables with `join_info.outer` set are rejected to prevent incorrect matches when cursors are in NullRow mode.

Why we need subroutines for chained outer hash joins:
When a FULL OUTER (or LEFT OUTER) hash join is followed by additional inner table loops in a multi-way join chain, both unmatched emission paths (the unmatched build scan and the unmatched probe path) need to re-enter those subsequent inner loops to produce correct results. For example, in `t1 FULL OUTER JOIN t2 ON ... LEFT JOIN t3 ON ...`, when we emit an unmatched t2 row (one with no match in t1), we still need to scan t3 to find any rows that join with t2. Without the GoSub wrapper, the inner loop cursors (e.g. t3's cursor) may not be open or rewound at the point where unmatched rows are emitted, because the unmatched scan happens after the main probe loop has finished.

The GoSub/Return pattern solves this by wrapping the inner table loops in a subroutine. During normal execution the main loop calls GoSub to enter the subroutine. When the unmatched scan later needs to emit a row, it calls the same GoSub to re-enter the inner loops from scratch: rewinding cursors, evaluating join conditions, and dispatching through the correct emit path (ORDER BY sorter, GROUP BY, aggregates, etc.) without duplicating any of that codegen. The Return instruction at the end of the subroutine jumps back to whichever call site invoked it, so the same subroutine body serves both the matched and unmatched paths.

Limitations
- Chained FULL OUTER joins (`A FULL OUTER B FULL OUTER C`): rejected because the second join's build table has `join_info.outer` from the first.

Perf?
TPC-H Query 13:
- Before: 12.58s
- After: 3.56s
- SQLite: 8.65s