Commit 84c2ad0
Add merge join support
Summary:
This diff implements merge join as a new physical join algorithm in the Axiom optimizer. Merge join is a highly efficient join method for pre-sorted inputs because it avoids the hash table construction overhead of hash joins. When both join inputs are sorted on the join keys (or can be cheaply sorted), merge join streams through both inputs in lockstep, matching rows with equal keys. It applies only to tables that are already copartitioned and whose join keys contain all the bucketing keys. For Hive-partitioned tables, each side must either consist of a single Hive partition, or all Hive partitioning columns of both sides must appear in the join keys.
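To make the lockstep streaming concrete, here is a minimal standalone sketch of an inner merge join over two inputs sorted ascending on a single key. It is not the Velox operator or any code from this diff; the `Row` type and the `mergeJoinSorted` function are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row type for illustration: one join key plus a payload.
struct Row {
  int64_t key;
  int payload;
};

// Inner merge join over two inputs that are both sorted ascending on key.
// Returns (left payload, right payload) for every pair of rows with equal keys.
std::vector<std::pair<int, int>> mergeJoinSorted(
    const std::vector<Row>& left,
    const std::vector<Row>& right) {
  std::vector<std::pair<int, int>> out;
  size_t l = 0;
  size_t r = 0;
  while (l < left.size() && r < right.size()) {
    if (left[l].key < right[r].key) {
      ++l; // Left key is behind: advance left.
    } else if (right[r].key < left[l].key) {
      ++r; // Right key is behind: advance right.
    } else {
      const int64_t key = left[l].key;
      // Find the end of the run of equal keys on the right.
      size_t rEnd = r;
      while (rEnd < right.size() && right[rEnd].key == key) {
        ++rEnd;
      }
      // Pair every left row in the run with every right row in the run.
      for (; l < left.size() && left[l].key == key; ++l) {
        for (size_t i = r; i < rEnd; ++i) {
          out.emplace_back(left[l].payload, right[i].payload);
        }
      }
      r = rEnd;
    }
  }
  return out;
}
```

Neither input is hashed or buffered in full; only the current run of equal keys on the right is revisited, which is why sorted, copartitioned inputs are a precondition.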
Main changes:
1. **Added joinByMerge() method** (Optimization.cpp, ~300 lines):
- Core merge join candidate generation logic
- Validates merge join preconditions:
  - Checks all join keys are columns (not expressions), because merge join requires direct column comparisons
  - Verifies left input has both partitioning and ordering (from distribution)
  - Confirms ordering is ascending (merge join requires monotonic order)
  - Validates all partition columns are in join keys (ensures copartitioning)
- Identifies merge columns from left input's orderKeys that match join keys
- Plans right side with matching partition and ordering distributions
- Adds shuffle and sort operators to right side if needed
- Computes join using merge method
- Returns NextJoin candidate for cost comparison
2. **Merge join precondition checking**:
- **Column-only keys**: Rejects if any join key is an expression (e.g., `CAST(orderkey AS BIGINT)`)
- **Partitioning requirement**: Left input must be partitioned (not gathered)
- **Ordering requirement**: Left input must have orderKeys specified
- **Ascending order**: Only kAscNullsFirst and kAscNullsLast supported
- **Partition subset**: All partition columns must appear in join keys
- **Matching merge columns**: At least one orderKey must match a join key
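The checks above can be sketched as a single eligibility function. This is a simplified illustration, not the code in `joinByMerge()`: `KeyExpr`, `LeftDistribution`, `containsName`, and `mergeColumnsIfEligible` are hypothetical, columns are represented by name only, and the sketch assumes merge columns are the leading run of `orderKeys` that match join keys, which is slightly stricter than the wording above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class OrderType {
  kAscNullsFirst,
  kAscNullsLast,
  kDescNullsFirst,
  kDescNullsLast
};

// Hypothetical join key: a name plus a flag telling columns from expressions.
struct KeyExpr {
  std::string name;
  bool isColumn;
};

// Hypothetical, simplified view of the left input's distribution.
struct LeftDistribution {
  std::vector<std::string> partition; // Empty means gathered.
  std::vector<std::string> orderKeys;
  std::vector<OrderType> orderTypes;
};

bool containsName(const std::vector<std::string>& names, const std::string& n) {
  return std::find(names.begin(), names.end(), n) != names.end();
}

// Returns the left merge columns, or an empty vector if merge join is rejected.
std::vector<std::string> mergeColumnsIfEligible(
    const LeftDistribution& left,
    const std::vector<KeyExpr>& joinKeys) {
  std::vector<std::string> keyNames;
  for (const auto& key : joinKeys) {
    if (!key.isColumn) {
      return {}; // Column-only keys.
    }
    keyNames.push_back(key.name);
  }
  if (left.partition.empty() || left.orderKeys.empty()) {
    return {}; // Partitioning and ordering requirements.
  }
  for (OrderType type : left.orderTypes) {
    if (type != OrderType::kAscNullsFirst && type != OrderType::kAscNullsLast) {
      return {}; // Ascending order only.
    }
  }
  for (const auto& column : left.partition) {
    if (!containsName(keyNames, column)) {
      return {}; // Partition columns must be a subset of the join keys.
    }
  }
  // Assumption: take the leading orderKeys that are also join keys.
  std::vector<std::string> mergeColumns;
  for (const auto& orderKey : left.orderKeys) {
    if (!containsName(keyNames, orderKey)) {
      break;
    }
    mergeColumns.push_back(orderKey);
  }
  return mergeColumns; // Empty means no matching merge column.
}
```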
3. **Right side preparation algorithm**:
- Constructs Distribution for right side matching left's partition and ordering
- For each left partition column, finds corresponding right join key
- For each left merge column (from orderKeys), finds corresponding right join key
- Creates `forRight` distribution with:
  - Same DistributionType as left (enables copartitioning)
  - rightPartition: right keys corresponding to left partition columns
  - rightOrderKeys: right keys corresponding to left merge columns
  - rightOrderTypes: ascending order (kAscNullsLast) to match left
- Calls `makePlan()` with forRight distribution
- If `needsShuffle` is true, adds Repartition operator
- Checks if right input needs sorting (orderKeys don't match expected)
- Adds OrderBy operator if needed, after shuffle or directly
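The key-mapping part of this step can be illustrated with a short sketch. It assumes join keys are given as two parallel lists in which `leftKeys[i]` is equated with `rightKeys[i]`; `RightRequirement`, `rightKeyFor`, and `makeRightRequirement` are hypothetical names, and the real code builds a `Distribution` rather than two string vectors.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the partition and order parts of `forRight`.
struct RightRequirement {
  std::vector<std::string> partition;
  std::vector<std::string> orderKeys;
};

// Finds the right key joined to a left column; leftKeys[i] pairs with
// rightKeys[i].
std::optional<std::string> rightKeyFor(
    const std::string& leftColumn,
    const std::vector<std::string>& leftKeys,
    const std::vector<std::string>& rightKeys) {
  for (size_t i = 0; i < leftKeys.size() && i < rightKeys.size(); ++i) {
    if (leftKeys[i] == leftColumn) {
      return rightKeys[i];
    }
  }
  return std::nullopt;
}

// Translates the left partition and merge columns into right-side columns.
// Returns nullopt if some left column has no join partner.
std::optional<RightRequirement> makeRightRequirement(
    const std::vector<std::string>& leftKeys,
    const std::vector<std::string>& rightKeys,
    const std::vector<std::string>& leftPartition,
    const std::vector<std::string>& leftMergeColumns) {
  RightRequirement result;
  for (const auto& column : leftPartition) {
    auto key = rightKeyFor(column, leftKeys, rightKeys);
    if (!key) {
      return std::nullopt;
    }
    result.partition.push_back(*key);
  }
  for (const auto& column : leftMergeColumns) {
    auto key = rightKeyFor(column, leftKeys, rightKeys);
    if (!key) {
      return std::nullopt;
    }
    result.orderKeys.push_back(*key);
  }
  return result;
}
```

Because the precondition checks already guarantee that partition and merge columns are join keys, the `nullopt` branch is only a defensive guard in this sketch.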
4. **Merge join cost model** (RelationOp.cpp):
- Added `setMergeJoinCost()` method to Join class
- Cost formula: `3 * kKeyCompareCost * numKeys * min(1, fanout) + rightSideBytes + kHashExtractColumnCost * numRightSideColumns`
- Rationale:
  - Key comparisons: Merge join compares keys 3 times on average per match (binary search in merge)
  - Scales with number of keys and fanout (more comparisons for multiple matches)
  - Data copying: Transfers right side bytes to output
  - Column extraction: Extracts columns from right side vectors
- Significantly cheaper than hash join for large inputs (no hash table construction)
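- Cost difference grows with build side size: hash join must build a hash table over the entire build side (expected O(n) inserts, but with random memory access and extra memory), while merge join makes a single sequential O(n) pass over each sorted input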
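As a worked instance of the formula as stated above, the following sketch evaluates it with caller-supplied constants. The actual values of `kKeyCompareCost` and `kHashExtractColumnCost` are not given here, so `CostConstants` and `mergeJoinUnitCost` are hypothetical and the numbers in the usage note are placeholders.

```cpp
#include <algorithm>

// Hypothetical holder for the two cost constants; real values are not shown
// in this commit message.
struct CostConstants {
  float keyCompareCost;
  float hashExtractColumnCost;
};

// Evaluates the cost formula as written above:
// 3 * kKeyCompareCost * numKeys * min(1, fanout)
//   + rightSideBytes + kHashExtractColumnCost * numRightSideColumns
float mergeJoinUnitCost(
    const CostConstants& constants,
    int numKeys,
    float fanout,
    float rightSideBytes,
    int numRightSideColumns) {
  const float compare =
      3 * constants.keyCompareCost * numKeys * std::min(1.0f, fanout);
  const float extract = constants.hashExtractColumnCost * numRightSideColumns;
  return compare + rightSideBytes + extract;
}
```

With placeholder constants `{2, 5}`, two keys, 16 right side bytes, and three right side columns, a fanout of 0.5 gives 6 + 16 + 15 = 37 and a fanout of 4 gives 12 + 16 + 15 = 43. Note that with `min(1, fanout)` as written, the comparison term stops growing once fanout exceeds 1.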
5. **Integration with join planning** (Optimization.cpp):
- Modified `makeJoins()` to call `joinByMerge()` after `joinByIndex()`
- Added `testingUseMergeJoin` option for testing:
  - `std::nullopt` (default): Normal cost-based selection among all join types
  - `true`: Prefer merge join - return immediately if joinByMerge produces a candidate
  - `false`: Disable merge join - skip calling joinByMerge entirely
- If `testingUseMergeJoin` is unset (`std::nullopt`), merge join competes with hash join based on cost
- If `testingUseMergeJoin` is `true` and merge join produced a candidate, hash join is not considered
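The three-way behavior of the option can be sketched with `std::optional<bool>`. This is an illustration of the dispatch only: `candidateJoinMethods` is a hypothetical function that returns method names as strings, and it leaves out index joins and the actual cost comparison.

```cpp
#include <optional>
#include <string>
#include <vector>

// Returns the join methods that would be handed to cost comparison.
// mergeJoinApplicable stands in for "joinByMerge() produced a candidate".
std::vector<std::string> candidateJoinMethods(
    std::optional<bool> testingUseMergeJoin,
    bool mergeJoinApplicable) {
  // Unset or true: merge join is attempted. False: it is skipped.
  const bool tryMerge = testingUseMergeJoin.value_or(true);
  // Only an explicit true suppresses hash join after a merge candidate.
  const bool forceMerge = testingUseMergeJoin.value_or(false);

  std::vector<std::string> candidates;
  if (tryMerge && mergeJoinApplicable) {
    candidates.push_back("merge");
    if (forceMerge) {
      return candidates;
    }
  }
  candidates.push_back("hash");
  return candidates;
}
```

In this sketch, forcing merge join on a query where it does not apply still falls back to hash join, which matches the "return immediately if joinByMerge produces a candidate" wording.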
6. **Schema changes for lookup keys** (Schema.h, Schema.cpp):
- Added `lookupColumns` field to ColumnGroup
- Distinguished from `orderKeys` in Distribution:
  - `lookupColumns`: Columns used for index lookups (prefix of sort order)
  - `orderKeys`: Full sort order (may include additional sorting columns)
  - The key point is that sortedness does not in and of itself make a table lookup-compatible.
- Modified `addIndex()` to accept both `columns` and `lookupColumns`
- Updated `indexLookupCardinality()` to use lookupColumns for cardinality estimation
- Enables accurate modeling of sorted table access patterns
- Example: Table sorted on (orderkey, linenum) can be efficiently joined on just orderkey
7. **Velox plan translation** (ToVelox.cpp, ToVelox.h):
- Added `makeMergeJoin()` method to create MergeJoinNode
- Checks `join.method == JoinMethod::kMerge` to dispatch to merge join creation
- Creates `velox::core::MergeJoinNode` with:
  - Join type (INNER, LEFT, RIGHT, FULL, SEMI, ANTI)
  - Left and right keys as field references
  - Filter expression (for non-equi join conditions)
  - Left and right child plan nodes
  - Output type from join columns
- Registers prediction and history for cost feedback
- MergeJoinNode relies on Velox runtime's merge join operator
8. **Bucketed sorted table creation** (ParquetTpchTest.cpp):
- Added `makeBucketedSortedTables()` utility method
- Creates `orders_bs`, `lineitem_bs`, `partsupp_bs`, `part_bs` tables
- Uses 32 buckets (more buckets than `_b` versions for finer parallelism)
- Specifies `sorted_by` property in addition to `bucketed_by`
- Example: `orders_bs` is bucketed on `o_orderkey` and sorted on `o_orderkey` within each bucket
- Parquet files maintain sort order within partitions
- Used for testing merge join on realistic data
9. **Plan matcher support** (PlanMatcher.cpp, PlanMatcherGenerator.cpp):
- Added `mergeJoin()` method to PlanMatcherBuilder
- Signature: `mergeJoin(matcher, joinType)` similar to `hashJoin()`
- Enables test assertions like:
```cpp
auto matcher = PlanMatcherBuilder()
                   .tableScan("orders_bs")
                   .mergeJoin(rightMatcher, JoinType::kInner)
                   .build();
```
- Added merge join code generation in PlanMatcherGenerator
- Generates proper `.mergeJoin()` calls when plan contains MergeJoinNode
10. **Testing infrastructure** (OptimizerOptions.h):
- Added `testingUseMergeJoin` optional flag
- Three modes for comprehensive testing:
  - `nullopt`: Production mode - cost-based selection
  - `true`: Force merge join - tests merge join implementation in isolation
  - `false`: Disable merge join - tests that hash join fallback works
- Enables differential testing: run same query with and without merge join
The merge join selection algorithm in joinByMerge():
```
1. Check preconditions:
   - All join keys are columns
   - Left input partitioned and ordered
   - Order is ascending
   - Partition columns ⊆ join keys
2. Extract merge columns:
   - For each left orderKey that matches a join key
   - Build leftMergeColumns vector
3. Plan right side:
   - Construct matching Distribution (partition + order)
   - Call makePlan() to get right input plan
   - Check if shuffle/sort needed via needsShuffle flag
4. Add shuffle/sort if needed:
   - If needsShuffle:
     - Add Repartition on rightPartition
     - Add OrderBy on rightOrderKeys
   - Else if ordering doesn't match:
     - Add OrderBy on rightOrderKeys
5. Create Join operator:
   - method = JoinMethod::kMerge
   - Compute cost using setMergeJoinCost()
   - Return as NextJoin candidate
```
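Step 4 of the outline above can be sketched as a small decision function. It is illustrative only: `rightSideFixups` and `isOrderPrefix` are hypothetical, operators are represented by name, and treating "ordering matches" as "the expected keys are a prefix of the actual order keys" is an assumption of this sketch rather than something the description states.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if `expected` is a leading prefix of `actual`.
bool isOrderPrefix(
    const std::vector<std::string>& expected,
    const std::vector<std::string>& actual) {
  return expected.size() <= actual.size() &&
      std::equal(expected.begin(), expected.end(), actual.begin());
}

// Returns the operators to add on top of the right side plan, in order.
std::vector<std::string> rightSideFixups(
    bool needsShuffle,
    const std::vector<std::string>& actualOrderKeys,
    const std::vector<std::string>& expectedOrderKeys) {
  if (needsShuffle) {
    // A shuffle is followed by a sort on the merge keys, as in step 4.
    return {"Repartition", "OrderBy"};
  }
  if (!isOrderPrefix(expectedOrderKeys, actualOrderKeys)) {
    // Already copartitioned, but not sorted on the merge keys.
    return {"OrderBy"};
  }
  return {}; // Right side can be merged as is.
}
```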
Differential Revision: D898753371
Parent commit: d1a58be
File tree
18 files changed, +1351 -92 lines changed
- axiom
- optimizer
- tests
- runner