
feat: support runtime filter in shuffle join #17952

Merged: 24 commits merged into databendlabs:main on Jun 4, 2025

Conversation

@SkyFan2002 (Member) commented on May 17, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Overview

This PR extends runtime filter support to shuffle joins. Previously, runtime filters were only available for broadcast joins; this change enables them in distributed joins where data is shuffled across nodes.

Key Changes

  1. Enhanced JoinRuntimeFilter::build_runtime_filter to support runtime filter generation in shuffle join scenarios
  2. Implemented a broadcast_id generation mechanism for the shuffle join build side, leveraging the channel infrastructure introduced in #17872 (feat: support using fragment forest to execute additional broadcast operations)
  3. Added support for merging filter packets from multiple build nodes so that filtering is consistent across the cluster (see the sketch after this list)
  4. Refactored runtime-filter-related code to improve maintainability and extensibility
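
To make change 3 concrete, here is a minimal Rust sketch of merging per-node filter packets. FilterPacket, merge_packets, and the in-list/min-max representation are illustrative assumptions rather than the actual Databend types: each build node only sees its own shard of the build keys, so the in-lists are unioned and the min/max ranges widened before the merged filter is handed to the probe-side scans.

use std::collections::HashSet;

// Hypothetical per-node runtime filter packet: an in-list of build keys
// plus their min/max range (the real Databend types and names differ).
#[derive(Debug, Clone)]
struct FilterPacket {
    inlist: HashSet<i64>,
    min: i64,
    max: i64,
}

// Merge the packets produced by every build node into one global filter.
// In a shuffle join each node only sees its own shard of the build side,
// so the in-lists are unioned and the min/max ranges are widened; only
// the merged result is safe to push down to the probe-side scans.
fn merge_packets(packets: &[FilterPacket]) -> Option<FilterPacket> {
    let mut iter = packets.iter();
    let first = iter.next()?.clone();
    Some(iter.fold(first, |mut acc, p| {
        acc.inlist.extend(p.inlist.iter().copied());
        acc.min = acc.min.min(p.min);
        acc.max = acc.max.max(p.max);
        acc
    }))
}

fn main() {
    // Two build nodes, each holding a disjoint shard of test2.x.
    let node_a = FilterPacket { inlist: [1, 3, 5].into(), min: 1, max: 5 };
    let node_b = FilterPacket { inlist: [2, 4, 900].into(), min: 2, max: 900 };

    let merged = merge_packets(&[node_a, node_b]).expect("at least one packet");
    // The probe side may skip any block whose x range falls outside [1, 900]
    // and filter out rows whose x is not in the merged in-list.
    assert_eq!((merged.min, merged.max), (1, 900));
    assert!(merged.inlist.contains(&900));
    println!("merged filter: {merged:?}");
}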

Example

Prepare table:

create or replace table test1(x int) row_per_block = 1;
create or replace table test2(x int) row_per_block = 1;
insert into test1 select * from numbers(100000);
insert into test2 select * from numbers(1000);
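
With row_per_block = 1, every inserted row becomes its own block, so test1 holds 100000 blocks and test2 holds 1000; block-level pruning by the runtime filter is therefore directly visible in the EXPLAIN output and logs below.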

The EXPLAIN output shows that runtime filters are being applied in this join operation:

root@localhost:8000/tpch> explain select * from test1 join test2 on test1.x = test2.x;

explain
select
  *
from
  test1
  join test2 on test1.x = test2.x

-[ EXPLAIN ]-----------------------------------
Exchange
├── output columns: [test1.x (#0), test2.x (#1)]
├── exchange type: Merge
└── HashJoin
    ├── output columns: [test1.x (#0), test2.x (#1)]
    ├── join type: INNER
    ├── build keys: [test2.x (#1)]
    ├── probe keys: [test1.x (#0)]
    ├── keys is null equal: [false]
    ├── filters: []
    ├── build join filters:
    │   └── filter id:0, build key:test2.x (#1), probe key:test1.x (#0), filter type:inlist,min_max
    ├── estimated rows: 1000.00
    ├── Exchange(Build)
    │   ├── output columns: [test2.x (#1)]
    │   ├── exchange type: Broadcast
    │   └── TableScan
    │       ├── table: default.tpch.test2
    │       ├── output columns: [x (#1)]
    │       ├── read rows: 1000
    │       ├── read size: 35.16 KiB
    │       ├── partitions total: 1000
    │       ├── partitions scanned: 1000
    │       ├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1000 to 1000>]
    │       ├── push downs: [filters: [], limit: NONE]
    │       └── estimated rows: 1000.00
    └── TableScan(Probe)
        ├── table: default.tpch.test1
        ├── output columns: [x (#0)]
        ├── read rows: 100000
        ├── read size: 3.43 MiB
        ├── partitions total: 100000
        ├── partitions scanned: 100000
        ├── pruning stats: [segments: <range pruning: 100 to 100>, blocks: <range pruning: 100000 to 100000>]
        ├── push downs: [filters: [], limit: NONE]
        ├── apply join filters: [#0]
        └── estimated rows: 100000.00

37 rows explain in 0.232 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)
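
In this plan, the HashJoin's build join filters entry shows filter id 0 built from test2.x (#1) with filter types inlist and min_max, and the probe-side TableScan on test1 reports apply join filters: [#0]; that is, the filter constructed on the build side is pushed into the probe-side scan.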

The logs from the three nodes in the cluster show that runtime filters are applied effectively for this query:

81c9d95b-2e8d-45e6-b3fb-25578388ce98 2025-06-02T22:52:27.571807+08:00  INFO databend_common_storages_fuse::operations::read::parquet_data_transform_reader: parquet_data_transform_reader.rs:218 [RUNTIME-FILTER]ReadParquetDataTransform finished, scan_id: 0, blocks_total: 34000, blocks_pruned: 34000

81c9d95b-2e8d-45e6-b3fb-25578388ce98 2025-06-02T22:52:27.538115+08:00  INFO databend_common_storages_fuse::operations::read::parquet_data_transform_reader: parquet_data_transform_reader.rs:218 [RUNTIME-FILTER]ReadParquetDataTransform finished, scan_id: 0, blocks_total: 33000, blocks_pruned: 33000

81c9d95b-2e8d-45e6-b3fb-25578388ce98 2025-06-02T22:52:27.566167+08:00  INFO databend_common_storages_fuse::operations::read::parquet_data_transform_reader: parquet_data_transform_reader.rs:218 [RUNTIME-FILTER]ReadParquetDataTransform finished, scan_id: 0, blocks_total: 33000, blocks_pruned: 32000
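
Summing across the three nodes, 34000 + 33000 + 33000 = 100000 probe-side blocks were considered, and 34000 + 33000 + 32000 = 99000 of them are reported as pruned by the runtime filter, i.e. about 99% of the probe-side blocks were skipped.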

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on May 17, 2025
@SkyFan2002 added the ci-benchmark label (Benchmark: run all test) on May 19, 2025
Contributor comment:

Docker Image for PR

  • tag: pr-17952-c1292cd-1747623804

note: this image tag is only available for internal use.

@SkyFan2002 added and removed the ci-benchmark label (Benchmark: run all test) on May 27, 2025
Contributor comment:

Docker Image for PR

  • tag: pr-17952-f863686-1748338293

note: this image tag is only available for internal use.

@SkyFan2002 marked this pull request as ready for review on June 2, 2025 at 16:51
@BohuTANG merged commit 05232d7 into databendlabs:main on Jun 4, 2025
87 checks passed