You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: website/docs/engine-flink/deltajoins.md
+45-19Lines changed: 45 additions & 19 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,19 +1,27 @@
1
1
---
2
-
sidebar_label: DeltaJoins
2
+
sidebar_label: Delta Joins
3
3
title: Flink Delta Joins
4
4
sidebar_position: 6
5
5
---
6
6
7
-
# Delta Join
8
-
Begin with Flink 2.1, a new delta join operator was introduced. Compared to traditional streaming join, delta join significantly reduces the required state, effectively alleviating issues associated with large state, such as resource bottlenecks, lengthy checkpoint execution times, and long recovery times during job restarts.
7
+
# The Delta Join
8
+
Beginning with **Apache Flink 2.1**, a new operator called **Delta Join** was introduced.
9
+
Compared to traditional streaming joins, the delta join operator significantly reduces the amount of state that needs to be maintained during execution. This improvement helps mitigate several common issues associated with large state sizes, including:
9
10
10
-
Starting from Fluss version 0.8, streaming join jobs running on Flink 2.1 or higher will be automatically optimized to delta join in applicable scenarios.
11
+
- Excessive memory and storage consumption
12
+
- Long checkpointing durations
13
+
- Extended recovery times after failures or restarts
11
14
12
-
## Examples
15
+
Starting with **Apache Fluss 0.8**, streaming join jobs running on **Flink 2.1 or later** will be automatically optimized into **delta joins** whenever applicable. This optimization happens transparently at query planning time, requiring no manual configuration.
13
16
14
-
Here is an example of delta join currently supported in Flink 2.1.
17
+
## How Delta Join Works
18
+
Traditional streaming joins in Flink require maintaining both input sides entirely in state to match updates across streams. Delta join, by contrast, uses a **prefix-based lookup mechanism** that only retains *relevant subsets* of one table’s data in state. This drastically reduces memory pressure and improves performance for many streaming analytics and enrichment workloads.
15
19
16
-
1. Create two source tables and one sink tables
20
+
## Example: Delta Join in Flink 2.1
21
+
22
+
Below is an example demonstrating a delta join query supported by Flink 2.1.
If you see the plan that includes DeltaJoin as following, it indicates that the optimization has been effective, and the streaming join has been successfully optimized into a delta join.
86
+
If the physical plan includes `DeltaJoin`, it indicates that the optimizer has successfully transformed the traditional streaming join into a delta join.
This confirms that the delta join optimization is active.
116
+
117
+
## Understanding Prefix Keys
118
+
A prefix key defines the portion of a table’s primary key that can be used for efficient key-based lookups or index pruning.
119
+
120
+
In Fluss, the option `'bucket.key' = 'city_id'` specifies that data is organized (or bucketed) by `city_id`. When performing a delta join, this allows Flink to quickly locate and read only the subset of records corresponding to the specific prefix key value, rather than scanning or caching the entire table state.
121
+
122
+
For example:
123
+
- Full primary key: `(city_id, order_id)`
124
+
- Prefix key: `city_id`
125
+
126
+
In this setup:
127
+
* The delta join operator uses the prefix key (`city_id`) to retrieve only relevant right-side records matching each left-side event.
128
+
* This eliminates the need to hold all records for every city in memory, significantly reducing state size.
129
+
130
+
Prefix keys thus form the foundation for state-efficient lookups in delta joins, enabling Flink to scale join workloads efficiently even under high throughput.
108
131
109
132
## Flink Version Support
110
133
111
-
The work on Delta Join is still ongoing, so the support for more sql patterns that can be optimized into delta join varies across different versions of Flink. More details can be found at [Delta Join](https://issues.apache.org/jira/browse/FLINK-37836).
134
+
The delta join feature is still evolving, and its optimization capabilities vary across Flink versions.
135
+
136
+
Refer to the [Delta Join](https://issues.apache.org/jira/browse/FLINK-37836) for the most up-to-date information.
0 commit comments