Commit cc6b9af

[docs] Add document for versioned merge engine (#523)
1 parent 4b4fec2 commit cc6b9af

1 file changed: +79 -1 lines changed
  • website/docs/table-design/table-types/pk-table/merge-engines

website/docs/table-design/table-types/pk-table/merge-engines/versioned.md

Lines changed: 79 additions & 1 deletion
@@ -5,4 +5,82 @@ sidebar_position: 3

# Versioned Merge Engine

8-
TODO: Fill me #459
The **Versioned Merge Engine** updates rows based on a version number or event timestamp. For each primary key, it retains only the row with the highest version number (or event timestamp). This is particularly useful for deduplicating or merging out-of-order data while guaranteeing eventual consistency with the upstream source.

Setting `'table.merge-engine' = 'versioned'` makes the table update rows based on a configured version column. An incoming row replaces the stored row when its version value is greater than or equal to the stored value. If the incoming value is less than the stored value, or is null, no update occurs.

This feature is especially valuable as a replacement for [Deduplication](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/queries/deduplication/) transformations in streaming computations. Because the table itself keeps only the newest version of each key, the pipeline can drop the separate deduplication step, which simplifies the workflow.
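
For comparison, the query-side approach that this engine can replace might look like the sketch below. It assumes a hypothetical source table `orders_src` with columns `a`, `b`, and `ts`; none of these names come from the Fluss documentation. Because `ts` here is a regular `BIGINT` column rather than a time attribute, Flink would typically plan this `ROW_NUMBER()` pattern as a Top-N query with N = 1 rather than as its dedicated Deduplication operator.

```sql
-- Hypothetical source table: `orders_src` and its columns are illustrative only.
-- For each key `a`, keep the row with the largest `ts`.
SELECT a, b, ts
FROM (
    SELECT
        a, b, ts,
        ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts DESC) AS row_num
    FROM orders_src
)
WHERE row_num = 1;
```

With the versioned merge engine, a job could instead write the source rows directly into the primary key table and let the table decide which version of each key to keep.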

:::note
When using the `versioned` merge engine, keep the following limitations in mind:
- **`UPDATE` and `DELETE` statements are not supported.**
- **Partial updates are not supported.**
- **`UPDATE_BEFORE` and `DELETE` changelog events are ignored automatically.**
:::

## Version Column

The version column is the table column that stores the version number (or event timestamp) of each record.
When enabling the versioned merge engine, the version column must be specified explicitly using the following table properties:

```sql
'table.merge-engine' = 'versioned',
'table.merge-engine.versioned.ver-column' = '<column_name>'
```

The version column can be of the following data types:
- `INT`
- `BIGINT`
- `TIMESTAMP`
- `TIMESTAMP(p)` (with precision)
- `TIMESTAMP_LTZ` (timestamp with local time zone)
- `TIMESTAMP_LTZ(p)` (timestamp with local time zone and precision)
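
As an illustration of a timestamp-typed version column, the following sketch uses an event-time column as the version. The table name `device_status` and its columns are hypothetical, and the described outcome follows from the update rule stated above rather than from captured output.

```sql
-- Hypothetical table: all names are illustrative, not taken from the Fluss docs.
CREATE TABLE device_status (
    device_id INT NOT NULL PRIMARY KEY NOT ENFORCED,
    status STRING,
    event_time TIMESTAMP(3)
) WITH (
    'table.merge-engine' = 'versioned',
    'table.merge-engine.versioned.ver-column' = 'event_time'
);

INSERT INTO device_status VALUES (1, 'online', TIMESTAMP '2024-01-01 10:00:00');

-- Late-arriving event with an older timestamp.
-- By the stated rule (incoming < stored), this row should be ignored.
INSERT INTO device_status VALUES (1, 'booting', TIMESTAMP '2024-01-01 09:59:00');
```

Using the event timestamp as the version is what lets out-of-order events converge to the most recent state per key.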

## Example

```sql title="Flink SQL"
CREATE TABLE VERSIONED (
    a INT NOT NULL PRIMARY KEY NOT ENFORCED,
    b STRING,
    ts BIGINT
) WITH (
    'table.merge-engine' = 'versioned',
    'table.merge-engine.versioned.ver-column' = 'ts'
);

INSERT INTO VERSIONED (a, b, ts) VALUES (1, 'v1', 1000);

-- Insert data with ts < 1000: no update is made
INSERT INTO VERSIONED (a, b, ts) VALUES (1, 'v2', 999);
SELECT * FROM VERSIONED;
-- Output
-- +---+-----+------+
-- | a | b   | ts   |
-- +---+-----+------+
-- | 1 | v1  | 1000 |
-- +---+-----+------+

-- Insert data with ts > 1000: the row is updated
INSERT INTO VERSIONED (a, b, ts) VALUES (1, 'v3', 2000);
SELECT * FROM VERSIONED;
-- Output
-- +---+-----+------+
-- | a | b   | ts   |
-- +---+-----+------+
-- | 1 | v3  | 2000 |
-- +---+-----+------+

-- Insert data with ts = NULL: no update is made
INSERT INTO VERSIONED (a, b, ts) VALUES (1, 'v4', NULL);
SELECT * FROM VERSIONED;
-- Output
-- +---+-----+------+
-- | a | b   | ts   |
-- +---+-----+------+
-- | 1 | v3  | 2000 |
-- +---+-----+------+
```
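
The example above does not exercise the boundary case where the incoming version equals the stored one. Since updates are performed when the incoming value is greater than or equal to the stored value, an insert with the same `ts` would be expected to replace the row. The continuation below is a sketch derived from that stated rule, not captured output; the value `'v5'` is illustrative.

```sql
-- Continues from the final state of the example above: (1, 'v3', 2000).
-- The incoming ts equals the stored ts, so by the ">=" rule the row should be updated.
INSERT INTO VERSIONED (a, b, ts) VALUES (1, 'v5', 2000);
SELECT * FROM VERSIONED;
-- Expected per the documented rule: a single row (1, 'v5', 2000)
```

This matters for sources that can emit several changes with the same version or timestamp: under this rule, the most recently written row among equal versions is the one retained.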
