website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md
Lines changed: 20 additions & 19 deletions
@@ -21,14 +21,14 @@ sidebar_position: 1
# Paimon
-
[Apache Paimon](https://paimon.apache.org/) innovatively combines a lake format with an LSM (Log-Structured Merge-tree) structure, bringing efficient updates into the lake architecture .
+
[Apache Paimon](https://paimon.apache.org/) innovatively combines a lake format with an LSM (Log-Structured Merge-tree) structure, bringing efficient updates into the lake architecture.
To integrate Fluss with Paimon, you must enable lakehouse storage and configure Paimon as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).
## Introduction
-
When a table with the option `'table.datalake.enabled' = 'true'` is created or altered in Fluss, Fluss will automatically create a corresponding Paimon table with the same table path .
+
When a table is created or altered with the option `'table.datalake.enabled' = 'true'`, Fluss will automatically create a corresponding Paimon table with the same table path.
The schema of the Paimon table matches that of the Fluss table, except for the addition of three system columns at the end: `__bucket`, `__offset`, and `__timestamp`.
-
These system columns help Fluss clients consume data from Paimon in a streaming fashion—such as seeking by a specific bucket using an offset or timestamp.
+
These system columns help Fluss clients consume data from Paimon in a streaming fashion, such as seeking by a specific bucket using an offset or timestamp.
Then, the datalake tiering service continuously tiers data from Fluss to Paimon. The parameter `table.datalake.freshness` controls how soon data written to Fluss should be tiered to Paimon—by default, this delay is 3 minutes.
-
For primary key tables, change logs are also generated in Paimon format, enabling stream-based consumption via Paimon APIs.
+
Then, the datalake tiering service continuously tiers data from Fluss to Paimon. The parameter `table.datalake.freshness` controls how frequently Fluss writes data to Paimon tables. By default, the data freshness is 3 minutes.
+
For primary key tables, changelogs are also generated in the Paimon format, enabling stream-based consumption via Paimon APIs.
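
As a minimal sketch of the options described above, the following Flink SQL creates a datalake-enabled primary key table and shortens the tiering interval. The table name `orders_sketch`, its columns, and the `30s` value are hypothetical and chosen only for illustration; only the two option keys come from the text.

```sql title="Flink SQL"
-- Hypothetical table; the name and columns are illustrative only.
CREATE TABLE orders_sketch (
    order_id BIGINT,
    customer_id BIGINT,
    total_price DECIMAL(15, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    -- Tier this table's data to Paimon.
    'table.datalake.enabled' = 'true',
    -- Assumed value: target roughly 30 seconds of freshness
    -- instead of the 3-minute default.
    'table.datalake.freshness' = '30s'
);
```

A smaller freshness value makes data visible in Paimon sooner, presumably at the cost of more frequent writes to the lake.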
Since Fluss version 0.7, you can also specify Paimon table properties when creating a datalake-enabled Fluss table by using the `paimon.` prefix within the Fluss table properties clause.
For example, you can specify the Paimon property `file.format` to change the file format of the Paimon table, or set `deletion-vectors.enabled` to enable or disable deletion vectors for the Paimon table.
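
A sketch of how the `paimon.` prefix might be used is shown below. The table name, the columns, and the chosen values (`orc`, `true`) are assumptions for illustration; the property keys are the ones named in the text.

```sql title="Flink SQL"
-- Hypothetical table; the name and columns are illustrative only.
CREATE TABLE orders_with_paimon_props (
    order_id BIGINT,
    customer_id BIGINT,
    total_price DECIMAL(15, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'table.datalake.enabled' = 'true',
    -- Properties with the `paimon.` prefix are meant for the Paimon table:
    -- here, Paimon's `file.format` and `deletion-vectors.enabled`.
    'paimon.file.format' = 'orc',
    'paimon.deletion-vectors.enabled' = 'true'
);
```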
## Read Tables
-
### Read by Flink
+
### Reading with Apache Flink
For a table with the option `'table.datalake.enabled' = 'true'`, its data exists in two layers: one remains in Fluss, and the other has already been tiered to Paimon.
You can choose between two views of the table:
@@ -102,7 +104,7 @@ For further information, refer to Paimon’s [SQL Query documentation](https://p
#### Union Read of Data in Fluss and Paimon
-
To read the full dataset, which includes both Fluss and Paimon data, simply query the table without any suffix. The following example illustrates this:
+
To read the full dataset, which includes both Fluss (fresh) and Paimon (historical) data, simply query the table without any suffix. The following example illustrates this:
```sql title="Flink SQL"
-- Query will union data from Fluss and Paimon
@@ -111,19 +113,18 @@ SELECT SUM(order_count) AS total_orders FROM ads_nation_purchase_power;
This query may run slower than reading only from Paimon, but it returns the most up-to-date data. If you execute the query multiple times, you may observe different results due to continuous data ingestion.
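
The trade-off can be seen by running both forms side by side. This sketch assumes the `$lake` suffix, which the Fluss documentation uses to select only the data already tiered to Paimon; the suffix does not appear in this excerpt, so treat it as an assumption.

```sql title="Flink SQL"
-- Union read: combines fresh data in Fluss with historical data in Paimon.
SELECT SUM(order_count) AS total_orders FROM ads_nation_purchase_power;

-- Paimon-only read (assumed `$lake` suffix): typically faster,
-- but may lag behind by up to the configured freshness.
SELECT SUM(order_count) AS total_orders FROM ads_nation_purchase_power$lake;
```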
-
### Read by Other Engines
+
### Reading with Other Engines
Since the data tiered to Paimon from Fluss is stored as a standard Paimon table, you can use any engine that supports Paimon to read it. Below is an example using [StarRocks](https://paimon.apache.org/docs/master/engines/starrocks/):
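
A minimal sketch of such a setup, assuming a filesystem-based Paimon catalog: the catalog name `paimon_catalog`, the warehouse path, the database `fluss`, and the table `orders_sketch` are all hypothetical and must be replaced with the values from your own lakehouse storage configuration.

```sql title="StarRocks SQL"
-- Register the Paimon warehouse as an external catalog in StarRocks.
-- The warehouse path is an assumed placeholder.
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "/tmp/paimon_data_warehouse"
);

-- Query the tiered table (hypothetical database and table names).
SELECT COUNT(*) FROM paimon_catalog.fluss.orders_sketch;
```

Because this reads only from Paimon, the result reflects data that has already been tiered and may not include the most recent writes to Fluss.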