
Commit d9679ce

[docs] minor paimon page improvements (#1133)
1 parent 5dc583f commit d9679ce

File tree

1 file changed (+20, -19 lines)

  • website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md

website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md

Lines changed: 20 additions & 19 deletions
@@ -21,14 +21,14 @@ sidebar_position: 1
 
 # Paimon
 
-[Apache Paimon](https://paimon.apache.org/) innovatively combines a lake format with an LSM (Log-Structured Merge-tree) structure, bringing efficient updates into the lake architecture .
+[Apache Paimon](https://paimon.apache.org/) innovatively combines a lake format with an LSM (Log-Structured Merge-tree) structure, bringing efficient updates into the lake architecture.
 To integrate Fluss with Paimon, you must enable lakehouse storage and configure Paimon as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).
 
 ## Introduction
 
-When a table with the option `'table.datalake.enabled' = 'true'` is created or altered in Fluss, Fluss will automatically create a corresponding Paimon table with the same table path .
+When a table is created or altered with the option `'table.datalake.enabled' = 'true'`, Fluss will automatically create a corresponding Paimon table with the same table path.
 The schema of the Paimon table matches that of the Fluss table, except for the addition of three system columns at the end: `__bucket`, `__offset`, and `__timestamp`.
-These system columns help Fluss clients consume data from Paimon in a streaming fashion such as seeking by a specific bucket using an offset or timestamp.
+These system columns help Fluss clients consume data from Paimon in a streaming fashion, such as seeking by a specific bucket using an offset or timestamp.
 
 ```sql title="Flink SQL"
 USE CATALOG fluss_catalog;
@@ -43,12 +43,13 @@ CREATE TABLE fluss_order_with_lake (
     `ptime` AS PROCTIME(),
     PRIMARY KEY (`order_key`) NOT ENFORCED
 ) WITH (
-    'table.datalake.enabled' = 'true',
-    'table.datalake.freshness' = '30s');
+    'table.datalake.enabled' = 'true',
+    'table.datalake.freshness' = '30s'
+);
 ```
 
-Then, the datalake tiering service continuously tiers data from Fluss to Paimon. The parameter `table.datalake.freshness` controls how soon data written to Fluss should be tiered to Paimon—by default, this delay is 3 minutes.
-For primary key tables, change logs are also generated in Paimon format, enabling stream-based consumption via Paimon APIs.
+Then, the datalake tiering service continuously tiers data from Fluss to Paimon. The parameter `table.datalake.freshness` controls the frequency that Fluss writes data to Paimon tables. By default, the data freshness is 3 minutes.
+For primary key tables, changelogs are also generated in the Paimon format, enabling stream-based consumption via Paimon APIs.
 
 Since Fluss version 0.7, you can also specify Paimon table properties when creating a datalake-enabled Fluss table by using the `paimon.` prefix within the Fluss table properties clause.
 
@@ -63,17 +64,18 @@ CREATE TABLE fluss_order_with_lake (
     `ptime` AS PROCTIME(),
     PRIMARY KEY (`order_key`) NOT ENFORCED
 ) WITH (
-    'table.datalake.enabled' = 'true',
-    'table.datalake.freshness' = '30s',
-    'paimon.file.format' = 'orc',
-    'paimon.deletion-vectors.enabled' = 'true');
+    'table.datalake.enabled' = 'true',
+    'table.datalake.freshness' = '30s',
+    'paimon.file.format' = 'orc',
+    'paimon.deletion-vectors.enabled' = 'true'
+);
 ```
 
 For example, you can specify the Paimon property `file.format` to change the file format of the Paimon table, or set `deletion-vectors.enabled` to enable or disable deletion vectors for the Paimon table.
 
 ## Read Tables
 
-### Read by Flink
+### Reading with Apache Flink
 
 For a table with the option `'table.datalake.enabled' = 'true'`, its data exists in two layers: one remains in Fluss, and the other has already been tiered to Paimon.
 You can choose between two views of the table:
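The two views themselves fall outside this hunk's context lines. For illustration, reading only the already-tiered Paimon layer is sketched below; the `$lake` suffix is an assumption inferred from the union-read wording further down (querying the table "without any suffix" returns the combined view):

```sql title="Flink SQL"
-- Sketch: read only the data already tiered to Paimon, skipping fresh Fluss data.
-- Assumption: the `$lake` suffix selects the Paimon-only view of the table.
SELECT COUNT(*) FROM fluss_order_with_lake$lake;
```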
@@ -102,7 +104,7 @@ For further information, refer to Paimon’s [SQL Query documentation](https://p
 
 #### Union Read of Data in Fluss and Paimon
 
-To read the full dataset, which includes both Fluss and Paimon data, simply query the table without any suffix. The following example illustrates this:
+To read the full dataset, which includes both Fluss (fresh) and Paimon (historical) data, simply query the table without any suffix. The following example illustrates this:
 
 ```sql title="Flink SQL"
 -- Query will union data from Fluss and Paimon
@@ -111,19 +113,18 @@ SELECT SUM(order_count) AS total_orders FROM ads_nation_purchase_power;
 
 This query may run slower than reading only from Paimon, but it returns the most up-to-date data. If you execute the query multiple times, you may observe different results due to continuous data ingestion.
 
-### Read by Other Engines
+### Reading with other Engines
 
 Since the data tiered to Paimon from Fluss is stored as a standard Paimon table, you can use any engine that supports Paimon to read it. Below is an example using [StarRocks](https://paimon.apache.org/docs/master/engines/starrocks/):
 
 First, create a Paimon catalog in StarRocks:
 
 ```sql title="StarRocks SQL"
 CREATE EXTERNAL CATALOG paimon_catalog
-PROPERTIES
-(
-    "type" = "paimon",
-    "paimon.catalog.type" = "filesystem",
-    "paimon.catalog.warehouse" = "/tmp/paimon_data_warehouse"
+PROPERTIES (
+    "type" = "paimon",
+    "paimon.catalog.type" = "filesystem",
+    "paimon.catalog.warehouse" = "/tmp/paimon_data_warehouse"
 );
 ```
 