website/docs/streaming-lakehouse/integrate-data-lakes/lance.md
9 additions & 24 deletions
@@ -5,13 +5,13 @@ sidebar_position: 1
# Lance
-[Apache Paimon](https://paimon.apache.org/) innovatively combines a lake format with an LSM (Log-Structured Merge-tree) structure, bringing efficient updates into the lake architecture.
+[Lance](https://lancedb.github.io/lance/) is a modern table format optimized for machine learning and AI applications.
To integrate Fluss with Lance, you must enable lakehouse storage and configure Lance as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).
## Introduction
When a table is created or altered with the option `'table.datalake.enabled' = 'true'`, Fluss will automatically create a corresponding Lance table with the same table path.
-The schema of the Paimon table matches that of the Fluss table.
+The schema of the Lance table matches that of the Fluss table.
-Then, the datalake tiering service continuously tiers data from Fluss to Lance. The parameter `table.datalake.freshness` controls the frequency that Fluss writes data to Paimon tables. By default, the data freshness is 3 minutes.
-For primary key tables, changelogs are also generated in the Paimon format, enabling stream-based consumption via Paimon APIs.
+Then, the datalake tiering service continuously tiers data from Fluss to Lance. The parameter `table.datalake.freshness` controls the frequency that Fluss writes data to Lance tables. By default, the data freshness is 3 minutes.

-Since Fluss version 0.7, you can also specify Paimon table properties when creating a datalake-enabled Fluss table by using the `paimon.` prefix within the Fluss table properties clause.
+Since Fluss version 0.7, you can also specify Lance table properties when creating a datalake-enabled Fluss table by using the `lance.` prefix within the Fluss table properties clause.
-For example, you can specify the Paimon property `file.format` to change the file format of the Paimon table, or set `deletion-vectors.enabled` to enable or disable deletion vectors for the Paimon table.
+For example, you can specify the Lance property `file.format` to change the file format of the Paimon table, or set `deletion-vectors.enabled` to enable or disable deletion vectors for the Paimon table.
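
As a minimal sketch of the prefix convention described above, the following Python helper assembles the option map for a datalake-enabled table and renders it as a SQL `WITH` clause. The helper names, the `'30s'` duration syntax, and the `example.option` key are illustrative assumptions, not taken from Fluss or Lance; substitute a property that Lance actually supports.

```python
def with_lance_prefix(props):
    """Prefix each key with 'lance.' unless it already carries the prefix."""
    return {
        (key if key.startswith("lance.") else "lance." + key): value
        for key, value in props.items()
    }


def render_with_clause(options):
    """Render an option map as a SQL WITH (...) clause, one option per line."""
    body = ",\n".join(f"    '{key}' = '{value}'" for key, value in options.items())
    return "WITH (\n" + body + "\n)"


table_options = {
    # Options named in the documentation text.
    "table.datalake.enabled": "true",
    "table.datalake.freshness": "30s",  # assumed duration syntax
    # Hypothetical Lance property, shown only to illustrate the prefix.
    **with_lance_prefix({"example.option": "value"}),
}

print(render_with_clause(table_options))
```

Keeping the prefixing in one helper means pass-through properties are always distinguishable from Fluss's own `table.*` options.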
### Reading with other Engines
-Since the data tiered to Paimon from Fluss is stored as a standard Paimon table, you can use any engine that supports Paimon to read it. Below is an example using [StarRocks](https://paimon.apache.org/docs/master/engines/starrocks/):
-
-First, create a Paimon catalog in StarRocks:
+Since the data tiered to Lance from Fluss is stored as a standard Lance table, you can use any engine that supports Lance to read it. Below is an example using [pylance](https://pypi.org/project/pylance/):
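
A minimal sketch of reading the tiered table with pylance, assuming the package is installed (`pip install pylance`) and that the dataset lives at `<warehouse>/<database>/<table>.lance`. That directory layout, the warehouse path, and the helper names are assumptions for illustration; check where your deployment actually writes the dataset.

```python
def lance_dataset_uri(warehouse, database, table):
    """Build a dataset URI under the assumed <warehouse>/<database>/<table>.lance layout."""
    return f"{warehouse.rstrip('/')}/{database}/{table}.lance"


def count_rows(uri):
    """Open the Lance dataset at `uri` and return its row count."""
    import lance  # provided by the pylance package; imported lazily

    return lance.dataset(uri).count_rows()


# Hypothetical warehouse path; `fluss` and `orders` follow the example table.
uri = lance_dataset_uri("/tmp/lance-warehouse/", "fluss", "orders")
print(uri)
```

Once the tiering service has written data, `count_rows(uri)` returns the number of rows in the dataset, and `lance.dataset(uri).to_table()` loads it as an Arrow table for further processing.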
```sql title="StarRocks SQL"
CREATE EXTERNAL CATALOG paimon_catalog
@@ -73,24 +70,12 @@ PROPERTIES (
> **NOTE**: The configuration values for `paimon.catalog.type` and `paimon.catalog.warehouse` must match those used when configuring Paimon as the lakehouse storage for Fluss in `server.yaml`.
-Then, you can query the `orders` table using StarRocks:
-
-```sql title="StarRocks SQL"
--- The table is in the database `fluss`
-SELECT COUNT(*) FROM paimon_catalog.fluss.orders;
-```
-
-```sql title="StarRocks SQL"
--- Query the system tables to view snapshots of the table