Commit 61aa7be

add lance doc
1 parent 20549dc commit 61aa7be

1 file changed (+110, -0 lines changed)
  • website/docs/streaming-lakehouse/integrate-data-lakes

---
title: Lance
sidebar_position: 1
---

# Lance

[Lance](https://github.com/lancedb/lance) is a modern columnar data format optimized for machine learning workflows and datasets.
To integrate Fluss with Lance, you must enable lakehouse storage and configure Lance as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).

## Introduction

When a table is created or altered with the option `'table.datalake.enabled' = 'true'`, Fluss automatically creates a corresponding Lance table with the same table path.
The schema of the Lance table matches that of the Fluss table.

```sql title="Flink SQL"
USE CATALOG fluss_catalog;

CREATE TABLE fluss_order_with_lake (
  `order_key` BIGINT,
  `cust_key` INT NOT NULL,
  `total_price` DECIMAL(15, 2),
  `order_date` DATE,
  `order_priority` STRING,
  `clerk` STRING,
  `ptime` AS PROCTIME(),
  PRIMARY KEY (`order_key`) NOT ENFORCED
) WITH (
  'table.datalake.enabled' = 'true',
  'table.datalake.freshness' = '30s'
);
```
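
Because the option can also be applied when a table is altered, an existing Fluss table can be switched to datalake mode after it has been created. The following is a minimal sketch, assuming the Fluss catalog accepts Flink's standard `ALTER TABLE ... SET (...)` statement for this option; the table name `fluss_orders` is hypothetical:

```sql title="Flink SQL"
USE CATALOG fluss_catalog;

-- `fluss_orders` is a hypothetical table that was created without
-- 'table.datalake.enabled'. Setting the option afterwards is expected to make
-- Fluss create the corresponding Lance table with the same table path.
ALTER TABLE fluss_orders SET ('table.datalake.enabled' = 'true');
```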

Then, the datalake tiering service continuously tiers data from Fluss to Lance. The parameter `table.datalake.freshness` controls how frequently Fluss writes data to the Lance table. By default, the data freshness is 3 minutes.
For primary key tables, changelogs are also generated in the Paimon format, enabling stream-based consumption via Paimon APIs.
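
To see tiering in practice, you can write a few rows into the `fluss_order_with_lake` table created above. This sketch assumes the datalake tiering service is already running; with `'table.datalake.freshness' = '30s'`, the rows are expected to reach the Lance table within roughly that interval. The values are illustrative only:

```sql title="Flink SQL"
-- `ptime` is a computed column, so only the six physical columns are supplied:
-- order_key, cust_key, total_price, order_date, order_priority, clerk.
INSERT INTO fluss_order_with_lake VALUES
  (1, 101, 25.50, DATE '2024-01-01', 'URGENT', 'clerk_1'),
  (2, 102, 99.99, DATE '2024-01-02', 'MEDIUM', 'clerk_2');
```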

Since Fluss version 0.7, you can also specify Paimon table properties when creating a datalake-enabled Fluss table by using the `paimon.` prefix within the Fluss table properties clause.

```sql title="Flink SQL"
CREATE TABLE fluss_order_with_lake (
  `order_key` BIGINT,
  `cust_key` INT NOT NULL,
  `total_price` DECIMAL(15, 2),
  `order_date` DATE,
  `order_priority` STRING,
  `clerk` STRING,
  `ptime` AS PROCTIME(),
  PRIMARY KEY (`order_key`) NOT ENFORCED
) WITH (
  'table.datalake.enabled' = 'true',
  'table.datalake.freshness' = '30s',
  'paimon.file.format' = 'orc',
  'paimon.deletion-vectors.enabled' = 'true'
);
```

For example, you can specify the Paimon property `file.format` to change the file format of the Paimon table, or set `deletion-vectors.enabled` to enable or disable deletion vectors for the Paimon table.

### Reading with Other Engines

Since the data tiered to Paimon from Fluss is stored as a standard Paimon table, you can use any engine that supports Paimon to read it. Below is an example using [StarRocks](https://paimon.apache.org/docs/master/engines/starrocks/):

First, create a Paimon catalog in StarRocks:

```sql title="StarRocks SQL"
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES (
  "type" = "paimon",
  "paimon.catalog.type" = "filesystem",
  "paimon.catalog.warehouse" = "/tmp/paimon_data_warehouse"
);
```

> **NOTE**: The configuration values for `paimon.catalog.type` and `paimon.catalog.warehouse` must match those used when configuring Paimon as the lakehouse storage for Fluss in `server.yaml`.

Then, you can query the `orders` table using StarRocks:

```sql title="StarRocks SQL"
-- The table is in the database `fluss`
SELECT COUNT(*) FROM paimon_catalog.fluss.orders;
```

```sql title="StarRocks SQL"
-- Query the `$snapshots` system table to view the snapshots of the `orders` table
SELECT * FROM paimon_catalog.fluss.orders$snapshots;
```

## Data Type Mapping

When integrating with Paimon, Fluss automatically converts between Fluss data types and Paimon data types.
The following table shows the mapping between [Fluss data types](table-design/data-types.md) and Paimon data types:

| Fluss Data Type                | Paimon Data Type               |
|--------------------------------|--------------------------------|
| BOOLEAN                        | BOOLEAN                        |
| TINYINT                        | TINYINT                        |
| SMALLINT                       | SMALLINT                       |
| INT                            | INT                            |
| BIGINT                         | BIGINT                         |
| FLOAT                          | FLOAT                          |
| DOUBLE                         | DOUBLE                         |
| DECIMAL                        | DECIMAL                        |
| STRING                         | STRING                         |
| CHAR                           | CHAR                           |
| DATE                           | DATE                           |
| TIME                           | TIME                           |
| TIMESTAMP                      | TIMESTAMP                      |
| TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP WITH LOCAL TIME ZONE |
| BINARY                         | BINARY                         |
| BYTES                          | BYTES                          |
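
For illustration, here is a minimal Flink SQL sketch of a datalake-enabled log table (no primary key) that declares one column for each Fluss type listed in the mapping table. The table and column names are hypothetical, and the precisions chosen for `DECIMAL`, `CHAR`, `TIMESTAMP`, and `BINARY` are arbitrary:

```sql title="Flink SQL"
USE CATALOG fluss_catalog;

-- Hypothetical table: each column exercises one row of the type mapping above.
CREATE TABLE fluss_all_types (
  `c_boolean`   BOOLEAN,
  `c_tinyint`   TINYINT,
  `c_smallint`  SMALLINT,
  `c_int`       INT,
  `c_bigint`    BIGINT,
  `c_float`     FLOAT,
  `c_double`    DOUBLE,
  `c_decimal`   DECIMAL(10, 2),
  `c_string`    STRING,
  `c_char`      CHAR(10),
  `c_date`      DATE,
  `c_time`      TIME,
  `c_timestamp` TIMESTAMP(3),
  `c_ltz`       TIMESTAMP(3) WITH LOCAL TIME ZONE,
  `c_binary`    BINARY(16),
  `c_bytes`     BYTES
) WITH (
  'table.datalake.enabled' = 'true'
);
```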
