Commit 1d7e4ac

add doc for lance
1 parent 20549dc commit 1d7e4ac

1 file changed

+98
-0
lines changed
  • website/docs/streaming-lakehouse/integrate-data-lakes

@@ -0,0 +1,98 @@
---
title: Lance
sidebar_position: 1
---

# Lance

[Lance](https://lancedb.github.io/lance/) is a modern table format optimized for machine learning and AI applications.
To integrate Fluss with Lance, you must enable lakehouse storage and configure Lance as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).

## Introduction

To configure Lance as the lakehouse storage, set the following options in `server.yaml`:
```yaml
# Lance configuration
datalake.format: lance

datalake.lance.warehouse: /tmp/lance

# To use S3 as the Lance storage backend, specify the following properties
# datalake.lance.warehouse: s3://<bucket>
# datalake.lance.endpoint: <endpoint>
# datalake.lance.allow_http: true
# datalake.lance.access_key_id: <access_key_id>
# datalake.lance.secret_access_key: <secret_access_key>
```

When a table is created or altered with the option `'table.datalake.enabled' = 'true'`, Fluss automatically creates a corresponding Lance table at the path `<warehouse_path>/<database_name>/<table_name>.lance`.
The schema of the Lance table matches that of the Fluss table.

```sql title="Flink SQL"
USE CATALOG fluss_catalog;

CREATE TABLE fluss_order_with_lake (
    `order_id` BIGINT,
    `item_id` BIGINT,
    `amount` INT,
    `address` STRING
) WITH (
    'table.datalake.enabled' = 'true',
    'table.datalake.freshness' = '30s'
);
```

Then, the datalake tiering service continuously tiers data from Fluss to Lance. The parameter `table.datalake.freshness` controls how often Fluss writes data to Lance tables. By default, the data freshness is 3 minutes.

> **NOTE**: Fluss v0.8 only supports tiering log tables.

You can also specify Lance table properties when creating a datalake-enabled Fluss table by using the `lance.` prefix within the Fluss table properties clause.

```sql title="Flink SQL"
CREATE TABLE fluss_order_with_lake (
    `order_id` BIGINT,
    `item_id` BIGINT,
    `amount` INT,
    `address` STRING
) WITH (
    'table.datalake.enabled' = 'true',
    'table.datalake.freshness' = '30s',
    'lance.max_row_per_file' = '512'
);
```

For example, the statement above sets `lance.max_row_per_file`, which passes the Lance property `max_row_per_file` to control the writing behavior when Fluss tiers data to Lance.
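
As a rough illustration of what such a setting implies, the sketch below assumes that `max_row_per_file` caps the number of rows in a single Lance data file. That reading is inferred from the property name and is not confirmed here. Under that assumption, it computes a lower bound on the number of data files needed for a given row count; the function name `min_data_files` is hypothetical, and the real file count may be higher because each tiering round may write its own files.

```python
def min_data_files(total_rows: int, max_row_per_file: int) -> int:
    """Lower bound on the number of data files, ASSUMING no file may hold
    more than `max_row_per_file` rows (an assumption, not documented behavior)."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if max_row_per_file <= 0:
        raise ValueError("max_row_per_file must be positive")
    # Integer ceiling division: avoids floating-point rounding for large counts.
    return (total_rows + max_row_per_file - 1) // max_row_per_file


# With the value from the example ('lance.max_row_per_file' = '512'):
files_for_10k_rows = min_data_files(10_000, 512)
```

With 10,000 rows and a cap of 512, 19 files hold at most 9,728 rows, so at least 20 files are needed under this assumption.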

### Reading with Lance ecosystem tools

Since the data tiered to Lance from Fluss is stored as a standard Lance table, you can use any tool that supports Lance to read it. Below is an example using [pylance](https://pypi.org/project/pylance/):

```python title="Lance Python"
import lance
ds = lance.dataset("<warehouse_path>/<database_name>/<table_name>.lance")
```
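
A slightly fuller sketch may help when wiring this into a script. It builds the dataset path from the `<warehouse_path>/<database_name>/<table_name>.lance` layout described earlier and loads a few columns into an Arrow table. The helper names `lance_table_path` and `read_tiered_table` are hypothetical, and the database name `my_db` in the usage note is only a placeholder; `lance.dataset` and `to_table` are the pylance calls this sketch relies on.

```python
def lance_table_path(warehouse: str, database: str, table: str) -> str:
    """Build the dataset path following the layout
    <warehouse_path>/<database_name>/<table_name>.lance."""
    return f"{warehouse.rstrip('/')}/{database}/{table}.lance"


def read_tiered_table(path: str, columns=None, limit=None):
    """Open a tiered Lance dataset and return the selected rows as a pyarrow.Table."""
    # Imported lazily so the path helper works without pylance installed.
    import lance

    ds = lance.dataset(path)
    # `columns` restricts the projection; `limit` caps the number of rows read.
    return ds.to_table(columns=columns, limit=limit)
```

For the table created above with the local warehouse `/tmp/lance`, `lance_table_path("/tmp/lance", "my_db", "fluss_order_with_lake")` yields `/tmp/lance/my_db/fluss_order_with_lake.lance`, and `read_tiered_table(path, columns=["order_id", "amount"], limit=10)` would read the first ten rows of those two columns once data has been tiered.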

## Data Type Mapping

Lance internally stores data in Arrow format.
When integrating with Lance, Fluss automatically converts between Fluss data types and Lance data types.
The following table shows the mapping between [Fluss data types](table-design/data-types.md) and Lance data types:

| Fluss Data Type               | Lance Data Type |
|-------------------------------|-----------------|
| BOOLEAN                       | BOOLEAN         |
| TINYINT                       | Int8            |
| SMALLINT                      | Int16           |
| INT                           | Int32           |
| BIGINT                        | Int64           |
| FLOAT                         | Float32         |
| DOUBLE                        | Float64         |
| DECIMAL                       | Decimal128      |
| STRING                        | Utf8            |
| CHAR                          | Utf8            |
| DATE                          | Date            |
| TIME                          | TIME            |
| TIMESTAMP                     | TIMESTAMP       |
| TIMESTAMP WITH LOCAL TIMEZONE | TIMESTAMP       |
| BINARY                        | FixedSizeBinary |
| BYTES                         | BINARY          |
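
To make the mapping easier to apply, the sketch below transcribes the table into a Python dictionary and uses it to predict the Lance column types for the `fluss_order_with_lake` example. It is a lookup aid only: the names `FLUSS_TO_LANCE`, `lance_type_for`, and `expected_lance_schema` are hypothetical and are not part of Fluss or Lance, and stripping type parameters such as `(10, 2)` is a simplification made for this example.

```python
# Transcribed verbatim from the data type mapping table.
FLUSS_TO_LANCE = {
    "BOOLEAN": "BOOLEAN",
    "TINYINT": "Int8",
    "SMALLINT": "Int16",
    "INT": "Int32",
    "BIGINT": "Int64",
    "FLOAT": "Float32",
    "DOUBLE": "Float64",
    "DECIMAL": "Decimal128",
    "STRING": "Utf8",
    "CHAR": "Utf8",
    "DATE": "Date",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "TIMESTAMP WITH LOCAL TIMEZONE": "TIMESTAMP",
    "BINARY": "FixedSizeBinary",
    "BYTES": "BINARY",
}


def lance_type_for(fluss_type: str) -> str:
    """Look up the Lance type listed for a Fluss type, ignoring parameters
    such as precision or length, e.g. DECIMAL(10, 2) -> DECIMAL."""
    key = fluss_type.split("(", 1)[0].strip().upper()
    if key not in FLUSS_TO_LANCE:
        raise ValueError(f"no mapping listed for Fluss type {fluss_type!r}")
    return FLUSS_TO_LANCE[key]


def expected_lance_schema(columns):
    """Map (column name, Fluss type) pairs to (column name, Lance type) pairs."""
    return [(name, lance_type_for(fluss_type)) for name, fluss_type in columns]


# Columns of the fluss_order_with_lake table from the Flink SQL example.
order_schema = expected_lance_schema([
    ("order_id", "BIGINT"),
    ("item_id", "BIGINT"),
    ("amount", "INT"),
    ("address", "STRING"),
])
```

Comparing `order_schema` with the schema reported by a Lance reader is one way to sanity-check that a tiered table matches its Fluss definition.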
