docs/content/append-table/blob.md
Lines changed: 76 additions & 14 deletions
@@ -71,7 +71,7 @@ For details about the blob file format structure, see [File Format - BLOB]({{< r

 ## Storage Modes

-Paimon supports three storage modes for BLOB fields:
+Paimon supports four storage modes for BLOB fields:

 1. **Default blob storage**
 Blob bytes are written to Paimon-managed `.blob` files under the table path.
@@ -82,7 +82,10 @@ Paimon supports three storage modes for BLOB fields:
 3. **External-storage descriptor mode**
 Fields configured in `blob-external-storage-field` are a subset of `blob-descriptor-field`. At write time, Paimon writes the raw blob data to the configured `blob-external-storage-path` and stores only serialized `BlobDescriptor` bytes inline in data files.

-This allows one table to mix raw-data BLOB fields, descriptor-only BLOB fields, and descriptor-based BLOB fields backed by external storage.
+4. **Blob view storage**
+Fields configured in `blob-view-field` store serialized `BlobViewStruct` bytes inline in data files. The struct points to a BLOB value in an upstream table by table identifier, BLOB field, and row id. The actual blob bytes are resolved from the upstream table at read time.
+
+This allows one table to mix raw-data BLOB fields, descriptor-only BLOB fields, descriptor-based BLOB fields backed by external storage, and view fields that reference upstream BLOB values.

 ## Table Options

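The four storage modes can be modeled as a lookup over the comma-separated field options. The sketch below is a hypothetical Python model, not Paimon code. `storage_mode` and `parse_fields` are illustrative names. Mode 2 is not visible in this hunk, so the sketch assumes it covers fields listed only in `blob-descriptor-field`. Check order matters only for external-storage fields, because they are also descriptor fields.

```python
from typing import Dict, Set


def parse_fields(value: str) -> Set[str]:
    """Split a comma-separated option value into a set of field names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def storage_mode(field: str, options: Dict[str, str]) -> str:
    """Classify a BLOB field into one of the four storage modes (hypothetical model)."""
    if field in parse_fields(options.get("blob-view-field", "")):
        return "blob view"
    # External-storage fields are a subset of the descriptor fields, so test them first.
    if field in parse_fields(options.get("blob-external-storage-field", "")):
        return "external-storage descriptor"
    if field in parse_fields(options.get("blob-descriptor-field", "")):
        return "descriptor-only"
    return "default"


opts = {
    "blob-field": "image,thumb,video,image_ref",
    "blob-descriptor-field": "thumb,video",
    "blob-external-storage-field": "video",
    "blob-view-field": "image_ref",
}
for name in ["image", "thumb", "video", "image_ref"]:
    print(name, "->", storage_mode(name, opts))
```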
@@ -123,6 +126,17 @@ This allows one table to mix raw-data BLOB fields, descriptor-only BLOB fields,
 some BLOB fields in <code>.blob</code> files and some as descriptor references.
 </td>
 </tr>
+<tr>
+<td><h5>blob-view-field</h5></td>
+<td>No</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>String</td>
+<td>
+Comma-separated BLOB field names stored as serialized <code>BlobViewStruct</code> bytes inline in normal data files.
+The field values reference BLOB values in upstream tables and are resolved at read time.
+This option must be a subset of <code>blob-field</code> and must not overlap with <code>blob-descriptor-field</code>.
+</td>
+</tr>
 <tr>
 <td><h5>blob-external-storage-field</h5></td>
 <td>No</td>
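The constraints stated for these options reduce to set checks:

- view fields are a subset of `blob-field`;
- view fields are disjoint from `blob-descriptor-field`;
- external-storage fields are a subset of `blob-descriptor-field`.

The sketch below assumes the options arrive as a plain string map. `validate_blob_options` is a hypothetical helper and does not reproduce Paimon's own validation or error messages.

```python
from typing import Dict, List, Set


def parse_fields(value: str) -> Set[str]:
    """Split a comma-separated option value into a set of field names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def validate_blob_options(options: Dict[str, str]) -> List[str]:
    """Return the constraint violations found; an empty list means the options are consistent."""
    blob = parse_fields(options.get("blob-field", ""))
    descriptor = parse_fields(options.get("blob-descriptor-field", ""))
    external = parse_fields(options.get("blob-external-storage-field", ""))
    view = parse_fields(options.get("blob-view-field", ""))

    errors: List[str] = []
    if not view <= blob:
        errors.append(f"blob-view-field is not a subset of blob-field: {sorted(view - blob)}")
    if view & descriptor:
        errors.append(f"blob-view-field overlaps blob-descriptor-field: {sorted(view & descriptor)}")
    if not external <= descriptor:
        errors.append(
            "blob-external-storage-field is not a subset of blob-descriptor-field: "
            f"{sorted(external - descriptor)}"
        )
    return errors


print(validate_blob_options({"blob-field": "image,image_ref", "blob-view-field": "image_ref"}))
print(validate_blob_options({"blob-field": "image", "blob-descriptor-field": "image",
                             "blob-view-field": "image,image_ref"}))
```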
@@ -279,30 +293,75 @@ ALTER TABLE blob_table SET ('blob-as-descriptor' = 'false');
 SELECT image FROM blob_table;
 ```

-### External-Storage Descriptor Fields
+### Blob View
+
+Blob view is useful when a downstream table should reference BLOB values already stored in an upstream table, without copying the bytes or creating new `.blob` files. A blob view field stores only a small `BlobViewStruct` inline. When the field is read, Paimon resolves the referenced BLOB from the upstream table.
+
+Blob view requires:

-If you want Paimon to accept raw BLOB input, write the data to an external location, and store only descriptor bytes inline, configure the target field(s) like this:
+- the upstream table to have row tracking enabled, so each row has a stable `_ROW_ID`
+- the downstream field to be listed in both `blob-field` and `blob-view-field`
+- writes to provide a serialized `BlobViewStruct`; in Flink SQL, use the built-in `sys.blob_view` function
+- `table_name`: the upstream table name. It must be fully qualified as `database.table` or `catalog.database.table`. Unqualified table names are rejected.
+- `field_name`: the upstream BLOB field name.
+- `row_id`: the `_ROW_ID` value from the upstream row-tracking table.
+
+The following example writes a downstream table whose `image_ref` field views the `image` field in `image_table`:
+Reads from `image_view_table.image_ref` return the referenced BLOB bytes in the same way as normal blob fields. The referenced upstream table and row must remain available for the view to be resolved.

 ### MERGE INTO Support

 For Data Evolution writes in Flink and Spark:

 - raw-data BLOB columns are still rejected in partial-column `MERGE INTO` updates
 - descriptor-based BLOB columns are allowed
-- fields configured in `blob-external-storage-field` are also allowed because they are descriptor-based fields

 ## Java API Usage

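The three `sys.blob_view` arguments listed above identify one upstream BLOB value. The Python sketch below models that reference with a hypothetical `BlobViewRef` dataclass standing in for `BlobViewStruct`; the real serialized layout is not shown here. It applies the documented rule that the table name must be `database.table` or `catalog.database.table`. Splitting on `.` is a simplification that assumes identifiers contain no dots or quoting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobViewRef:
    """Hypothetical stand-in for BlobViewStruct: which upstream BLOB value a view points to."""
    table_name: str
    field_name: str
    row_id: int


def make_blob_view_ref(table_name: str, field_name: str, row_id: int) -> BlobViewRef:
    """Build a reference, rejecting table names that are not fully qualified."""
    parts = table_name.split(".")
    # Accept only `database.table` (2 parts) or `catalog.database.table` (3 parts), none empty.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"table name must be fully qualified, got {table_name!r}")
    return BlobViewRef(table_name, field_name, row_id)


print(make_blob_view_ref("db.image_table", "image", 0))
try:
    make_blob_view_ref("image_table", "image", 0)
except ValueError as err:
    print("rejected:", err)
```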
@@ -661,6 +720,7 @@ For these configured fields:
 3. **No Statistics**: Statistics collection is not supported for blob columns.
 4. **Required Options**: `row-tracking.enabled` and `data-evolution.enabled` must be set to `true`.
 5. **External Storage Cleanup**: Files written through `blob-external-storage-path` are outside Paimon's orphan file cleanup scope.
+6. **Blob View Dependency**: Blob view fields depend on the referenced upstream table and row. If the upstream data is removed or no longer readable, the view cannot be resolved.

 ## Best Practices

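The blob view dependency can be illustrated with an in-memory stand-in for upstream tables, keyed by table name and `_ROW_ID`. This is a hypothetical model: `resolve`, `UnresolvableBlobView`, and the dict layout are illustrative, not Paimon APIs. It shows only that resolution succeeds while the referenced row exists and fails once that row is removed.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical stand-in for upstream tables: table name -> _ROW_ID -> field name -> bytes.
Upstream = Dict[str, Dict[int, Dict[str, bytes]]]


@dataclass(frozen=True)
class BlobViewRef:
    table_name: str
    field_name: str
    row_id: int


class UnresolvableBlobView(LookupError):
    """Raised when the referenced upstream table, row, or field is gone."""


def resolve(ref: BlobViewRef, upstream: Upstream) -> bytes:
    """Look up the referenced bytes in the upstream data at read time."""
    rows = upstream.get(ref.table_name)
    if rows is None:
        raise UnresolvableBlobView(f"table {ref.table_name} is not available")
    row = rows.get(ref.row_id)
    if row is None:
        raise UnresolvableBlobView(f"row {ref.row_id} no longer exists in {ref.table_name}")
    if ref.field_name not in row:
        raise UnresolvableBlobView(f"field {ref.field_name} not found in row {ref.row_id}")
    return row[ref.field_name]


upstream: Upstream = {"db.image_table": {7: {"image": b"\x89PNG"}}}
ref = BlobViewRef("db.image_table", "image", 7)
print(resolve(ref, upstream))
del upstream["db.image_table"][7]  # simulate the upstream row being removed
try:
    resolve(ref, upstream)
except UnresolvableBlobView as err:
    print("cannot resolve:", err)
```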
@@ -674,4 +734,6 @@ For these configured fields:

 5. **Manage External Storage Lifecycle Separately**: Files written to `blob-external-storage-path` are not cleaned up by Paimon, so retention and deletion should be managed externally.

-6. **Use Partitioning**: Partition your blob tables by date or other dimensions to improve query performance and data management.
+6. **Use Blob View to Avoid Copying BLOB Data**: Configure `blob-view-field` when a downstream table only needs to reference BLOB values from an upstream table.
+
+7. **Use Partitioning**: Partition your blob tables by date or other dimensions to improve query performance and data management.
docs/content/append-table/global-index.md
Lines changed: 5 additions & 1 deletion
@@ -41,6 +41,8 @@ Global indexes work on top of Data Evolution tables. To use global indexes, your
 - `'row-tracking.enabled' = 'true'`
 - `'data-evolution.enabled' = 'true'`

+> Global index queries may not be exact when the index only covers part of the table data. If a query predicate matches the index, Paimon returns only the results from the indexed portion. Matching records in data that has not been indexed yet will not be returned.
+

 ## Prerequisites

 Create a table with the required properties:
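The note about partial index coverage can be illustrated with a toy model. An index-served query sees only the rows that have been indexed, while a full scan sees every row. Both functions are hypothetical and do not model how Paimon plans or executes global index queries.

```python
from typing import Callable, Dict, List, Set


def full_scan(rows: Dict[int, str], predicate: Callable[[str], bool]) -> List[int]:
    """Every row is checked, so every match is returned."""
    return sorted(row_id for row_id, value in rows.items() if predicate(value))


def index_query(rows: Dict[int, str], indexed: Set[int],
                predicate: Callable[[str], bool]) -> List[int]:
    """Only rows covered by the index are candidates, so unindexed matches are missed."""
    return sorted(row_id for row_id in indexed if predicate(rows[row_id]))


def is_cat(value: str) -> bool:
    return value == "cat"


rows = {0: "cat", 1: "dog", 2: "cat", 3: "cat"}  # row 3 was written after the index was built
indexed = {0, 1, 2}
print(full_scan(rows, is_cat), index_query(rows, indexed, is_cat))
```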
@@ -95,11 +97,13 @@ Generation) applications.
 CALL sys.create_global_index(
     table => 'db.my_table',
     index_column => 'embedding',
-    index_type => 'lumina-vector-ann',
+    index_type => 'lumina',
     options => 'lumina.index.dimension=128'
 );
 ```

+The legacy index type `lumina-vector-ann` is still accepted for existing tables and SQL compatibility.
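Scripts that generate the `CALL` statement can normalize the legacy name to `lumina` before emitting it. In this minimal sketch, `create_global_index_sql` and the alias map are hypothetical, and the argument names mirror the example in this hunk. `options` is passed through as one raw string because this hunk does not show how multiple options are separated. Values are inserted without quote escaping, so the sketch suits only trusted identifiers.

```python
# Hypothetical alias map: only the legacy name mentioned in the docs is modeled.
LEGACY_INDEX_TYPES = {"lumina-vector-ann": "lumina"}


def create_global_index_sql(table: str, index_column: str, index_type: str, options: str) -> str:
    """Render a sys.create_global_index CALL, rewriting the legacy index type to its new name.

    Values are not escaped; pass trusted identifiers only.
    """
    index_type = LEGACY_INDEX_TYPES.get(index_type, index_type)
    return (
        "CALL sys.create_global_index(\n"
        f"    table => '{table}',\n"
        f"    index_column => '{index_column}',\n"
        f"    index_type => '{index_type}',\n"
        f"    options => '{options}'\n"
        ");"
    )


print(create_global_index_sql("db.my_table", "embedding", "lumina-vector-ann",
                              "lumina.index.dimension=128"))
```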
docs/content/project/download.md
Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,7 @@ This documentation is a guide for downloading Paimon Jars.
 | Flink 1.17 | [paimon-flink-1.17-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-1.17/{{< version >}}/) |
 | Flink 1.16 | [paimon-flink-1.16-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-1.16/{{< version >}}/) |
 | Flink Action | [paimon-flink-action-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-action/{{< version >}}/) |
+| Spark 4.1 | [paimon-spark-4.1_2.13-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-spark-4.1_2.13/{{< version >}}/) |
 | Spark 4.0 | [paimon-spark-4.0_2.13-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-spark-4.0_2.13/{{< version >}}/) |
 | Spark 3.5 | [paimon-spark-3.5_2.12-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-spark-3.5_2.12/{{< version >}}/) |
 | Spark 3.4 | [paimon-spark-3.4_2.12-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-spark-3.4_2.12/{{< version >}}/) |
@@ -68,6 +69,7 @@ This documentation is a guide for downloading Paimon Jars.
 | Flink 1.17 | [paimon-flink-1.17-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-1.17/{{< version >}}/paimon-flink-1.17-{{< version >}}.jar) |
 | Flink 1.16 | [paimon-flink-1.16-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-1.16/{{< version >}}/paimon-flink-1.16-{{< version >}}.jar) |
 | Flink Action | [paimon-flink-action-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-action/{{< version >}}/paimon-flink-action-{{< version >}}.jar) |
+| Spark 4.1 | [paimon-spark-4.1_2.13-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-spark-4.1_2.13/{{< version >}}/paimon-spark-4.1_2.13-{{< version >}}.jar) |
 | Spark 4.0 | [paimon-spark-4.0_2.13-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-spark-4.0_2.13/{{< version >}}/paimon-spark-4.0_2.13-{{< version >}}.jar) |
 | Spark 3.5 | [paimon-spark-3.5_2.12-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-spark-3.5_2.12/{{< version >}}/paimon-spark-3.5_2.12-{{< version >}}.jar) |
 | Spark 3.4 | [paimon-spark-3.4_2.12-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-spark-3.4_2.12/{{< version >}}/paimon-spark-3.4_2.12-{{< version >}}.jar) |
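The jar links in both download tables follow a fixed URL pattern, so a release script could build them instead of copying them. This sketch assumes the Scala suffix rule visible in the Spark rows shown: `_2.13` for Spark 4.x and `_2.12` for Spark 3.x. The helper names are hypothetical, and the version strings in the usage lines are illustrative only.

```python
RELEASE_BASE = "https://repo.maven.apache.org/maven2/org/apache/paimon"
SNAPSHOT_BASE = "https://repository.apache.org/snapshots/org/apache/paimon"


def spark_artifact(spark_version: str) -> str:
    """Artifact id for a Spark bundle; assumes Spark 4.x uses Scala 2.13 and 3.x uses 2.12."""
    scala = "2.13" if spark_version.startswith("4.") else "2.12"
    return f"paimon-spark-{spark_version}_{scala}"


def release_jar_url(artifact: str, version: str) -> str:
    """Direct jar link on Maven Central, following the release table's pattern."""
    return f"{RELEASE_BASE}/{artifact}/{version}/{artifact}-{version}.jar"


def snapshot_dir_url(artifact: str, version: str) -> str:
    """Directory link in the Apache snapshots repository, following the snapshot table's pattern."""
    return f"{SNAPSHOT_BASE}/{artifact}/{version}/"


artifact = spark_artifact("4.1")
print(release_jar_url(artifact, "1.0.0"))        # illustrative version only
print(snapshot_dir_url(artifact, "1.0-SNAPSHOT"))  # illustrative version only
```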