
Commit 11d0423

Merge pull request #733 from Altinity/backport/anyalya/77156
Support Iceberg Metadata Files Cache ClickHouse#77156
2 parents c7127b8 + 5c6d646 commit 11d0423

File tree

116 files changed: +4193 additions, −991 deletions


docs/en/engines/table-engines/integrations/iceberg.md

Lines changed: 173 additions & 9 deletions
@@ -4,7 +4,7 @@ sidebar_position: 90
 sidebar_label: Iceberg
 ---
 
-# Iceberg Table Engine
+# Iceberg Table Engine {#iceberg-table-engine}
 
 :::warning
 We recommend using the [Iceberg Table Function](/docs/sql-reference/table-functions/iceberg.md) for working with Iceberg data in ClickHouse. The Iceberg Table Function currently provides sufficient functionality, offering a partial read-only interface for Iceberg tables.
@@ -34,14 +34,14 @@ CREATE TABLE iceberg_table_local
 ENGINE = IcebergLocal(path_to_table, [,format] [,compression_method])
 ```
 
-**Engine arguments**
+## Engine arguments {#engine-arguments}
 
 The description of the arguments coincides with the description of the arguments in the engines `S3`, `AzureBlobStorage`, `HDFS`, and `File`, respectively.
 `format` stands for the format of data files in the Iceberg table.
 
 Engine parameters can be specified using [Named Collections](../../../operations/named-collections.md).
 
-**Example**
+### Example {#example}
 
 ```sql
 CREATE TABLE iceberg_table ENGINE=IcebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
@@ -66,12 +66,11 @@ CREATE TABLE iceberg_table ENGINE=IcebergS3(iceberg_conf, filename = 'test_table
 
 ```
 
-**Aliases**
-
+## Aliases {#aliases}
 
 The table engine `Iceberg` is currently an alias to `IcebergS3`.
 
-**Schema Evolution**
+## Schema Evolution {#schema-evolution}
 
 At the moment, with the help of ClickHouse, you can read Iceberg tables whose schema has changed over time. We currently support reading tables where columns have been added and removed, and their order has changed. You can also change a column where a value is required to one where NULL is allowed. Additionally, we support permitted type casting for simple types, namely:
 * int -> long
 * float -> double
@@ -81,14 +80,179 @@ Currently, it is not possible to change nested structures or the types of elements within arrays and maps.
 
 To read a table where the schema has changed after its creation with dynamic schema inference, set `allow_dynamic_metadata_for_data_lakes = true` when creating the table.
 
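As a sketch only (the bucket, credentials, and table name are illustrative, and the exact placement of the setting in the `CREATE TABLE` statement is an assumption), enabling dynamic schema inference at creation time might look like:

```sql
-- Hypothetical endpoint and credentials; the SETTINGS clause is the point here
CREATE TABLE iceberg_dynamic_table
ENGINE = IcebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
SETTINGS allow_dynamic_metadata_for_data_lakes = true;
```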
-**Partition Pruning**
+## Partition Pruning {#partition-pruning}
 
 ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. Currently it works only with identity transforms and time-based transforms (hour, day, month, year). To enable partition pruning, set `use_iceberg_partition_pruning = 1`.
 
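For illustration, pruning can be enabled per query; the table name and filter column below are hypothetical, and pruning would only apply if the table is partitioned by an identity or time-based transform on that column:

```sql
-- Skips data files whose partitions cannot match the predicate
SELECT count()
FROM iceberg_table
WHERE event_date >= '2024-01-01'
SETTINGS use_iceberg_partition_pruning = 1;
```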
-### Data cache {#data-cache}
+
+## Time Travel {#time-travel}
+
+ClickHouse supports time travel for Iceberg tables, allowing you to query historical data with a specific timestamp or snapshot ID.
+
+### Basic usage {#basic-usage}
+
+```sql
+SELECT * FROM example_table ORDER BY 1
+SETTINGS iceberg_timestamp_ms = 1714636800000
+```
+
+```sql
+SELECT * FROM example_table ORDER BY 1
+SETTINGS iceberg_snapshot_id = 3547395809148285433
+```
+
+Note: You cannot specify both the `iceberg_timestamp_ms` and `iceberg_snapshot_id` parameters in the same query.
+
+### Important considerations {#important-considerations}
+
+- **Snapshots** are typically created when:
+  - New data is written to the table
+  - Some kind of data compaction is performed
+
+- **Schema changes typically don't create snapshots.** This leads to important behaviors when using time travel with tables that have undergone schema evolution.
+
+### Example scenarios {#example-scenarios}
+
+All scenarios are written in Spark because ClickHouse doesn't yet support writing to Iceberg tables.
+
+#### Scenario 1: Schema Changes Without New Snapshots {#scenario-1}
+
+Consider this sequence of operations:
+
+```sql
+-- Create a table with two columns
+CREATE TABLE IF NOT EXISTS spark_catalog.db.time_travel_example (
+    order_number int,
+    product_code string
+)
+USING iceberg
+OPTIONS ('format-version'='2')
+
+-- Insert data into the table
+INSERT INTO spark_catalog.db.time_travel_example VALUES
+    (1, 'Mars')
+
+ts1 = now() -- A piece of pseudocode
+
+-- Alter table to add a new column
+ALTER TABLE spark_catalog.db.time_travel_example ADD COLUMN (price double)
+
+ts2 = now()
+
+-- Insert data into the table
+INSERT INTO spark_catalog.db.time_travel_example VALUES (2, 'Venus', 100)
+
+ts3 = now()
+
+-- Query the table at each timestamp
+SELECT * FROM spark_catalog.db.time_travel_example TIMESTAMP AS OF ts1;
+
++------------+------------+
+|order_number|product_code|
++------------+------------+
+|           1|        Mars|
++------------+------------+
+
+SELECT * FROM spark_catalog.db.time_travel_example TIMESTAMP AS OF ts2;
+
++------------+------------+
+|order_number|product_code|
++------------+------------+
+|           1|        Mars|
++------------+------------+
+
+SELECT * FROM spark_catalog.db.time_travel_example TIMESTAMP AS OF ts3;
+
++------------+------------+-----+
+|order_number|product_code|price|
++------------+------------+-----+
+|           1|        Mars| NULL|
+|           2|       Venus|100.0|
++------------+------------+-----+
+```
+
+Query results at different timestamps:
+
+- At ts1 & ts2: Only the original two columns appear
+- At ts3: All three columns appear, with NULL for the price of the first row
+
+#### Scenario 2: Historical vs. Current Schema Differences {#scenario-2}
+
+A time travel query at the current moment might show a different schema than the current table:
+
+```sql
+-- Create a table
+CREATE TABLE IF NOT EXISTS spark_catalog.db.time_travel_example_2 (
+    order_number int,
+    product_code string
+)
+USING iceberg
+OPTIONS ('format-version'='2')
+
+-- Insert initial data into the table
+INSERT INTO spark_catalog.db.time_travel_example_2 VALUES (2, 'Venus');
+
+-- Alter table to add a new column
+ALTER TABLE spark_catalog.db.time_travel_example_2 ADD COLUMN (price double);
+
+ts = now();
+
+-- Query the table at the current moment, but using timestamp syntax
+SELECT * FROM spark_catalog.db.time_travel_example_2 TIMESTAMP AS OF ts;
+
++------------+------------+
+|order_number|product_code|
++------------+------------+
+|           2|       Venus|
++------------+------------+
+
+-- Query the table at the current moment
+SELECT * FROM spark_catalog.db.time_travel_example_2;
+
++------------+------------+-----+
+|order_number|product_code|price|
++------------+------------+-----+
+|           2|       Venus| NULL|
++------------+------------+-----+
+```
+
+This happens because `ALTER TABLE` doesn't create a new snapshot, but for the current table Spark takes the value of `schema_id` from the latest metadata file, not from a snapshot.
+
+#### Scenario 3: Time Travel Before the First Write {#scenario-3}
+
+Another consequence is that, when doing time travel, you cannot get the state of a table from before any data was written to it:
+
+```sql
+-- Create a table
+CREATE TABLE IF NOT EXISTS spark_catalog.db.time_travel_example_3 (
+    order_number int,
+    product_code string
+)
+USING iceberg
+OPTIONS ('format-version'='2');
+
+ts = now();
+
+-- Query the table at a specific timestamp
+SELECT * FROM spark_catalog.db.time_travel_example_3 TIMESTAMP AS OF ts; -- Finishes with error: Cannot find a snapshot older than ts.
+```
+
+In ClickHouse, the behavior is consistent with Spark. You can mentally replace Spark SELECT queries with ClickHouse SELECT queries, and they will work the same way.
+
+## Data cache {#data-cache}
 
 The `Iceberg` table engine and table function support data caching, the same as the `S3`, `AzureBlobStorage`, and `HDFS` storages. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
 
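Following the linked S3 data-cache documentation, a per-query sketch might look like this; the cache name is an assumption and must match a filesystem cache defined in the server configuration:

```sql
-- Assumes a cache named 'cache_for_s3' is declared in the server config
SELECT count()
FROM iceberg_table
SETTINGS enable_filesystem_cache = 1, filesystem_cache_name = 'cache_for_s3';
```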
-## See also
+## Metadata cache {#metadata-cache}
+
+The `Iceberg` table engine and table function support a metadata cache that stores the information of the manifest files, the manifest list, and the metadata JSON. The cache is kept in memory. This feature is controlled by the setting `use_iceberg_metadata_files_cache`, which is enabled by default.
+
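Since `use_iceberg_metadata_files_cache` is a query-level setting, it can presumably be disabled per query when fresh metadata reads are required; the table name below is illustrative:

```sql
-- Bypass the in-memory metadata cache for this query
SELECT count()
FROM iceberg_table
SETTINGS use_iceberg_metadata_files_cache = 0;
```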
+## See also {#see-also}
 
 - [iceberg table function](/docs/sql-reference/table-functions/iceberg.md)

docs/en/sql-reference/statements/system.md

Lines changed: 5 additions & 1 deletion
@@ -95,7 +95,11 @@ For more convenient (automatic) cache management, see disable_internal_dns_cache
 
 Clears the mark cache.
 
-## DROP REPLICA
+## DROP ICEBERG METADATA CACHE {#drop-iceberg-metadata-cache}
+
+Clears the Iceberg metadata cache.
+
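Judging by the other statements documented in this file (e.g. `SYSTEM DROP DNS CACHE`), the full statement is presumably prefixed with `SYSTEM`:

```sql
SYSTEM DROP ICEBERG METADATA CACHE;
```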
+## DROP REPLICA {#drop-replica}
 
 Dead replicas of `ReplicatedMergeTree` tables can be dropped using the following syntax:
 