Iceberg Integration documentation #5918

Draft
wants to merge 68 commits into
base: master
Changes from 30 commits
Commits
d92a19c
Iceberg Integration
MeelahMe Mar 19, 2025
5d7bc7f
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
f873263
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
8391ce7
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
8b454e0
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
57c024c
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
9563143
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
3f82fb6
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
0478e05
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
4a1f036
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
091c315
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 19, 2025
a32d724
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
d22f3fb
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
0a5d3f5
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
7bc3352
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
22c19be
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
e9345af
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
4693ece
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
8ea52b7
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
836e31b
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 20, 2025
a4eefaa
Merge branch 'master' into inceberg
jstirnaman Mar 20, 2025
3283130
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
6c55f87
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
7c6a630
updating headings
MeelahMe Mar 21, 2025
d353d01
Adding a numbered-list TOC
MeelahMe Mar 21, 2025
677ed17
Added an export command example
MeelahMe Mar 21, 2025
accaffe
Proof read: improving grammar and clarity
MeelahMe Mar 21, 2025
8b8257b
Adding explanations for examples
MeelahMe Mar 21, 2025
ffc8014
Updating examples
MeelahMe Mar 21, 2025
bb7a0d3
Merge branch 'master' into inceberg
MeelahMe Mar 21, 2025
05d4858
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
aa6b36a
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
7e260b5
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
bd472c8
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
1ad0ffa
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
b022d25
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
b712851
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
c5d58ee
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
9adfbda
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
37888c0
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
4e6e034
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
e79725d
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
a4e2b80
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
MeelahMe Mar 21, 2025
aebe441
removing run export example and adding
MeelahMe Mar 26, 2025
277fcde
Changes to- Example: Export data using Iceberg exporter
MeelahMe Mar 26, 2025
07f6551
Adding note to- Export InfluxDB time series data to Iceberg format
MeelahMe Mar 26, 2025
4b2ee68
removing reduandant sections and information
MeelahMe Mar 26, 2025
0c31daa
removing IOX references
MeelahMe Mar 26, 2025
6cd3c82
Rewriting section to focus and clarify that Iceberg table is done wit…
MeelahMe Mar 26, 2025
8adec72
removing entier section: Interfaces for using Iceberg integration sec…
MeelahMe Mar 26, 2025
92208e4
Adding read-only note to section
MeelahMe Mar 26, 2025
c9e26fa
updating TOC numbe4r list
MeelahMe Mar 26, 2025
cb02320
Updating who to contact for Snowflake Integration
MeelahMe Mar 26, 2025
4d259fb
Restructuring, reving, and proof reading edit
MeelahMe Mar 26, 2025
a8badd6
google dev style read through
MeelahMe Mar 26, 2025
83b6570
updating prereqs
MeelahMe Mar 26, 2025
3b56fee
Merge branch 'master' into inceberg
MeelahMe Mar 26, 2025
6eed23c
wip: removing tab shortcode to figure out error
MeelahMe Mar 26, 2025
6c09ef4
Merge branch 'iceberg' of github.com:influxdata/docs-v2 into iceberg
MeelahMe Mar 26, 2025
a013770
adding shortcode structure back
MeelahMe Mar 26, 2025
905edc3
feat(dedicated): iceberg export for snowflake:
jstirnaman Mar 27, 2025
b4b4a7e
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
jstirnaman Mar 27, 2025
c655971
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
jstirnaman Mar 27, 2025
6034b91
Update content/shared/influxdb3-query-guides/snapshots/snowflake.md
jstirnaman Mar 27, 2025
b11788d
Apply suggestions from code review
jstirnaman Mar 27, 2025
6af2b3f
Merge pull request #5931 from influxdata/jts/iceberg
jstirnaman Mar 27, 2025
f5d465b
feat(dedicated): iceberg for snowflake:
jstirnaman Mar 27, 2025
3a44b1e
Merge branch 'master' into inceberg
MeelahMe Mar 31, 2025
198 changes: 198 additions & 0 deletions content/shared/influxdb3-query-guides/snapshots/snowflake.md
@@ -0,0 +1,198 @@
# Integrate with Snowflake using Apache Iceberg

Export time-series data snapshots from InfluxDB into Apache Iceberg format.
Integrate data with Snowflake and other Iceberg-compatible tools without the need for complex ETL processes.

## Key benefits

- **Efficient data access**: Query your data directly from Snowflake.
- **Cost-effective storage**: Optimize data retention and minimize storage costs.
- **Supports AI and ML workloads**: Enhance machine learning applications by making time-series data accessible in Snowflake.

## Implementation steps

Follow these steps to integrate InfluxDB 3 with Snowflake using Apache Iceberg:

1. [Configure external storage](#configure-external-storage)
2. [Set up a catalog integration in Snowflake](#set-up-a-catalog-integration-in-snowflake)
3. [Export InfluxDB data to Iceberg format](#export-influxdb-data-to-iceberg-format)
4. [Create an Iceberg table in Snowflake](#create-an-iceberg-table-in-snowflake)
5. [Query the Iceberg table from Snowflake](#query-the-iceberg-table-from-snowflake)

## Prerequisites

Before you begin, ensure you have the following:

- A **Snowflake account** with necessary permissions.
- Access to an **external object store** (such as AWS S3).
- Familiarity with **Apache Iceberg** and **Snowflake**.


## Configure external storage

Set up an external storage location (such as AWS S3) to store Iceberg table data and metadata.

### Example: Configure an S3 stage in Snowflake

```sql
CREATE STAGE my_s3_stage
URL='s3://my-bucket/'
STORAGE_INTEGRATION=my_storage_integration;
```

For more details, refer to the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-object-storage).

## Set up a catalog integration in Snowflake

Create a catalog integration so that Snowflake can read Iceberg table metadata directly from files in object storage.

### Example: Create a catalog integration in Snowflake

```sql
CREATE CATALOG INTEGRATION my_catalog_integration
CATALOG_SOURCE = 'OBJECT_STORE'
TABLE_FORMAT = 'ICEBERG'
ENABLED = TRUE;
```

For more information, refer to the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration).

## Export InfluxDB data to Iceberg format

Use InfluxData's Iceberg exporter to convert and export your time-series data from your {{% product-name omit="Clustered" %}} cluster to the Iceberg table format.

### Example: Export data using Iceberg exporter

This example assumes the following:

- You have followed the example for [writing and querying data in the IOx README](https://github.com/influxdata/influxdb_iox/blob/main/README.md#write-and-read-data).
- You've configured compaction to trigger more quickly with these environment variables:
  - `INFLUXDB_IOX_COMPACTION_MIN_NUM_L0_FILES_TO_COMPACT=1`
  - `INFLUXDB_IOX_COMPACTION_MIN_NUM_L1_FILES_TO_COMPACT=1`
Collaborator

Where is the user instructed to do this? For Cloud Dedicated, they can't. This is something we have to do for them. They would be able to do this in Clustered, but we never tell them to.

These are just internal testing instructions; let's not include them.

- You have a `config.json`.

#### Example `config.json`

```json
{
"exports": [
{
"namespace": "company_sensors",
"table_name": "cpu"
}
]
}
```
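
If you need to export more than one table, you could generate the file instead of editing it by hand. The following is a minimal sketch that assumes the config accepts one `exports` entry per table, as the example above suggests. The function names `build_export_config` and `write_export_config` are hypothetical helpers, not part of any InfluxData tool.

```python
import json
from pathlib import Path


def build_export_config(tables):
    """Build the config structure from (namespace, table_name) pairs.

    The keys mirror the example config.json; support for multiple
    entries in "exports" is an assumption.
    """
    return {
        "exports": [
            {"namespace": namespace, "table_name": table_name}
            for namespace, table_name in tables
        ]
    }


def write_export_config(path, tables):
    """Write the config as indented JSON and return the written path."""
    config_path = Path(path)
    config_path.write_text(json.dumps(build_export_config(tables), indent=2) + "\n")
    return config_path


if __name__ == "__main__":
    # Reproduces the single-table example config shown above.
    write_export_config("config.json", [("company_sensors", "cpu")])
```

Pass the resulting file to the export command with `--export-config-path`.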
Collaborator

We can't assume they've done this. If this is necessary, we need to tell them to create a config.json and then where to put it.


#### Running the export command

```console
$ influxdb_iox iceberg export \
--catalog-dsn postgresql://postgres@localhost:5432/postgres \
--source-object-store file \
--source-data-dir ~/.influxdb_iox/object_store \
--sink-object-store file \
--sink-data-dir /tmp/iceberg \
--export-config-path config.json
```
Collaborator

@ritwika314 influxdb_iox is not something we've encouraged users to use. All client-facing cluster management is done through influxctl. Are we going to add anything to influxctl to export Iceberg tables for Clustered customers?

@MeelahMe This example appears to be for a local test bed and is pulling data from the filesystem. We need to make sure the example is similar to what a customer would run in production, where the source object store is a hosted object store.

ritwika314 Mar 25, 2025

No, these are local testing instructions. I am going through this PR now. Sorry, I was buried under a lot of notifications and had not noticed this. I will provide detailed feedback by EOD.


The export command outputs an absolute path to an Iceberg metadata file:

`/tmp/iceberg/company_sensors/cpu/metadata/v1.metadata.json`

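
The path above appears to follow the pattern `<sink-data-dir>/<namespace>/<table_name>/metadata/v<N>.metadata.json`. That layout is inferred from this single example and is an assumption, not a documented contract. Under that assumption, a small hypothetical helper can predict where the metadata file lands:

```python
from pathlib import PurePosixPath


def expected_metadata_path(sink_data_dir, namespace, table_name, version=1):
    """Return the metadata file path under the ASSUMED export layout.

    Assumed layout (inferred from one example, not documented):
    <sink_data_dir>/<namespace>/<table_name>/metadata/v<version>.metadata.json
    """
    return str(
        PurePosixPath(sink_data_dir)
        / namespace
        / table_name
        / "metadata"
        / f"v{version}.metadata.json"
    )


if __name__ == "__main__":
    print(expected_metadata_path("/tmp/iceberg", "company_sensors", "cpu"))
```

Treat the path printed by the export command as authoritative and use a helper like this only as a convenience.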
#### Example: Querying the exported metadata using DuckDB

```console
$ duckdb
D SELECT * FROM iceberg_scan('/tmp/iceberg/company_sensors/cpu/metadata/v1.metadata.json') LIMIT 1;
┌───────────┬──────────────────────┬─────────────────────┬─────────────┬───┬────────────┬───────────────┬─────────────┬────────────────────┬────────────────────┐
│ cpu │ host │ time │ usage_guest │ … │ usage_nice │ usage_softirq │ usage_steal │ usage_system │ usage_user │
│ varchar │ varchar │ timestamp │ double │ │ double │ double │ double │ double │ double │
├───────────┼──────────────────────┼─────────────────────┼─────────────┼───┼────────────┼───────────────┼─────────────┼────────────────────┼────────────────────┤
│ cpu-total │ Andrews-MBP.hsd1.m… │ 2020-06-11 16:52:00 │ 0.0 │ … │ 0.0 │ 0.0 │ 0.0 │ 1.1173184357541899 │ 0.9435133457479826 │
├───────────┴──────────────────────┴─────────────────────┴─────────────┴───┴────────────┴───────────────┴─────────────┴────────────────────┴────────────────────┤
│ 1 rows 13 columns (9 shown) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

Next, create an Iceberg table in Snowflake.

## Create an Iceberg table in Snowflake

After exporting the data, create an Iceberg table in Snowflake.

### Example: Create an Iceberg table in Snowflake

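```sql
CREATE ICEBERG TABLE my_iceberg_table
EXTERNAL_VOLUME = 'my_external_volume'
CATALOG = 'my_catalog_integration'
METADATA_FILE_PATH = 'path/to/metadata.json';
```

Set the parameters as follows:

- `EXTERNAL_VOLUME`: the Snowflake external volume that points to the storage location holding the exported Iceberg data and metadata.
- `CATALOG`: the catalog integration that you created earlier. You can omit this parameter if a default catalog integration is set for the account, database, or schema.
- `METADATA_FILE_PATH`: the path to the exported Iceberg metadata file, relative to the external volume's storage location (no leading slash and no `s3://` prefix).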
Contributor

I would describe these in separate bullet points.

Contributor Author

Can you explain a little more about what you mean?


## Query the Iceberg table from Snowflake

Once the Iceberg table is set up, you can query it using standard SQL in Snowflake.

### Example: Query the Iceberg table

```sql
SELECT * FROM my_iceberg_table
WHERE time > '2025-01-01';
```

## Interfaces for using Iceberg integration

- [Use the CLI to trigger snapshot exports](#use-the-cli-to-trigger-snapshot-exports)
- [Use the API to manage and configure snapshots](#use-the-api-to-manage-and-configure-snapshots)
- [Use SQL in Snowflake to query Iceberg tables](#use-sql-in-snowflake-to-query-iceberg-tables)

### Use the CLI to trigger snapshot exports

#### Example: Enable Iceberg feature and export a snapshot

```sh
# Enable Iceberg feature
$ influxctl enable-iceberg

# Export a snapshot
$ influxctl export --namespace foo --table bar
```
Collaborator

These commands should be used in the process above, not `influxdb_iox`.


### Use the API to manage and configure snapshots

Use the {{% product-name %}} HTTP API to export snapshots and check status.

#### Example: Export a snapshot

This example demonstrates how to export a snapshot of your data from InfluxDB to an Iceberg table using the HTTP API.

- **Method**: `POST`
- **Endpoint**: `/snapshots/export`
- **Request body**:

```json
{
"namespace": "foo",
"table": "bar"
}
```
The `POST` request to the `/snapshots/export` endpoint triggers the export of data from the specified namespace and table in InfluxDB to an Iceberg table. The request body specifies the namespace (`foo`) and the table (`bar`) to be exported.
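
As a rough illustration, the request described above could be built with Python's standard library. This sketch makes several assumptions: the base URL (`https://cluster.example.com`) is a placeholder, the function name `build_export_request` is hypothetical, and authentication is omitted because this section doesn't describe it. The code only constructs the request and doesn't send it.

```python
import json
import urllib.request


def build_export_request(base_url, namespace, table):
    """Build (but do not send) a POST request to /snapshots/export.

    base_url is a placeholder for your cluster's API URL (assumption).
    Authentication headers are intentionally left out.
    """
    body = json.dumps({"namespace": namespace, "table": table}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/snapshots/export",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_export_request("https://cluster.example.com", "foo", "bar")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen` after you add whatever credentials your cluster requires.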

#### Example: Check snapshot status

This example shows how to check the status of an ongoing or completed snapshot export using the HTTP API.

- **Method**: `GET`
- **Endpoint**: `/snapshots/status`

The `GET` request to the `/snapshots/status` endpoint retrieves the status of the snapshot export. This can be used to monitor the progress of the export or verify its completion.
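
Monitoring an export usually means calling the status endpoint repeatedly until the export finishes. The response format isn't described here, so the following sketch avoids assuming one: the caller supplies both a `fetch_status` function (for example, one that sends the `GET` request and parses the response) and an `is_done` check. The name `wait_for_export` and its parameters are hypothetical.

```python
import time


def wait_for_export(fetch_status, is_done, interval_seconds=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Poll until is_done(status) is true, then return that status.

    fetch_status: callable that returns the current status (caller-defined).
    is_done: callable that decides whether a status means "finished".
    Raises TimeoutError if the export isn't done after max_attempts polls.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_done(status):
            return status
        # Don't sleep after the final attempt; fail right away instead.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"export not finished after {max_attempts} status checks")
```

Injecting `fetch_status`, `is_done`, and `sleep` keeps the loop independent of the actual response shape and makes it easy to test without a running cluster.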

## Considerations and limitations

When exporting data from InfluxDB to an Iceberg table, keep the following considerations and limitations in mind:

- **Data consistency**: Each export is a point-in-time snapshot. Data written to InfluxDB after an export isn't reflected in the Iceberg table, so confirm that the snapshot covers the time range you intend to query.
- **Performance**: Query performance in Snowflake varies with the size of the exported data and the complexity of the query.
- **Feature support**: Some advanced features of InfluxDB may not be fully supported in Snowflake through the Iceberg integration.