Skip to content

Latest commit

 

History

History
198 lines (137 loc) · 8.91 KB

File metadata and controls

198 lines (137 loc) · 8.91 KB

Integrate with Snowflake using Apache Iceberg

Export time-series data snapshots from InfluxDB into Apache Iceberg format. Integrate data with Snowflake and other Iceberg-compatible tools without the need for complex ETL processes.

Key Benefits

  • Efficient data access: Query your data directly from Snowflake.
  • Cost-effective storage: Optimize data retention and minimize storage costs.
  • Supports AI and ML workloads: Enhance machine learning applications by making time-series data accessible in Snowflake.

Implementation steps

Follow these steps to integrate InfluxDB 3 with Snowflake using Apache Iceberg:

  1. Configure external storage
  2. Set up a catalog integration in Snowflake
  3. Export InfluxDB data to Iceberg format
  4. Create an Iceberg table in Snowflake
  5. Query your data in Snowflake

Prerequisites

Before you begin, ensure you have the following:

  • A Snowflake account with necessary permissions.
  • Access to an external object store (such as AWS S3).
  • Familiarity with Apache Iceberg and Snowflake.

Configure external storage

Set up an external storage location (such as AWS S3) to store Iceberg table data and metadata.

Example: Configure an S3 stage in Snowflake

CREATE STAGE my_s3_stage 
URL='s3://my-bucket/'
STORAGE_INTEGRATION=my_storage_integration;

For more details, refer to the Snowflake documentation.

Set up a catalog integration in Snowflake

Set up a catalog integration in Snowflake to manage and load Iceberg tables efficiently.

Example: Create a catalog integration in Snowflake

CREATE CATALOG INTEGRATION my_catalog_integration
  CATALOG_SOURCE = 'OBJECT_STORE'
  TABLE_FORMAT = 'ICEBERG'
  ENABLED = TRUE;

For more information, refer to the Snowflake documentation.

Export InfluxDB data to Iceberg format

Use InfluxData's Iceberg exporter to convert and export your time-series data from your {{% product-name omit="Clustered" %}} cluster to the Iceberg table format.

Example: Export data using Iceberg exporter

This example assumes the following:

  • You have followed the example for writing and querying data in the IOx README.
  • You've configured compaction to trigger more quickly with these environment variables:
    • INFLUXDB_IOX_COMPACTION_MIN_NUM_L0_FILES_TO_COMPACT=1
    • INFLUXDB_IOX_COMPACTION_MIN_NUM_L1_FILES_TO_COMPACT=1
  • You have a config.json.

Example config.json

{
    "exports": [
        {
            "namespace": "company_sensors",
            "table_name": "cpu"
        }
    ]
}

Running the export command

$ influxdb_iox iceberg export \
  --catalog-dsn postgresql://postgres@localhost:5432/postgres \
  --source-object-store file 
  --source-data-dir ~/.influxdb_iox/object_store \
  --sink-object-store file \
  --sink-data-dir /tmp/iceberg \
  --export-config-path config.json

The export command outputs an absolute path to an Iceberg metadata file:

/tmp/iceberg/company_sensors/cpu/metadata/v1.metadata.json

Example: Querying the exported metadata using DuckDB

$ duckdb
D SELECT * FROM iceberg_scan('/tmp/iceberg/metadata/v1.metadata.json') LIMIT 1;
┌───────────┬──────────────────────┬─────────────────────┬─────────────┬───┬────────────┬───────────────┬─────────────┬────────────────────┬────────────────────┐
│    cpu    │         host         │        time         │ usage_guest │ … │ usage_nice │ usage_softirq │ usage_steal │    usage_system    │     usage_user     │
│  varchar  │       varchar        │      timestamp      │   double    │   │   double   │    double     │   double    │       double       │       double       │
├───────────┼──────────────────────┼─────────────────────┼─────────────┼───┼────────────┼───────────────┼─────────────┼────────────────────┼────────────────────┤
│ cpu-total │ Andrews-MBP.hsd1.m…  │ 2020-06-11 16:52:00 │         0.0 │ … │        0.0 │           0.0 │         0.0 │ 1.1173184357541899 │ 0.9435133457479826 │
├───────────┴──────────────────────┴─────────────────────┴─────────────┴───┴────────────┴───────────────┴─────────────┴────────────────────┴────────────────────┤
│ 1 rows                                                                                                                                   13 columns (9 shown) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Next, create an Iceberg table in Snowflake.

Create an Iceberg table in Snowflake

After exporting the data, create an Iceberg table in Snowflake.

Example: Create an Iceberg table in Snowflake

CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  METADATA_FILE_PATH = 's3://my-bucket/path/to/metadata.json';

Ensure that EXTERNAL_VOLUME and METADATA_FILE_PATH point to your external storage and metadata file.

Query the Iceberg table from Snowflake

Once the Iceberg table is set up, you can query it using standard SQL in Snowflake.

Example: Query the Iceberg table

SELECT * FROM my_iceberg_table
WHERE timestamp > '2025-01-01';

Interfaces for using Iceberg integration

Use the CLI to trigger snapshot exports

Example: Enable Iceberg feature and export a snapshot

# Enable Iceberg feature
$ influxctl enable-iceberg

# Export a snapshot
$ influxctl export --namespace foo --table bar

Use the API to manage and configure snapshots

Use the {{% product-name %}} HTTP API to export snapshots and check status.

Example: Export a snapshot

This example demonstrates how to export a snapshot of your data from InfluxDB to an Iceberg table using the HTTP API.

  • Method: POST
  • Endpoint: /snapshots/export
  • Request body:
{
  "namespace": "foo",
  "table": "bar"
}

The POST request to the /snapshots/export endpoint triggers the export of data from the specified namespace and table in InfluxDB to an Iceberg table. The request body specifies the namespace (foo) and the table (bar) to be exported.

Example: Check snapshot status

This example shows how to check the status of an ongoing or completed snapshot export using the HTTP API.

  • Method: GET
  • Endpoint: /snapshots/status

The GET request to the /snapshots/status endpoint retrieves the status of the snapshot export. This can be used to monitor the progress of the export or verify its completion.

Considerations and limitations

When exporting data from InfluxDB to an Iceberg table, keep the following considerations and limitations in mind:

  • Data consistency: Ensure that the exported data in the Iceberg table is consistent with the source data in InfluxDB.
  • Performance: Query performance may vary based on data size and query complexity.
  • Feature support: Some advanced features of InfluxDB may not be fully supported in Snowflake through Iceberg integration.