Skip to content

Latest commit

 

History

History
175 lines (133 loc) · 6.79 KB

delta-lake.md

File metadata and controls

175 lines (133 loc) · 6.79 KB
description tags
Delta Lake Data Connector Documentation
data-connectors
delta-lake
data-lake

Delta Lake Data Connector

Delta Lake data connector connector enables SQL queries from Delta Lake tables.

datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: my_delta_lake_table
    params:
      delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
      delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}

Configuration

from

The from field for the Delta Lake connector takes the form of delta_lake:path where path is any supported path, either local or to a cloud storage location. See the examples section below.

name

The dataset name. This will be used as the table name within Spice.

Example:

datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: cool_dataset
    params: ...
SELECT COUNT(*) FROM cool_dataset;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+

params

Use the secret replacement syntax to reference a secret, e.g. ${secrets:aws_access_key_id}.

Parameter Name Description
client_timeout Optional. Specifies timeout for object store operations. Default value is 30s. E.g. client_timeout: 60s

Delta Lake object store parameters

AWS S3

Parameter Name Description
delta_lake_aws_region Optional. The AWS region for the S3 object store. E.g. us-west-2.
delta_lake_aws_access_key_id The access key ID for the S3 object store.
delta_lake_aws_secret_access_key The secret access key for the S3 object store.
delta_lake_aws_endpoint Optional. The endpoint for the S3 object store. E.g. s3.us-west-2.amazonaws.com.

Azure Blob

{% hint style="info" %} Note One of the following auth values must be provided for Azure Blob:

  • delta_lake_azure_storage_account_key,
  • delta_lake_azure_storage_client_id and azure_storage_client_secret, or
  • delta_lake_azure_storage_sas_key. {% endhint %}
Parameter Name Description
delta_lake_azure_storage_account_name The Azure Storage account name.
delta_lake_azure_storage_account_key The Azure Storage master key for accessing the storage account.
delta_lake_azure_storage_client_id The service principal client id for accessing the storage account.
delta_lake_azure_storage_client_secret The service principal client secret for accessing the storage account.
delta_lake_azure_storage_sas_key The shared access signature key for accessing the storage account.
delta_lake_azure_storage_endpoint Optional. The endpoint for the Azure Blob storage account.

Google Storage (GCS)

Parameter Name Description
google_service_account Filesystem path to the Google service account JSON key file.

Examples

Delta Lake + Local

- from: delta_lake:/path/to/local/delta/table # A local filesystem path to a Delta Lake table
  name: my_delta_lake_table

Delta Lake + S3

- from: delta_lake:s3://my_bucket/path/to/s3/delta/table/ # A reference to a table in S3
  name: my_delta_lake_table
  params:
    delta_lake_aws_region: us-west-2 # Optional
    delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
    delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
    delta_lake_aws_endpoint: s3.us-west-2.amazonaws.com # Optional

Delta Lake + Azure Blob

- from: delta_lake:abfss://my_container@my_account.dfs.core.windows.net/path/to/azure/delta/table/ # A reference to a table in Azure Blob
  name: my_delta_lake_table
  params:
    # Account Name + Key
    delta_lake_azure_storage_account_name: my_account
    delta_lake_azure_storage_account_key: ${secrets:my_key}

    # OR Service Principal + Secret
    delta_lake_azure_storage_client_id: my_client_id
    delta_lake_azure_storage_client_secret: ${secrets:my_secret}

    # OR SAS Key
    delta_lake_azure_storage_sas_key: my_sas_key

Delta Lake + Google Storage

params:
  delta_lake_google_service_account_path: /path/to/service-account.json

Types

The table below shows the Delta Lake data types supported, along with the type mapping to Apache Arrow types in Spice.

Delta Lake Type Arrow Type
String Utf8
Long Int64
Integer Int32
Short Int16
Byte Int8
Float Float32
Double Float64
Boolean Boolean
Binary Binary
Date Date32
Timestamp Timestamp(Microsecond, Some("UTC"))
TimestampNtz Timestamp(Microsecond, None)
Decimal Decimal128
Array List
Struct Struct
Map Map

Limitations

  • Delta Lake connector does not support reading Delta tables with the V2Checkpoint feature enabled. To use the Delta Lake connector with such tables, drop the V2Checkpoint feature by executing the following command:

    ALTER TABLE <table-name> DROP FEATURE v2Checkpoint [TRUNCATE HISTORY];

    For more details on dropping Delta table features, refer to the official documentation: Drop Delta table features