# Azure BlobFS Data Connector
The Azure BlobFS (ABFS) Data Connector enables federated SQL queries on files stored in Azure Blob-compatible endpoints. This includes Azure BlobFS (`abfss://`) and Azure Data Lake (`adl://`) endpoints.

When a folder path is provided, all the contained files will be loaded.

File formats are specified using the `file_format` parameter, as described in Object Store File Formats.
```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```
The `from` field defines the ABFS-compatible URI to a folder or object: either `from: abfs://<container>/<path>`, with the account name configured using the `abfs_account` parameter, or the fully qualified form `from: abfs://<container>@<account_name>.dfs.core.windows.net/<path>`.
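As a sketch of the fully qualified URI form (the container and account names here are illustrative, reusing the `spiceadls` account from the examples above):

```yaml
datasets:
  - from: abfs://foocontainer@spiceadls.dfs.core.windows.net/taxi_sample.csv
    name: azure_fq_test
    params:
      file_format: csv
```

With this form the account name is embedded in the URI, so the `abfs_account` parameter is not needed.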
The `name` field defines the dataset name, which is used as the table name within Spice.
Example:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: cool_dataset
    params: ...
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```
| Parameter name | Description |
| --- | --- |
| `file_format` | Specifies the data format. Required if not inferrable from `from`. Options: `parquet`, `csv`. Refer to Object Store File Formats for details. |
| `abfs_account` | Azure storage account name |
| `abfs_sas_string` | SAS (Shared Access Signature) token to use for authorization |
| `abfs_endpoint` | Storage endpoint. Default: `https://{account}.blob.core.windows.net` |
| `abfs_use_emulator` | Use `true` or `false` to connect to a local emulator |
| `abfs_authority_host` | Alternative authority host. Default: `https://login.microsoftonline.com` |
| `abfs_proxy_url` | Proxy URL |
| `abfs_proxy_ca_certificate` | CA certificate for the proxy |
| `abfs_proxy_excludes` | A list of hosts to exclude from proxy connections |
| `abfs_disable_tagging` | Disable tagging objects. Use this if your backing store doesn't support tags |
| `allow_http` | Allow insecure HTTP connections |
| `hive_partitioning_enabled` | Enable hive-style partitioning from the folder structure. Defaults to `false` |
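As a sketch of how the optional parameters combine (the container, account, and folder layout here are hypothetical), a dataset reading hive-partitioned Parquet from a folder might look like:

```yaml
datasets:
  - from: abfs://mycontainer/events/   # hypothetical folder with year=.../month=... subfolders
    name: events
    params:
      abfs_account: myaccount          # hypothetical account name
      file_format: parquet
      hive_partitioning_enabled: true  # derive partition columns from the folder structure
```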
The following parameters are used when authenticating with Azure. Only one of these parameters can be used at a time:

- `abfs_access_key`
- `abfs_bearer_token`
- `abfs_client_secret`
- `abfs_skip_signature`

If none of these are set, the connector will default to using a managed identity.
| Parameter name | Description |
| --- | --- |
| `abfs_access_key` | Secret access key |
| `abfs_bearer_token` | Bearer access token for user authentication. The token can be obtained from the OAuth2 flow (see access token authentication). |
| `abfs_client_id` | Client ID for the client authentication flow |
| `abfs_client_secret` | Client secret for the client authentication flow |
| `abfs_tenant_id` | Tenant ID for the client authentication flow |
| `abfs_skip_signature` | Skip credentials and request signing for public containers |
| `abfs_msi_endpoint` | Endpoint for managed identity tokens |
| `abfs_federated_token_file` | File path for a federated identity token in Kubernetes |
| `abfs_use_cli` | Set to `true` to use the Azure CLI to acquire access tokens |
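As an illustrative sketch (the container and account names are placeholders), a local development setup that reuses an existing `az login` session via `abfs_use_cli` might look like:

```yaml
datasets:
  - from: abfs://devcontainer/sample.parquet
    name: dev_data
    params:
      abfs_account: devaccount   # hypothetical account name
      abfs_use_cli: true         # acquire tokens from the local Azure CLI session
      file_format: parquet
```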
The following parameters control retry behavior:

| Parameter name | Description |
| --- | --- |
| `abfs_max_retries` | Maximum number of retries |
| `abfs_retry_timeout` | Total timeout for retries (e.g., `5s`, `1m`) |
| `abfs_backoff_initial_duration` | Initial retry delay (e.g., `5s`) |
| `abfs_backoff_max_duration` | Maximum retry delay (e.g., `1m`) |
| `abfs_backoff_base` | Exponential backoff base (e.g., `0.1`) |
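A sketch of retry tuning, extending the earlier access-key example (the specific values are illustrative, not recommendations):

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
      abfs_max_retries: 5                 # illustrative values
      abfs_retry_timeout: 1m
      abfs_backoff_initial_duration: 5s
      abfs_backoff_max_duration: 30s
```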
The ABFS connector supports three types of authentication, as detailed in the authentication parameters.

Configure service principal authentication by setting the `abfs_client_secret` parameter:
1. Create a new Azure AD application in the Azure portal and generate a client secret under **Certificates & secrets**.
2. Grant the Azure AD application read access to the storage account under **Access Control (IAM)**; this can typically be done using the **Storage Blob Data Reader** built-in role.
Configure access key authentication by setting the `abfs_access_key` parameter to an Azure storage account access key.
Specify the file format using the `file_format` parameter. More details in Object Store File Formats.
Using an access key:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:ACCESS_KEY }
      file_format: csv
```
Reading from a public container by skipping request signing:

```yaml
datasets:
  - from: abfs://pubcontainer/taxi_sample.csv
    name: pub_data
    params:
      abfs_account: spiceadls
      abfs_skip_signature: true
      file_format: csv
```
Connecting to a local emulator:

```yaml
datasets:
  - from: abfs://test_container/test_csv.csv
    name: test_data
    params:
      abfs_use_emulator: true
      file_format: csv
```
Loading the account name from a secret:

```yaml
datasets:
  - from: abfs://my_container/my_csv.csv
    name: prod_data
    params:
      abfs_account: ${ secrets:PROD_ACCOUNT }
      file_format: csv
```
Using service principal (client credentials) authentication:

```yaml
datasets:
  - from: abfs://my_data/input.parquet
    name: my_data
    params:
      abfs_tenant_id: ${ secrets:MY_TENANT_ID }
      abfs_client_id: ${ secrets:MY_CLIENT_ID }
      abfs_client_secret: ${ secrets:MY_CLIENT_SECRET }
```