Merged
Changes from 1 commit
`website/docs/components/catalogs/databricks.md`: 47 changes (28 additions, 19 deletions)
@@ -48,14 +48,15 @@ Use the `include` field to specify which tables to include from the catalog. The

## `params`

The following parameters are supported for configuring the connection to the Databricks Unity Catalog:

| Parameter Name | Definition |
|---------------|------------|
| `mode` | The execution mode for querying Databricks. `spark_connect` uses Spark Connect to query Databricks and requires a Spark cluster to be available. `delta_lake` reads Delta tables directly and requires object store credentials to be provided. The default is `spark_connect`. |
| `databricks_endpoint` | The Databricks workspace endpoint, e.g. `dbc-a12cd3e4-56f7.cloud.databricks.com` |
| `databricks_token` | The Databricks API token to authenticate with the Unity Catalog API. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:my_databricks_token}`. |
| `databricks_use_ssl` | If `true`, use TLS to connect to the Databricks endpoint. The default is `true`. |

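
The following catalog entry is a minimal sketch of how these parameters fit together. It assumes the `spicepod.yaml` layout with a `catalogs` list and a `from: databricks:<catalog>` source. The catalog name `my_uc_catalog`, the `name: uc` alias, and the `include` pattern are hypothetical placeholders.

```yaml
catalogs:
  # Hypothetical Unity Catalog name; replace with your own catalog.
  - from: databricks:my_uc_catalog
    name: uc # hypothetical local alias
    include:
      - "*.my_table" # hypothetical <schema>.<table> filter
    params:
      mode: spark_connect # the default; use delta_lake to read Delta tables directly
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:my_databricks_token}
      databricks_use_ssl: true # the default
```

The token is referenced through the secret replacement syntax, so the credential itself never appears in the file.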

## Authentication

@@ -100,18 +101,22 @@ The `dataset_params` field is used to configure the dataset-specific parameters

### Spark Connect parameters

| Dataset Parameter Name | Definition |
|------------------------|------------|
| `databricks_cluster_id` | The ID of the Databricks compute cluster to use for queries, e.g. `1234-567890-abcde123`. |
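
In `spark_connect` mode, the cluster ID goes under `dataset_params` rather than `params`. The sketch below assumes the same hypothetical catalog entry layout (`from: databricks:my_uc_catalog`, `name: uc`). Only `databricks_cluster_id` and its example value come from the table above.

```yaml
catalogs:
  - from: databricks:my_uc_catalog # hypothetical catalog name
    name: uc # hypothetical local alias
    params:
      mode: spark_connect
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:my_databricks_token}
    dataset_params:
      # Compute cluster that Spark Connect sends queries to.
      databricks_cluster_id: 1234-567890-abcde123
```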

### Delta Lake object store parameters

Configure the connection to the object store when using `mode: delta_lake`. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:aws_access_key_id}`.

#### AWS S3

| Dataset Parameter Name | Definition |
|------------------------|------------|
| `databricks_aws_region` | The AWS region for the S3 object store, e.g. `us-west-2`. |
| `databricks_aws_access_key_id` | The access key ID for the S3 object store. |
| `databricks_aws_secret_access_key` | The secret access key for the S3 object store. |
| `databricks_aws_endpoint` | The endpoint for the S3 object store, e.g. `s3.us-west-2.amazonaws.com`. |

Example:

@@ -141,12 +146,14 @@ One of the following auth values must be provided for Azure Blob:
- `databricks_azure_storage_sas_key`.
:::

| Dataset Parameter Name | Definition |
|------------------------|------------|
| `databricks_azure_storage_account_name` | The Azure Storage account name. |
| `databricks_azure_storage_account_key` | The Azure Storage master key for accessing the storage account. |
| `databricks_azure_storage_client_id` | The service principal client ID for accessing the storage account. |
| `databricks_azure_storage_client_secret` | The service principal client secret for accessing the storage account. |
| `databricks_azure_storage_sas_key` | The shared access signature key for accessing the storage account. |
| `databricks_azure_storage_endpoint` | The endpoint for the Azure Blob storage account. |

Example:

@@ -167,7 +174,9 @@ catalogs:

#### Google Storage (GCS)

| Dataset Parameter Name | Definition |
|------------------------|------------|
| `google_service_account` | Filesystem path to the Google service account JSON key file. |

Example:
