Merged
Changes from 2 commits
11 changes: 9 additions & 2 deletions website/docs/components/catalogs/databricks.md
@@ -11,7 +11,7 @@ tags:
- data-connectors
---

Connect to a [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) as a catalog provider for federated SQL query using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html) or directly from [Delta Lake](https://delta.io/) tables.
Connect to a [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) as a catalog provider for federated SQL query using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html), directly from [Delta Lake](https://delta.io/) tables, or using the [SQL Statement Execution API](https://docs.databricks.com/aws/en/dev-tools/sql-execution-tutorial).

## Configuration

@@ -22,7 +22,7 @@ catalogs:
    include:
      - '*.my_table_name' # include only the "my_table_name" tables
    params:
      mode: delta_lake # or spark_connect
      mode: delta_lake # or spark_connect or sql_warehouse
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
    dataset_params:
      # delta_lake S3 parameters
@@ -32,6 +32,8 @@ catalogs:
      databricks_aws_endpoint: s3.us-west-2.amazonaws.com
      # spark_connect parameters
      databricks_cluster_id: 1234-567890-abcde123
      # sql_warehouse parameters
      databricks_sql_warehouse_id: 1234-567890-abcde123
```

## `from`
@@ -53,6 +55,7 @@ The `params` field is used to configure the connection to the Databricks Unity C
- `mode`: The execution mode for querying against Databricks. The default is `spark_connect`. Possible values:
  - `spark_connect`: Use Spark Connect to query against Databricks. Requires a Spark cluster to be available.
  - `delta_lake`: Query directly from Delta Tables. Requires the object store credentials to be provided.
  - `sql_warehouse`: Use the SQL Statement Execution API to query against a Databricks SQL Warehouse.
- `databricks_endpoint`: The Databricks workspace endpoint, e.g. `dbc-a12cd3e4-56f7.cloud.databricks.com`.
- `databricks_token`: The Databricks API token to authenticate with the Unity Catalog API. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:my_databricks_token}`.
- `databricks_use_ssl`: If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`.
@@ -106,6 +109,10 @@ The `dataset_params` field is used to configure the dataset-specific parameters

Configure the connection to the object store when using `mode: delta_lake`. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:aws_access_key_id}`.

### SQL Warehouse parameters

- `databricks_sql_warehouse_id`: The ID of the SQL Warehouse in Databricks to use for the query, e.g. `1234-567890-abcde123`.
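
For illustration, a catalog entry running in `sql_warehouse` mode might look like the sketch below. It assumes the warehouse ID goes under `dataset_params`, as in the configuration example earlier in this file. The `from` and `name` values (`my_uc_catalog`, `uc_catalog`) and the secret name `my_token` are placeholders rather than values taken from this change.

```yaml
catalogs:
  - from: databricks:my_uc_catalog # placeholder Unity Catalog name (assumption)
    name: uc_catalog # placeholder name for the catalog in Spice (assumption)
    params:
      mode: sql_warehouse
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:my_token} # placeholder secret name (assumption)
    dataset_params:
      # sql_warehouse parameters
      databricks_sql_warehouse_id: 1234-567890-abcde123
```

No object store credentials or cluster ID appear in this sketch, since those parameters are documented for `delta_lake` and `spark_connect` modes respectively.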

#### AWS S3

- `databricks_aws_region`: The AWS region for the S3 object store. E.g. `us-west-2`.
17 changes: 15 additions & 2 deletions website/docs/components/data-connectors/databricks.md
@@ -9,7 +9,7 @@ tags:
- delta-lake
---

Databricks as a connector for federated SQL query against Databricks using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html) or directly from [Delta Lake](https://delta.io/) tables.
Databricks as a connector for federated SQL query against Databricks using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html), directly from [Delta Lake](https://delta.io/) tables, or using the [SQL Statement Execution API](https://docs.databricks.com/aws/en/dev-tools/sql-execution-tutorial).

```yaml
datasets:
@@ -62,6 +62,7 @@ Use the [secret replacement syntax](../secret-stores/index.md) to reference a se
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mode` | The execution mode for querying against Databricks. The default is `spark_connect`. Possible values:<br /> <ul><li>`spark_connect`: Use Spark Connect to query against Databricks. Requires a Spark cluster to be available.</li><li>`delta_lake`: Query directly from Delta Tables. Requires the object store credentials to be provided.</li></ul> |
| `databricks_endpoint` | The endpoint of the Databricks instance. Required for both modes. |
| `databricks_sql_warehouse_id` | The ID of the SQL Warehouse in Databricks to use for the query. Only valid when `mode` is `sql_warehouse`. |
| `databricks_cluster_id` | The ID of the compute cluster in Databricks to use for the query. Only valid when `mode` is `spark_connect`. |
| `databricks_use_ssl` | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |
| `client_timeout` | Optional. Applicable only in `delta_lake` mode. Specifies the timeout for object store operations. Default value is `30s`. E.g. `client_timeout: 60s` |
@@ -157,6 +158,18 @@ Configure the connection to the object store when using `mode: delta_lake`. Use
    databricks_token: ${secrets:my_token}
```

### SQL Warehouse

```yaml
- from: databricks:spiceai.datasets.my_table # A reference to a table in the Databricks Unity Catalog
  name: my_table
  params:
    mode: sql_warehouse
    databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
    databricks_sql_warehouse_id: 1234-567890-abcde123
    databricks_token: ${secrets:my_token}
```

### Delta Lake (S3)

```yaml
@@ -259,4 +272,4 @@ Memory limitations can be mitigated by storing acceleration data on disk, which

## Cookbook

- A cookbook recipe to configure Databricks as data connector in Spice under `delta_lake` mode. [Spice on Databricks (mode: delta_lake)](https://github.com/spiceai/cookbook/tree/trunk/databricks/delta_lake#readme)
- A cookbook recipe to configure Databricks as a data connector in Spice. [Spice on Databricks](https://github.com/spiceai/cookbook/tree/trunk/databricks)