Commit 619d9be

Merge branch 'release/1.3.0' into 497-enhance-databricks-catalog-connector-documentation
2 parents ea77d3c + e6fb965 commit 619d9be

5 files changed

Lines changed: 235 additions & 62 deletions

website/docs/components/catalogs/databricks.md

Lines changed: 32 additions & 2 deletions
@@ -11,7 +11,7 @@ tags:
 - data-connectors
 ---

-Connect to a [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) as a catalog provider for federated SQL query using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html) or directly from [Delta Lake](https://delta.io/) tables.
+Connect to a [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) as a catalog provider for federated SQL query using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html), directly from [Delta Lake](https://delta.io/) tables, or using the [SQL Statement Execution API](https://docs.databricks.com/aws/en/dev-tools/sql-execution-tutorial).

 ## Configuration

@@ -22,7 +22,7 @@ catalogs:
     include:
       - '*.my_table_name' # include only the "my_table_name" tables
     params:
-      mode: delta_lake # or spark_connect
+      mode: delta_lake # or spark_connect or sql_warehouse
       databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
       dataset_params:
         # delta_lake S3 parameters
@@ -32,6 +32,8 @@ catalogs:
         databricks_aws_endpoint: s3.us-west-2.amazonaws.com
         # spark_connect parameters
         databricks_cluster_id: 1234-567890-abcde123
+        # sql_warehouse parameters
+        databricks_sql_warehouse_id: 2b4e24cff378fb24
 ```

 ## `from`
@@ -57,6 +59,14 @@ The following parameters are supported for configuring the connection to the Dat
 | `databricks_token` | The Databricks API token to authenticate with the Unity Catalog API. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:my_databricks_token}`. |
 | `databricks_use_ssl` | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |

+To locate the Databricks endpoint, do the following:
+
+1. Log in to your Databricks workspace.
+2. In the sidebar, click Compute.
+3. In the list of available clusters, click the target cluster's name.
+4. On the Configuration tab, expand Advanced options.
+5. Click the JDBC/ODBC tab.
+6. The endpoint is the Server Hostname.

 ## Authentication

@@ -105,10 +115,30 @@ The `dataset_params` field is used to configure the dataset-specific parameters
 |------------------------|------------|
 | `databricks_cluster_id` | The ID of the compute cluster in Databricks to use for the query. e.g. `1234-567890-abcde123`. |

+To locate the cluster ID, do the following:
+
+1. Log in to your Databricks workspace.
+2. In the sidebar, click Compute.
+3. In the list of available clusters, click the target cluster's name.
+4. On the Configuration tab, expand Advanced options.
+5. Click the JDBC/ODBC tab.
+6. The cluster ID is the last segment of the HTTP Path.
+
 ### Delta Lake object store parameters

 Configure the connection to the object store when using `mode: delta_lake`. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:aws_access_key_id}`.

+### SQL Warehouse parameters
+
+- `databricks_sql_warehouse_id`: The ID of the SQL Warehouse in Databricks to use for the query. e.g. `2b4e24cff378fb24`.
+
+To locate your SQL Warehouse ID, do the following:
+
+1. Log in to your Databricks workspace.
+2. In the sidebar, click SQL -> SQL Warehouses.
+3. In the list of available warehouses, click the target warehouse's name.
+4. Next to the **Name** field, the ID follows the name in parentheses. For example: `My Serverless Warehouse (ID: 2b4e24cff378fb24)`.
+
 #### AWS S3

 | Dataset Parameter Name | Definition |
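The minimal sketch below shows how the values located above might fit into a catalog definition that uses `mode: sql_warehouse`. It reuses the sample endpoint and IDs from the configuration example in this file. The `databricks_sql_warehouse_id` key sits under `dataset_params`, following that same example. The `from` value and the catalog `name` are placeholders (assumptions), because the `from` section and the catalog's top-level keys are not part of this diff. The secret reference follows the `databricks_token` row in the parameters table.

```yaml
catalogs:
  - from: databricks:my_uc_catalog # placeholder Unity Catalog reference (assumption)
    name: uc # placeholder catalog name (assumption)
    params:
      mode: sql_warehouse
      # Server Hostname, as described in the endpoint steps
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:my_databricks_token}
      dataset_params:
        # sql_warehouse: the ID shown in parentheses next to the warehouse name
        databricks_sql_warehouse_id: 2b4e24cff378fb24
        # For mode: spark_connect, set the cluster ID here instead:
        # databricks_cluster_id: 1234-567890-abcde123
```

In this sketch, switching between `sql_warehouse` and `spark_connect` only requires changing `mode` and which ID is set under `dataset_params`. The sketch omits `databricks_use_ssl` because it defaults to `true`.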

website/docs/components/data-connectors/databricks.md

Lines changed: 15 additions & 2 deletions
@@ -9,7 +9,7 @@ tags:
 - delta-lake
 ---

-Databricks as a connector for federated SQL query against Databricks using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html) or directly from [Delta Lake](https://delta.io/) tables.
+Databricks as a connector for federated SQL query against Databricks using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html), directly from [Delta Lake](https://delta.io/) tables, or using the [SQL Statement Execution API](https://docs.databricks.com/aws/en/dev-tools/sql-execution-tutorial).

 ```yaml
 datasets:
@@ -62,6 +62,7 @@ Use the [secret replacement syntax](../secret-stores/index.md) to reference a se
6262
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
6363
| `mode` | The execution mode for querying against Databricks. The default is `spark_connect`. Possible values:<br /> <ul><li>`spark_connect`: Use Spark Connect to query against Databricks. Requires a Spark cluster to be available.</li><li>`delta_lake`: Query directly from Delta Tables. Requires the object store credentials to be provided.</li></ul> |
6464
| `databricks_endpoint` | The endpoint of the Databricks instance. Required for both modes. |
65+
| `databricks_sql_warehouse_id` | The ID of the SQL Warehouse in Databricks to use for the query. Only valid when `mode` is `sql_warehouse`. |
6566
| `databricks_cluster_id` | The ID of the compute cluster in Databricks to use for the query. Only valid when `mode` is `spark_connect`. |
6667
| `databricks_use_ssl` | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |
6768
| `client_timeout` | Optional. Applicable only in `delta_lake` mode. Specifies timeout for object store operations. Default value is `30s` E.g. `client_timeout: 60s` |
@@ -157,6 +158,18 @@ Configure the connection to the object store when using `mode: delta_lake`. Use
     databricks_token: ${secrets:my_token}
 ```

+### SQL Warehouse
+
+```yaml
+- from: databricks:spiceai.datasets.my_table # A reference to a table in the Databricks unity catalog
+  name: my_table
+  params:
+    mode: sql_warehouse
+    databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
+    databricks_sql_warehouse_id: 2b4e24cff378fb24
+    databricks_token: ${secrets:my_token}
+```
+
 ### Delta Lake (S3)

 ```yaml
@@ -259,4 +272,4 @@ Memory limitations can be mitigated by storing acceleration data on disk, which

 ## Cookbook

-- A cookbook recipe to configure Databricks as data connector in Spice under `delta_lake` mode. [Spice on Databricks (mode: delta_lake)](https://github.com/spiceai/cookbook/tree/trunk/databricks/delta_lake#readme)
+- A cookbook recipe to configure Databricks as a data connector in Spice. [Spice on Databricks](https://github.com/spiceai/cookbook/tree/trunk/databricks)
