website/docs/components/catalogs/databricks.md
+32 -2 (32 additions & 2 deletions)
@@ -11,7 +11,7 @@ tags:
   - data-connectors
 ---

-Connect to a [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) as a catalog provider for federated SQL query using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html) or directly from [Delta Lake](https://delta.io/) tables.
+Connect to a [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) as a catalog provider for federated SQL query using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html), directly from [Delta Lake](https://delta.io/) tables, or using the [SQL Statement Execution API](https://docs.databricks.com/aws/en/dev-tools/sql-execution-tutorial).

 ## Configuration
@@ -22,7 +22,7 @@ catalogs:
     include:
       - '*.my_table_name' # include only the "my_table_name" tables
     params:
-      mode: delta_lake # or spark_connect
+      mode: delta_lake # or spark_connect or sql_warehouse
@@ -57,6 +59,14 @@ The following parameters are supported for configuring the connection to the Dat
 | `databricks_token` | The Databricks API token to authenticate with the Unity Catalog API. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:my_databricks_token}`. |
 | `databricks_use_ssl` | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |

+To locate the Databricks endpoint, do the following:
+
+1. Log in to your Databricks workspace.
+2. In the sidebar, click Compute.
+3. In the list of available clusters, click the target cluster's name.
+4. On the Configuration tab, expand Advanced options.
+5. Click the JDBC/ODBC tab.
+6. The endpoint is the Server Hostname.

 ## Authentication
@@ -105,10 +115,30 @@ The `dataset_params` field is used to configure the dataset-specific parameters
 |------------------------|------------|
 | `databricks_cluster_id` | The ID of the compute cluster in Databricks to use for the query. e.g. `1234-567890-abcde123`. |

+To locate the cluster ID, do the following:
+
+1. Log in to your Databricks workspace.
+2. In the sidebar, click Compute.
+3. In the list of available clusters, click the target cluster's name.
+4. On the Configuration tab, expand Advanced options.
+5. Click the JDBC/ODBC tab.
+6. The cluster ID is the prefix of the Server Hostname.
+
 ### Delta Lake object store parameters

 Configure the connection to the object store when using `mode: delta_lake`. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:aws_access_key_id}`.

+### SQL Warehouse parameters
+
+- `databricks_sql_warehouse_id`: The ID of the SQL Warehouse in Databricks to use for the query. e.g. `2b4e24cff378fb24`.
+
+To locate your SQL Warehouse ID, do the following:
+
+1. Log in to your Databricks workspace.
+2. In the sidebar, click SQL -> SQL Warehouses.
+3. In the list of available warehouses, click the target warehouse's name.
+4. Next to the **Name** field, the ID follows the name in parentheses. For example: `My Serverless Warehouse (ID: 2b4e24cff378fb24)`
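
Taken together, the parameters touched by this change suggest a `sql_warehouse` configuration along the following lines. This is a minimal sketch under stated assumptions, not the documented example: the catalog name `my_uc_catalog`, both `name` values, the endpoint hostname, and the secret key `my_token` are hypothetical placeholders. Placing `databricks_sql_warehouse_id` under `dataset_params` for the catalog is also an assumption, inferred from where the new section sits in the file. For the dataset form, the data connector's parameter table lists it alongside the other `params`.

```yaml
# Sketch only: names, endpoint, and secret key are hypothetical placeholders.
catalogs:
  - from: databricks:my_uc_catalog # hypothetical Unity Catalog name
    name: uc_catalog # hypothetical local catalog name
    include:
      - '*.my_table_name' # include only the "my_table_name" tables
    params:
      mode: sql_warehouse
      databricks_endpoint: dbc-example.cloud.databricks.com # hypothetical Server Hostname
      databricks_token: ${secrets:my_token}
    dataset_params:
      # Assumed placement; verify against the catalog documentation.
      databricks_sql_warehouse_id: 2b4e24cff378fb24 # example ID from the steps above

datasets:
  - from: databricks:spiceai.datasets.my_table # table reference used in the connector docs
    name: my_table # hypothetical local dataset name
    params:
      mode: sql_warehouse
      databricks_endpoint: dbc-example.cloud.databricks.com # hypothetical Server Hostname
      databricks_sql_warehouse_id: 2b4e24cff378fb24
      databricks_token: ${secrets:my_token}
```

Only one of the two top-level entries is needed in practice: `catalogs` exposes every matching table in the Unity Catalog, while `datasets` registers a single table.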
website/docs/components/data-connectors/databricks.md
+15 -2 (15 additions & 2 deletions)
@@ -9,7 +9,7 @@ tags:
   - delta-lake
 ---

-Databricks as a connector for federated SQL query against Databricks using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html) or directly from [Delta Lake](https://delta.io/) tables.
+Databricks as a connector for federated SQL query against Databricks using [Spark Connect](https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html), directly from [Delta Lake](https://delta.io/) tables, or using the [SQL Statement Execution API](https://docs.databricks.com/aws/en/dev-tools/sql-execution-tutorial).

 ```yaml
 datasets:
@@ -62,6 +62,7 @@ Use the [secret replacement syntax](../secret-stores/index.md) to reference a se
 | `mode` | The execution mode for querying against Databricks. The default is `spark_connect`. Possible values:<br /> <ul><li>`spark_connect`: Use Spark Connect to query against Databricks. Requires a Spark cluster to be available.</li><li>`delta_lake`: Query directly from Delta Tables. Requires the object store credentials to be provided.</li></ul> |
 | `databricks_endpoint` | The endpoint of the Databricks instance. Required for both modes. |
+| `databricks_sql_warehouse_id` | The ID of the SQL Warehouse in Databricks to use for the query. Only valid when `mode` is `sql_warehouse`. |
 | `databricks_cluster_id` | The ID of the compute cluster in Databricks to use for the query. Only valid when `mode` is `spark_connect`. |
 | `databricks_use_ssl` | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |
 | `client_timeout` | Optional. Applicable only in `delta_lake` mode. Specifies the timeout for object store operations. Default value is `30s`. E.g. `client_timeout: 60s` |
@@ -157,6 +158,18 @@ Configure the connection to the object store when using `mode: delta_lake`. Use
       databricks_token: ${secrets:my_token}
 ```

+### SQL Warehouse
+
+```yaml
+  - from: databricks:spiceai.datasets.my_table # A reference to a table in the Databricks Unity Catalog
@@ -259,4 +272,4 @@ Memory limitations can be mitigated by storing acceleration data on disk, which

 ## Cookbook

-- A cookbook recipe to configure Databricks as data connector in Spice under `delta_lake` mode. [Spice on Databricks (mode: delta_lake)](https://github.com/spiceai/cookbook/tree/trunk/databricks/delta_lake#readme)
+- A cookbook recipe to configure Databricks as a data connector in Spice. [Spice on Databricks](https://github.com/spiceai/cookbook/tree/trunk/databricks)