Commit 75130e8

Add Databricks M2M and PAT auth docs for data and catalog connectors
1 parent 37cd035 commit 75130e8

2 files changed

Lines changed: 83 additions & 7 deletions

website/docs/components/catalogs/databricks.md

Lines changed: 38 additions & 0 deletions
@@ -57,6 +57,44 @@ The `params` field is used to configure the connection to the Databricks Unity C
- `databricks_token`: The Databricks API token to authenticate with the Unity Catalog API. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:my_databricks_token}`.
- `databricks_use_ssl`: If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`.

## Authentication

### Personal access token

To learn more about setting up personal access tokens, see the [Databricks PAT docs](https://docs.databricks.com/aws/en/dev-tools/auth/pat).

```yaml
catalogs:
  - from: databricks:my_uc_catalog
    name: uc_catalog
    include:
      - '*.my_table_name'
    params:
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:DATABRICKS_TOKEN} # PAT
```

### Databricks service principal

Spice supports the machine-to-machine (M2M) OAuth flow with service principal credentials through the `databricks_client_id` and `databricks_client_secret` parameters. The runtime refreshes the token automatically.

Grant the service principal the "Data Reader" privilege preset for the catalog. When using Spark Connect mode, also grant it "Can Attach" cluster permissions.

To learn more about setting up the service principal, see the [Databricks M2M OAuth docs](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m).

```yaml
86+
catalogs:
87+
- from: databricks:my_uc_catalog
88+
name: uc_catalog
89+
include:
90+
- '*.my_table_name'
91+
params:
92+
databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
93+
databricks_token: ${secrets:DATABRICKS_TOKEN} # PAT
94+
databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID} # service principal client id
95+
databricks_client_secret: ${secrets:DATABRICKS_CLIENT_SECRET} # service principal client secret
96+
```
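
The automatic refresh described above can be pictured as a small cache that fetches a new token shortly before the current one expires. The following Python sketch is illustrative only and is not the Spice runtime's implementation: the class name `RefreshingToken`, the `fetch` callback, and the 60-second safety margin are all assumptions introduced for this example.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Caches an access token and refetches it shortly before expiry.

    Hypothetical helper for illustration; `fetch` must return a
    (token, lifetime_in_seconds) pair, e.g. from an OAuth token endpoint.
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew_seconds: float = 60.0,  # assumed safety margin before expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, refreshing it when it is missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Callers would ask `get()` for a token before each request instead of storing the token themselves, so a refresh never requires a restart or a config change.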

## `dataset_params`

The `dataset_params` field is used to configure the dataset-specific parameters for the catalog. The following parameters are supported:

website/docs/components/data-connectors/databricks.md

Lines changed: 45 additions & 7 deletions
@@ -58,13 +58,51 @@ SELECT COUNT(*) FROM cool_dataset;

Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g. `${secrets:my_token}`.

| Parameter Name             | Description |
| -------------------------- | ----------- |
| `mode`                     | The execution mode for querying against Databricks. The default is `spark_connect`. Possible values:<br /> <ul><li>`spark_connect`: Use Spark Connect to query against Databricks. Requires a Spark cluster to be available.</li><li>`delta_lake`: Query directly from Delta Tables. Requires the object store credentials to be provided.</li></ul> |
| `databricks_endpoint`      | The endpoint of the Databricks instance. Required for both modes. |
| `databricks_cluster_id`    | The ID of the compute cluster in Databricks to use for the query. Only valid when `mode` is `spark_connect`. |
| `databricks_use_ssl`       | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |
| `client_timeout`           | Optional. Applicable only in `delta_lake` mode. Specifies the timeout for object store operations. Default value is `30s`. E.g. `client_timeout: 60s`. |
| `databricks_token`         | The Databricks API token to authenticate with the Unity Catalog API. Can't be used with `databricks_client_id` and `databricks_client_secret`. |
| `databricks_client_id`     | The Databricks Service Principal Client ID. Can't be used with `databricks_token`. |
| `databricks_client_secret` | The Databricks Service Principal Client Secret. Can't be used with `databricks_token`. |

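The mutual-exclusion rule between `databricks_token` and the service principal parameters can be checked before a config is deployed. The Python sketch below is a hypothetical pre-flight check, not part of Spice: the function name `validate_databricks_auth` is invented here, and the requirements that the client ID and secret appear together and that at least one credential is present are assumptions, not rules stated in the table.

```python
def validate_databricks_auth(params: dict) -> str:
    """Return the auth method implied by `params`: 'pat' or 'service_principal'.

    Hypothetical helper for illustration. Raises ValueError when the
    combination of parameters is inconsistent.
    """
    has_token = bool(params.get("databricks_token"))
    has_id = bool(params.get("databricks_client_id"))
    has_secret = bool(params.get("databricks_client_secret"))

    # Stated in the parameter table: the token can't be combined with
    # service principal credentials.
    if has_token and (has_id or has_secret):
        raise ValueError(
            "databricks_token can't be used with "
            "databricks_client_id / databricks_client_secret"
        )
    # Assumption: the client ID and secret are only useful as a pair.
    if has_id != has_secret:
        raise ValueError(
            "databricks_client_id and databricks_client_secret must be set together"
        )
    if has_token:
        return "pat"
    if has_id:
        return "service_principal"
    # Assumption: a connector with no credentials is treated as an error.
    raise ValueError("no Databricks credentials provided")
```
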
## Authentication

### Personal access token

To learn more about setting up personal access tokens, see the [Databricks PAT docs](https://docs.databricks.com/aws/en/dev-tools/auth/pat).

```yaml
datasets:
  - from: databricks:spiceai.datasets.my_awesome_table
    name: my_awesome_table
    params:
      databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
      databricks_cluster_id: 1234-567890-abcde123
      databricks_token: ${secrets:DATABRICKS_TOKEN} # PAT
```

### Databricks service principal

Spice supports the machine-to-machine (M2M) OAuth flow with service principal credentials through the `databricks_client_id` and `databricks_client_secret` parameters. The runtime refreshes the token automatically.

Grant the service principal the "Data Reader" privilege preset for the catalog. When using Spark Connect mode, also grant it "Can Attach" cluster permissions.

To learn more about setting up the service principal, see the [Databricks M2M OAuth docs](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m).

```yaml
datasets:
  - from: databricks:spiceai.datasets.my_awesome_table
    name: my_awesome_table
    params:
      databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
      databricks_cluster_id: 1234-567890-abcde123
      databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID} # service principal client id
      databricks_client_secret: ${secrets:DATABRICKS_CLIENT_SECRET} # service principal client secret
```
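
To verify service principal credentials independently of Spice, it can help to see what an M2M token request looks like. The Python sketch below only builds the request and does not send it. The `/oidc/v1/token` path and the `all-apis` scope are taken from the Databricks OAuth M2M documentation for workspace-level tokens, not from this commit, and the function name `build_m2m_token_request` is hypothetical. How Spice itself issues the request is not shown in the source.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(
    endpoint: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request.

    Assumes the workspace-level token endpoint and scope described in the
    Databricks OAuth M2M docs; adjust if your deployment differs.
    """
    url = f"https://{endpoint}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the exchange. A successful OAuth response is JSON containing an `access_token` and an `expires_in` lifetime in seconds.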

## Delta Lake object store parameters
