docs: SharePoint connector docs missing new auth flows, write support, and sharepoint:// URL scheme
Fixes #1660
Updates the SharePoint connector documentation to reflect the
object-store listing connector update:
- Documents the four new auth flows beyond client_secret /
bearer_token: authorization code, refresh token, device code, and
SAML 2.0 bearer (RFC 7522), plus the supporting redirect_uri and
scope parameters.
- Adds write-side parameters sharepoint_conflict_behavior (default
replace) and sharepoint_max_put_bytes (default 1 GiB), with
limitations noting only replace is compatible with INSERT/COPY TO.
- Documents the new sharepoint:// (double-slash) object-store URL
scheme and the five drive forms (me / drives / sites / users /
groups), and links the listing-table parameters reference.
- Clarifies that sharepoint_client_id and sharepoint_tenant_id are
Conditional (not required for bearer_token alone), and lists all
six flows in the auth-exclusivity note.
- Notes that write workflows additionally require Files.ReadWrite and
Sites.ReadWrite.All Microsoft Graph scopes.
Verified against spiceai/spiceai trunk
(crates/data-connectors/connector-sharepoint/src/lib.rs and
crates/data_components/src/sharepoint/{auth,object_store,url}.rs).
@@ -5,7 +5,7 @@ description: 'SharePoint Data Connector Documentation'
5
5
pagination_prev: null
6
6
---
7
7
8
-
The SharePoint Data Connector enables federated SQL queries on documents stored in SharePoint.
8
+
The SharePoint Data Connector enables federated SQL queries on documents and tabular data stored in SharePoint or OneDrive.
9
9
10
10
```yaml
11
11
datasets:
@@ -45,56 +45,67 @@ Returns
45
45
]
46
46
````
47
47
48
-
:::warning[Limitations]
49
-
The sharepoint connector does not yet support creating a dataset from a single file (e.g. an Excel spreadsheet). Datasets must be created from a folder of documents.
50
-
:::
48
+
The SharePoint connector supports two `from:` URL styles:
49
+
50
+
- **Metadata listing** (`sharepoint:…`, single colon): one row per drive item with optional file content. Best for browsing folders of PDFs, PPTX, DOCX, etc. as document tables.
51
+
- **Object-store** (`sharepoint://…`, double slash): tabular access via DataFusion's `ListingTable`. Enables `SELECT`, `INSERT INTO`, `COPY TO`, `COPY FROM`, and `CREATE EXTERNAL TABLE` against CSV, JSON, NDJSON, Parquet, and similar formats stored on SharePoint.
|`sharepoint_client_id`|**Yes**| The client ID of the Azure AD (Entra) application |
59
-
|`sharepoint_tenant_id`|**Yes**| The tenant ID of the Azure AD (Entra) application. |
60
-
|`sharepoint_client_secret`| Optional | For service principal authentication. The client secret of the Azure AD (Entra) application. |
61
-
|`sharepoint_bearer_token`| Optional | For user authentication. The bearer access token obtained from the OAuth2 flow (see `spice login sharepoint` [docs](../../cli/reference/login)). |
|`sharepoint_client_id`| Conditional | The client ID of the Azure AD (Entra) application. Required for every flow except `sharepoint_bearer_token`. |
60
+
|`sharepoint_tenant_id`| Conditional | The tenant ID of the Azure AD (Entra) application. Required for every flow except `sharepoint_bearer_token`. |
61
+
|`sharepoint_client_secret`| Conditional | The client secret of the Azure AD (Entra) application. Required for client-credentials, authorization-code, and refresh-token flows. |
62
+
|`sharepoint_bearer_token`| Conditional | A pre-acquired bearer access token. Generally obtained via `spice login sharepoint` (see [docs](../../cli/reference/login)). |
|`sharepoint_saml_assertion`| Conditional | SAML 2.0 bearer assertion ([RFC 7522](https://datatracker.ietf.org/doc/html/rfc7522)) — exchanges a federated IdP assertion for an Azure AD token. |
67
+
|`sharepoint_redirect_uri`| Conditional | OAuth2 redirect URI. Required when using `sharepoint_auth_code`. |
68
+
|`sharepoint_scope`| Optional | OAuth2 scope. Defaults to `https://graph.microsoft.com/.default`. |
69
+
|`sharepoint_conflict_behavior`| Optional | How writes to an existing path are handled. One of `replace` (default; SharePoint stores a new version), `fail` (reject), or `rename` (write under a unique name). Only `replace` is compatible with `INSERT INTO` / `COPY TO`. Applies only to `sharepoint://`. |
70
+
|`sharepoint_max_put_bytes`| Optional | Hard cap, in bytes, on a single `put`/multipart upload. Writes above this size are rejected rather than silently buffered. Default: `1073741824` (1 GiB). Applies only to `sharepoint://`. |
62
71
63
72
:::note
64
-
Only one of `sharepoint_client_secret` or `sharepoint_bearer_token` is allowed.
73
+
Exactly one of `sharepoint_client_secret` (alone, for client-credentials), `sharepoint_bearer_token`, `sharepoint_auth_code` (with `sharepoint_client_secret` + `sharepoint_redirect_uri`), `sharepoint_refresh_token` (with `sharepoint_client_secret`), `sharepoint_device_code`, or `sharepoint_saml_assertion` must be supplied. Combining unrelated auth credentials is rejected at startup.
65
74
:::
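As a rough illustration of the exclusivity rule in the note above, the sketch below shows two datasets that each set exactly one auth credential. The dataset names, the `<drive-id>` placeholder, the file paths, and the secret names are hypothetical, and the `${secrets:...}` reference syntax is assumed from Spice's secret-store conventions rather than confirmed by this change.

```yaml
datasets:
  # Client-credentials flow: the client secret alone, plus the client and tenant IDs.
  # <drive-id> is a placeholder, not a real drive ID.
  - from: sharepoint://drives/<drive-id>/reports/sales.csv
    name: sales_service_principal # hypothetical name
    params:
      sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID}
      sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
      sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}

  # Bearer-token flow: a pre-acquired token; per the parameter table,
  # the client and tenant IDs are not required here.
  - from: sharepoint://me/Documents/sales.csv
    name: sales_user # hypothetical name
    params:
      sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
```

Setting both `sharepoint_client_secret` and `sharepoint_bearer_token` on the same dataset would, per the note, be rejected at startup.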
66
75
76
+
When using the `sharepoint://` URL scheme, the standard listing-table parameters (`file_format`, `csv_has_header`, `csv_delimiter`, `json_pointer`, `hive_partitioning_enabled`, etc.) all apply — see [File Formats](./#file-formats) and the [Object Store File Formats](./#object-store-file-formats) reference for the full list.
77
+
67
78
### `from` formats
68
79
69
-
The `from` field in a SharePoint dataset takes the following format:
80
+
The SharePoint connector accepts two `from:` URL styles.
Routes through an `ObjectStore` plus DataFusion's `ListingTable`. Enables `SELECT`, `INSERT INTO`, `COPY TO`, `COPY FROM`, and `CREATE EXTERNAL TABLE` for CSV, JSON, NDJSON, Parquet, and other tabular formats — and binary round-trips for blobs (PDF, etc.) via `(FORMAT binary)`.
| `sharepoint://me/{item-path}` | The authenticated user's OneDrive |
124
+
| `sharepoint://drives/{drive-id}/{item-path}` | A specific drive by ID |
125
+
| `sharepoint://sites/{site-id}/{item-path}` | A site's default document library |
126
+
| `sharepoint://users/{user-id}/{item-path}` | A user's default drive |
127
+
| `sharepoint://groups/{group-id}/{item-path}` | A group's default drive |
128
+
129
+
Path segments are percent-decoded, so site IDs containing `,` (e.g. `contoso.sharepoint.com,abc-def,ghi-jkl`) and file paths containing spaces work without extra escaping beyond standard URL encoding.
130
+
131
+
`file_format` is auto-inferred from the URL extension when omitted, so `from: sharepoint://me/Documents/Q4.xlsx` resolves without specifying `file_format: xlsx`.
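A minimal sketch of the URL handling described above, assuming a hypothetical site ID, folder, and file name. The space in the file name is percent-encoded as `%20`, the comma-separated site ID is written literally (encoding each comma as `%2C` should presumably decode to the same value), and `file_format` is left out so that it is inferred from the `.csv` extension.

```yaml
datasets:
  # Site ID, folder, and file name are illustrative placeholders.
  - from: sharepoint://sites/contoso.sharepoint.com,abc-def,ghi-jkl/Reports/Q4%20forecast.csv
    name: q4_forecast # hypothetical name
    params:
      # The secret name and ${secrets:...} syntax are assumptions.
      sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
      # file_format is omitted on purpose; it is inferred from the URL extension.
      csv_has_header: true
```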
132
+
106
133
## Authentication
107
134
108
-
As outlined in the [connector parameters](#parameters), the SharePoint connector supports two types of authentication:
135
+
The SharePoint connector supports six authentication flows. Configure exactly one — the connector picks the flow based on which auth parameter is set. See the [Required Microsoft Graph permissions](#required-microsoft-graph-permissions) section below for the API permissions each flow requires.
109
136
110
-
1. Service principal authentication, by setting the `sharepoint_client_secret` parameter.
111
-
2. User authentication, by setting the `sharepoint_bearer_token` parameter. Generally this is obtained by running `spice login sharepoint` and following the OAuth2 flow.
To use the SharePoint connector with service principal authentication, you will need to create an Azure AD application and grant it the necessary permissions. This will also support OAuth2 authentication for users within the tenant (i.e. `sharepoint_bearer_token`).
148
+
To use the SharePoint connector with service principal authentication, create an Azure AD application and grant it the necessary permissions. This same app registration also supports the OAuth2 user flows above.
116
149
117
150
1. Create a new Azure AD application in the [Azure portal](https://portal.azure.com/#view/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/~/Overview).
118
-
2. Under the application's `API permissions`, add the following permissions: `Sites.Read.All`, `Files.Read.All`, `User.Read`, `GroupMember.Read.All`
151
+
2. Under the application's `API permissions`, add the permissions listed in [Required Microsoft Graph permissions](#required-microsoft-graph-permissions).
119
152
- For service principal authentication, Application permissions are required.
120
153
- For user authentication, only delegated permissions are required.
121
-
3. (For user authentication): Under the applications's `Authentication`, add `http://localhost` as Mobile and desktop applications redirect URI.
154
+
3. (For user authentication): Under the application's `Authentication`, add `http://localhost` as a Mobile and desktop applications redirect URI.
122
155
4. Add `sharepoint_client_id` (from the `Application (Client) ID` field) and `sharepoint_tenant_id` to the connector configuration.
123
156
5. (For service principal authentication): Under the application's `Certificates & secrets`, create a new client secret. Use this for the `sharepoint_client_secret` parameter.
- `Files.ReadWrite` (for personal drive / specific drive writes), and
170
+
- `Sites.ReadWrite.All` (for site-scoped writes).
171
+
125
172
### Default Spice Application
126
173
127
174
For your convenience, Spice AI maintains a default Entra (Azure AD) application that can be used for authentication against your SharePoint instance. This application requires OAuth2 authentication. To use it:
@@ -142,6 +189,57 @@ And set the `SPICE_SHAREPOINT_BEARER_TOKEN` secret via:
COPY (SELECT content FROM cache WHERE name = 'Q2-report.pdf')
234
+
TO 'sharepoint://me/Documents/Q2-report.pdf'
235
+
(FORMAT binary);
236
+
```
237
+
238
+
:::warning[Limitations]
239
+
- The `sharepoint:` (metadata-listing) syntax cannot create a dataset from a single file (e.g. an Excel spreadsheet) — datasets must be created from a folder of documents. Use the `sharepoint://` object-store syntax for single-file workflows.
240
+
- For `INSERT INTO` and `COPY TO`, only `sharepoint_conflict_behavior=replace` is supported. `fail` and `rename` cause writes to be rejected with a clear error.
241
+
:::
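To make the write-side limitations concrete, here is a hedged sketch of a dataset configured for `INSERT INTO` / `COPY TO`. The folder path, dataset name, and secret name are hypothetical, and `access: read_write` is an assumption based on how other writable Spice datasets are declared; this change does not confirm it is required for SharePoint.

```yaml
datasets:
  - from: sharepoint://me/Documents/exports/
    name: exports # hypothetical name
    access: read_write # assumption: not confirmed by this change
    params:
      file_format: parquet
      sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
      # Only `replace` (the default) is compatible with INSERT INTO / COPY TO.
      sharepoint_conflict_behavior: replace
      # Lower the single-upload cap from the 1 GiB default to 256 MiB.
      sharepoint_max_put_bytes: "268435456"
```

Stating `sharepoint_conflict_behavior` explicitly is redundant with the default, but it documents the intent and guards against a later edit switching it to `fail` or `rename`, which the limitations above say would cause writes to be rejected.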
242
+
145
243
## Secrets
146
244
147
245
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](../secret-stores/). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](../secret-stores/#using-secrets).