Commit d3ca431

claudespicelukekim
authored and committed
docs: SharePoint connector docs missing new auth flows, write support, and sharepoint:// URL scheme
Fixes #1660

Updates the SharePoint connector documentation to reflect the object-store listing connector update:

- Documents the four new auth flows beyond client_secret / bearer_token: authorization code, refresh token, device code, and SAML 2.0 bearer (RFC 7522), plus the supporting redirect_uri and scope parameters.
- Adds write-side parameters sharepoint_conflict_behavior (default replace) and sharepoint_max_put_bytes (default 1 GiB), with limitations noting only replace is compatible with INSERT/COPY TO.
- Documents the new sharepoint:// (double-slash) object-store URL scheme and the five drive forms (me / drives / sites / users / groups), and links the listing-table parameters reference.
- Clarifies that sharepoint_client_id and sharepoint_tenant_id are Conditional (not required for bearer_token alone), and lists all six flows in the auth-exclusivity note.
- Notes that write workflows additionally require Files.ReadWrite and Sites.ReadWrite.All Microsoft Graph scopes.

Verified against spiceai/spiceai trunk (crates/data-connectors/connector-sharepoint/src/lib.rs and crates/data_components/src/sharepoint/{auth,object_store,url}.rs).
1 parent 1a4f3c8 commit d3ca431

1 file changed

Lines changed: 131 additions & 33 deletions

website/docs/components/data-connectors/sharepoint.md

@@ -5,7 +5,7 @@ description: 'SharePoint Data Connector Documentation'
55
pagination_prev: null
66
---
77

8-
The SharePoint Data Connector enables federated SQL queries on documents stored in SharePoint.
8+
The SharePoint Data Connector enables federated SQL queries on documents and tabular data stored in SharePoint or OneDrive.
99

1010
```yaml
1111
datasets:
@@ -45,56 +45,67 @@ Returns
4545
]
4646
````
4747

48-
:::warning[Limitations]
49-
The sharepoint connector does not yet support creating a dataset from a single file (e.g. an Excel spreadsheet). Datasets must be created from a folder of documents.
50-
:::
48+
The SharePoint connector supports two `from:` URL styles:
49+
50+
- **Metadata listing** (`sharepoint:…` — single colon): one row per drive item with optional file content. Best for browsing folders of PDFs, PPTX, DOCX, etc. as document tables.
51+
- **Object-store** (`sharepoint://…` — double slash): tabular access via DataFusion's `ListingTable`. Enables `SELECT`, `INSERT INTO`, `COPY TO`, `COPY FROM`, and `CREATE EXTERNAL TABLE` against CSV, JSON, NDJSON, Parquet, and similar formats stored on SharePoint.
5152

5253
## Configuration
5354

5455
### Parameters
5556

56-
| Name | Required? | Description |
57-
| -------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
58-
| `sharepoint_client_id` | **Yes** | The client ID of the Azure AD (Entra) application |
59-
| `sharepoint_tenant_id` | **Yes** | The tenant ID of the Azure AD (Entra) application. |
60-
| `sharepoint_client_secret` | Optional | For service principal authentication. The client secret of the Azure AD (Entra) application. |
61-
| `sharepoint_bearer_token` | Optional | For user authentication. The bearer access token obtained from the OAuth2 flow (see `spice login sharepoint` [docs](../../cli/reference/login)). |
57+
| Name | Required? | Description |
58+
| ------------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
59+
| `sharepoint_client_id` | Conditional | The client ID of the Azure AD (Entra) application. Required for every flow except `sharepoint_bearer_token`. |
60+
| `sharepoint_tenant_id` | Conditional | The tenant ID of the Azure AD (Entra) application. Required for every flow except `sharepoint_bearer_token`. |
61+
| `sharepoint_client_secret` | Conditional | The client secret of the Azure AD (Entra) application. Required for client-credentials, authorization-code, and refresh-token flows. |
62+
| `sharepoint_bearer_token` | Conditional | A pre-acquired bearer access token. Generally obtained via `spice login sharepoint` (see [docs](../../cli/reference/login)). |
63+
| `sharepoint_auth_code` | Conditional | OAuth2 authorization code (`auth_code` flow). Requires `sharepoint_client_secret` and `sharepoint_redirect_uri`. |
64+
| `sharepoint_refresh_token` | Conditional | OAuth2 refresh token. Requires `sharepoint_client_secret`. |
65+
| `sharepoint_device_code` | Conditional | A pre-acquired OAuth2 device code (`device_code` flow). |
66+
| `sharepoint_saml_assertion` | Conditional | SAML 2.0 bearer assertion ([RFC 7522](https://datatracker.ietf.org/doc/html/rfc7522)) — exchanges a federated IdP assertion for an Azure AD token. |
67+
| `sharepoint_redirect_uri` | Conditional | OAuth2 redirect URI. Required when using `sharepoint_auth_code`. |
68+
| `sharepoint_scope` | Optional | OAuth2 scope. Defaults to `https://graph.microsoft.com/.default`. |
69+
| `sharepoint_conflict_behavior` | Optional | How writes to an existing path are handled. One of `replace` (default; SharePoint stores a new version), `fail` (reject), or `rename` (write under a unique name). Only `replace` is compatible with `INSERT INTO` / `COPY TO`. Applies only to `sharepoint://`. |
70+
| `sharepoint_max_put_bytes` | Optional | Hard cap, in bytes, on a single `put`/multipart upload. Writes above this size are rejected rather than silently buffered. Default: `1073741824` (1 GiB). Applies only to `sharepoint://`. |
6271

6372
:::note
64-
Only one of `sharepoint_client_secret` or `sharepoint_bearer_token` is allowed.
73+
Exactly one of `sharepoint_client_secret` (alone, for client-credentials), `sharepoint_bearer_token`, `sharepoint_auth_code` (with `sharepoint_client_secret` + `sharepoint_redirect_uri`), `sharepoint_refresh_token` (with `sharepoint_client_secret`), `sharepoint_device_code`, or `sharepoint_saml_assertion` must be supplied. Combining unrelated auth credentials is rejected at startup.
6574
:::
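
The exclusivity rule in the note above can be sketched in Python. This is an illustration of the documented rules only, not the connector's implementation: the function name `pick_auth_flow` and the flow labels are hypothetical, and rejecting `sharepoint_client_secret` alongside the bearer-token, device-code, or SAML parameters is an assumption read from the wording of the note.

```python
# Illustrative only: mirrors the documented rules, not the connector's source.
# Parameters that each select one flow on their own (the client secret is handled separately).
PRIMARY_PARAMS = {
    "sharepoint_bearer_token": "bearer_token",
    "sharepoint_auth_code": "authorization_code",
    "sharepoint_refresh_token": "refresh_token",
    "sharepoint_device_code": "device_code",
    "sharepoint_saml_assertion": "saml_bearer",
}
# Flows that must be accompanied by sharepoint_client_secret.
NEEDS_SECRET = {"authorization_code", "refresh_token"}


def pick_auth_flow(params: dict) -> str:
    """Return a (hypothetical) flow label, or raise ValueError for an invalid combination."""
    present = {key for key, value in params.items() if value}
    chosen = [flow for key, flow in PRIMARY_PARAMS.items() if key in present]
    has_secret = "sharepoint_client_secret" in present

    if len(chosen) > 1:
        raise ValueError(f"conflicting auth parameters: {sorted(chosen)}")
    if not chosen:
        if not has_secret:
            raise ValueError("no auth parameter supplied")
        flow = "client_credentials"
    else:
        flow = chosen[0]
        if flow in NEEDS_SECRET and not has_secret:
            raise ValueError(f"{flow} requires sharepoint_client_secret")
        if flow not in NEEDS_SECRET and has_secret:
            # Assumption: the secret counts as an unrelated credential here.
            raise ValueError(f"{flow} cannot be combined with sharepoint_client_secret")

    if flow == "authorization_code" and "sharepoint_redirect_uri" not in present:
        raise ValueError("authorization_code requires sharepoint_redirect_uri")
    if flow != "bearer_token":
        # Client and tenant IDs are required for every flow except bearer_token.
        for key in ("sharepoint_client_id", "sharepoint_tenant_id"):
            if key not in present:
                raise ValueError(f"{flow} requires {key}")
    return flow
```

A check like this can run against a dataset's `params` block before starting the runtime, so a conflicting combination surfaces earlier than the startup rejection described above.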
6675

76+
When using the `sharepoint://` URL scheme, the standard listing-table parameters (`file_format`, `csv_has_header`, `csv_delimiter`, `json_pointer`, `hive_partitioning_enabled`, etc.) all apply — see [File Formats](./#file-formats) and the [Object Store File Formats](./#object-store-file-formats) reference for the full list.
77+
6778
### `from` formats
6879

69-
The `from` field in a SharePoint dataset takes the following format:
80+
The SharePoint connector accepts two `from:` URL styles.
81+
82+
#### Metadata listing — `sharepoint:` (single colon)
83+
84+
Returns one row per drive item, optionally with the parsed `content` column. Use for document workflows over folders of PDF, PPTX, DOCX, XLSX, etc.
7085

7186
```yaml
7287
from: 'sharepoint:<drive_type>:<drive_id>/<subpath_type>:<subpath_value>'
7388
```
7489
75-
#### Drives
90+
`drive_type` supports the following types:
7691

77-
`drive_type` in a SharePoint Connector `from` field supports the following types:
78-
79-
| Drive Type | Description | Example |
80-
| ---------- | --------------------------- | ----------------------------------------------------- |
81-
| `drive` | The SharePoint drive's name | `from: sharepoint:drive:Documents/...` |
82-
| `driveId` | The SharePoint drive's ID | `from: sharepoint:driveId:b!Mh8opUGD80ec7zGXgX9r/...` |
83-
| `site` | A SharePoint site's name | `from: sharepoint:site:MySite/...` |
84-
| `siteId` | A SharePoint site's ID | `from: sharepoint:siteId:b!Mh8opUGD80ec7zGXgX9r/...` |
85-
| `group` | A SharePoint group's name | `from: sharepoint:group:MyGroup/...` |
86-
| `groupId` | A SharePoint group's ID | `from: sharepoint:groupId:b!Mh8opUGD80ec7zGXgX9r/...` |
92+
| Drive Type | Description | Example |
93+
| ---------- | --------------------------- | ---------------------------------------------------------------- |
94+
| `drive` | The SharePoint drive's name | `from: sharepoint:drive:Documents/...` |
95+
| `driveId` | The SharePoint drive's ID | `from: sharepoint:driveId:b!Mh8opUGD80ec7zGXgX9r/...` |
96+
| `site` | A SharePoint site's name | `from: sharepoint:site:MySite/...` |
97+
| `siteId` | A SharePoint site's ID | `from: sharepoint:siteId:b!Mh8opUGD80ec7zGXgX9r/...` |
98+
| `group` | A SharePoint group's name | `from: sharepoint:group:MyGroup/...` |
99+
| `groupId` | A SharePoint group's ID | `from: sharepoint:groupId:b!Mh8opUGD80ec7zGXgX9r/...` |
87100
| `user` | A user's drive by user ID | `from: sharepoint:user:48d31887-5fad-4d73-a9f5-3c356e68a038/...` |
88-
| `me` | A user's OneDrive | `from: sharepoint:me/...` |
101+
| `me` | A user's OneDrive | `from: sharepoint:me/...` |
89102

90103
:::note
91-
For the `me` drive type the user is identified based on `sharepoint_bearer_token` and cannot be used with `sharepoint_client_secret`
104+
For the `me` drive type, the user is identified from `sharepoint_bearer_token`, and this drive type cannot be used with `sharepoint_client_secret`.
92105
:::
93106

94107
For a name-based `drive_id`, the connector will attempt to resolve the name to an ID at startup.
95108

96-
#### Subpaths
97-
98109
Within a drive, the SharePoint connector can load documents from:
99110

100111
| Description | Example |
@@ -103,25 +114,61 @@ Within a drive, the SharePoint connector can load documents from:
103114
| A specific path within the drive | `from: sharepoint:drive:Documents/path:/top_secrets` |
104115
| A specific folder ID | `from: sharepoint:group:MyGroup/id:01QM2NJSNHBISUGQ52P5AJQ3CBNOXDMVNT` |
105116

117+
#### Object-store — `sharepoint://` (double slash)
118+
119+
Routes through an `ObjectStore` plus DataFusion's `ListingTable`. Enables `SELECT`, `INSERT INTO`, `COPY TO`, `COPY FROM`, and `CREATE EXTERNAL TABLE` for CSV, JSON, NDJSON, Parquet, and other tabular formats — and binary round-trips for blobs (PDF, etc.) via `(FORMAT binary)`.
120+
121+
| URL form | Description |
122+
| ---------------------------------------------- | --------------------------------- |
123+
| `sharepoint://me/{item-path}` | The authenticated user's OneDrive |
124+
| `sharepoint://drives/{drive-id}/{item-path}` | A specific drive by ID |
125+
| `sharepoint://sites/{site-id}/{item-path}` | A site's default document library |
126+
| `sharepoint://users/{user-id}/{item-path}` | A user's default drive |
127+
| `sharepoint://groups/{group-id}/{item-path}` | A group's default drive |
128+
129+
Path segments are percent-decoded, so site IDs containing `,` (e.g. `contoso.sharepoint.com,abc-def,ghi-jkl`) and file paths containing spaces work without extra escaping beyond standard URL encoding.
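
As a rough sketch of the encoding described above, the following Python helper assembles a `sharepoint://` URL from unencoded parts. The helper name `sharepoint_url` is hypothetical, and leaving commas literal (rather than encoding them as `%2C`) is a choice made here to match the site-ID example; since segments are percent-decoded, either form should be equivalent.

```python
from typing import Optional
from urllib.parse import quote

# The five drive forms listed in the table above.
DRIVE_FORMS = {"me", "drives", "sites", "users", "groups"}


def sharepoint_url(form: str, item_path: str, drive_id: Optional[str] = None) -> str:
    """Build a sharepoint:// URL, percent-encoding each path segment separately."""
    if form not in DRIVE_FORMS:
        raise ValueError(f"unknown drive form: {form!r}")
    if form == "me":
        if drive_id is not None:
            raise ValueError("'me' does not take an ID")
        prefix = "sharepoint://me"
    else:
        if not drive_id:
            raise ValueError(f"{form!r} requires an ID")
        # Keep ',' literal so composite site IDs stay readable.
        prefix = f"sharepoint://{form}/{quote(drive_id, safe=',')}"
    # Encode segment by segment so '/' remains the separator.
    segments = [quote(seg, safe=",") for seg in item_path.split("/") if seg]
    return "/".join([prefix, *segments])


print(sharepoint_url(
    "sites",
    "Shared Documents/reports/sales.csv",
    "contoso.sharepoint.com,abc-def,ghi-jkl",
))
```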
130+
131+
`file_format` is auto-inferred from the URL extension when omitted, so `from: sharepoint://me/Documents/Q4.xlsx` resolves without specifying `file_format: xlsx`.
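
To preview which extension such inference would see, the snippet below extracts it from a `sharepoint://` URL in Python. It covers the extension lookup only; the function name `url_extension` is hypothetical, and how the connector maps an extension to a `file_format` value is not shown here.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def url_extension(url: str) -> str:
    """Return the lower-cased extension of the last path segment, or '' if there is none."""
    # Percent-decode first so an encoded file name is read as written.
    path = unquote(urlsplit(url).path)
    return PurePosixPath(path).suffix.lstrip(".").lower()


print(url_extension("sharepoint://me/Documents/Q4.xlsx"))
```

A folder URL with a trailing slash yields an empty string, which is a case where setting `file_format` explicitly is the safer option.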
132+
106133
## Authentication
107134

108-
As outlined in the [connector parameters](#parameters), the SharePoint connector supports two types of authentication:
135+
The SharePoint connector supports six authentication flows. Configure exactly one — the connector picks the flow based on which auth parameter is set. See the [Required Microsoft Graph permissions](#required-microsoft-graph-permissions) section below for the API permissions each flow requires.
109136

110-
1. Service principal authentication, by setting the `sharepoint_client_secret` parameter.
111-
2. User authentication, by setting the `sharepoint_bearer_token` parameter. Generally this is obtained by running `spice login sharepoint` and following the OAuth2 flow.
137+
| Flow | Parameters | Notes |
138+
| ------------------------------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
139+
| Client credentials | `sharepoint_client_secret` | Service principal / daemon workloads. |
140+
| Bearer token (passthrough) | `sharepoint_bearer_token` | Short-lived broker-minted token. Typically obtained via `spice login sharepoint`. |
141+
| Authorization code | `sharepoint_auth_code` + `sharepoint_client_secret` + `sharepoint_redirect_uri` | Caller has already completed the user-agent redirect and captured the `auth_code`. |
142+
| Refresh token | `sharepoint_refresh_token` + `sharepoint_client_secret` | Renewal from a prior grant. |
143+
| Device code | `sharepoint_device_code` | Caller has already obtained a device code. |
144+
| SAML 2.0 bearer ([RFC 7522](https://datatracker.ietf.org/doc/html/rfc7522)) | `sharepoint_saml_assertion` | Federated IdP (Okta, Ping, ADFS, …) assertion → Azure AD token. |
112145

113146
### Creating an Enterprise Application
114147

115-
To use the SharePoint connector with service principal authentication, you will need to create an Azure AD application and grant it the necessary permissions. This will also support OAuth2 authentication for users within the tenant (i.e. `sharepoint_bearer_token`).
148+
To use the SharePoint connector with service principal authentication, create an Azure AD application and grant it the necessary permissions. This same app registration also supports the OAuth2 user flows above.
116149

117150
1. Create a new Azure AD application in the [Azure portal](https://portal.azure.com/#view/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/~/Overview).
118-
2. Under the application's `API permissions`, add the following permissions: `Sites.Read.All`, `Files.Read.All`, `User.Read`, `GroupMember.Read.All`
151+
2. Under the application's `API permissions`, add the permissions listed in [Required Microsoft Graph permissions](#required-microsoft-graph-permissions).
119152
- For service principal authentication, Application permissions are required.
120153
- For user authentication, only delegated permissions are required.
121-
3. (For user authentication): Under the applications's `Authentication`, add `http://localhost` as Mobile and desktop applications redirect URI.
154+
3. (For user authentication): Under the application's `Authentication`, add `http://localhost` as a Mobile and desktop applications redirect URI.
122155
4. Add `sharepoint_client_id` (from the `Application (Client) ID` field) and `sharepoint_tenant_id` to the connector configuration.
123156
5. (For service principal authentication): Under the application's `Certificates & secrets`, create a new client secret. Use this for the `sharepoint_client_secret` parameter.
124157

158+
### Required Microsoft Graph permissions
159+
160+
Read-only workflows require:
161+
162+
- `Sites.Read.All`
163+
- `Files.Read.All`
164+
- `User.Read`
165+
- `GroupMember.Read.All`
166+
167+
Write workflows (`INSERT INTO`, `COPY TO`, `CREATE EXTERNAL TABLE` over `sharepoint://`) additionally require:
168+
169+
- `Files.ReadWrite` (for personal drive / specific drive writes), and
170+
- `Sites.ReadWrite.All` (for site-scoped writes).
171+
125172
### Default Spice Application
126173

127174
For your convenience, Spice AI maintains a default Entra (Azure AD) application that can be used for authentication against your SharePoint instance. This application requires OAuth2 authentication. To use it:
@@ -142,6 +189,57 @@ And set the `SPICE_SHAREPOINT_BEARER_TOKEN` secret via:
142189
spice login sharepoint --tenant-id $TENANT_ID --client-id f2b3116e-b4c4-464f-80ec-73cd9d9886b4
143190
```
144191

192+
## Read/write examples (`sharepoint://`)
193+
194+
Reading a CSV from a site library:
195+
196+
```yaml
197+
datasets:
198+
- from: sharepoint://sites/contoso.sharepoint.com,11111111-2222-3333-4444-555555555555,66666666-7777-8888-9999-aaaaaaaaaaaa/Shared%20Documents/reports/sales.csv
199+
name: sales
200+
params:
201+
sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID}
202+
sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
203+
sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}
204+
file_format: csv
205+
csv_has_header: 'true'
206+
```
207+
208+
Inserting rows:
209+
210+
```sql
211+
INSERT INTO sales VALUES ('Q2', 123456.78);
212+
```
213+
214+
Copying a query result out as Parquet:
215+
216+
```sql
217+
COPY (SELECT * FROM orders WHERE year = 2026)
218+
TO 'sharepoint://me/Documents/exports/orders-2026.parquet'
219+
(FORMAT parquet);
220+
```
221+
222+
Creating an external table over a folder of Parquet files:
223+
224+
```sql
225+
CREATE EXTERNAL TABLE reports
226+
STORED AS PARQUET
227+
LOCATION 'sharepoint://sites/{site-id}/Shared%20Documents/reports/';
228+
```
229+
230+
Round-tripping a binary blob (e.g. a PDF):
231+
232+
```sql
233+
COPY (SELECT content FROM cache WHERE name = 'Q2-report.pdf')
234+
TO 'sharepoint://me/Documents/Q2-report.pdf'
235+
(FORMAT binary);
236+
```
237+
238+
:::warning[Limitations]
239+
- The `sharepoint:` (metadata-listing) syntax cannot create a dataset from a single file (e.g. an Excel spreadsheet) — datasets must be created from a folder of documents. Use the `sharepoint://` object-store syntax for single-file workflows.
240+
- For `INSERT INTO` and `COPY TO`, only `sharepoint_conflict_behavior=replace` is supported. `fail` and `rename` cause writes to be rejected with a clear error.
241+
:::
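
The two write-side constraints can be checked client-side before issuing an `INSERT INTO` or `COPY TO`. The sketch below is a hypothetical pre-flight helper (`check_write` is not part of Spice); it assumes the documented defaults of `replace` and `1073741824` bytes apply when a parameter is omitted.

```python
DEFAULT_MAX_PUT_BYTES = 1024 ** 3  # 1 GiB = 1073741824 bytes, the documented default
CONFLICT_BEHAVIORS = {"replace", "fail", "rename"}


def check_write(params: dict, payload_size: int) -> None:
    """Raise ValueError if a write of payload_size bytes would be rejected under params."""
    behavior = params.get("sharepoint_conflict_behavior", "replace")
    if behavior not in CONFLICT_BEHAVIORS:
        raise ValueError(f"unknown sharepoint_conflict_behavior: {behavior!r}")
    if behavior != "replace":
        # Only `replace` is documented as compatible with INSERT INTO / COPY TO.
        raise ValueError(f"{behavior!r} is not supported for INSERT INTO / COPY TO")
    limit = int(params.get("sharepoint_max_put_bytes", DEFAULT_MAX_PUT_BYTES))
    if payload_size > limit:
        raise ValueError(f"payload of {payload_size} bytes exceeds the {limit}-byte cap")


check_write({}, 10 * 1024 * 1024)  # a 10 MiB write with default settings passes
```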
242+
145243
## Secrets
146244

147245
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](../secret-stores/). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](../secret-stores/#using-secrets).
