Commit 2d2014a

Merge branch 'trunk' into claude/review-outdated-docs-UOUyr
2 parents 7f03845 + 9c79a1e commit 2d2014a

38 files changed

Lines changed: 97 additions & 46 deletions

website/docs/components/data-accelerators/cayenne/deployment.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ For point-lookup-heavy workloads, size `cayenne_segment_cache_mb` generously —
 
 | Parameter | Description |
 | --------------------- | --------------------------------------------------------- |
-| `upload_concurrency` | Parallel segment uploads during refresh / append commits. |
+| `cayenne_upload_concurrency` | Parallel segment uploads during refresh / append commits. |
 
 For S3 Express One Zone, 8–16 parallel uploads typically maximize throughput. For standard S3 across regions, higher concurrency helps hide per-request latency.

website/docs/components/data-accelerators/postgres/index.md

Lines changed: 0 additions & 1 deletion
@@ -37,7 +37,6 @@ The connection to PostgreSQL can be configured by providing the following `param
 - `require`: This mode requires a TLS connection.
 - `prefer`: (default) This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.
 - `disable`: This mode will not attempt to use a TLS connection, even if the server supports it.
-- `allow`: This mode will try a non-TLS connection first, then retry with TLS if the server requires it.
 - `pg_sslrootcert`: Optional. Path to a custom PEM certificate file that the connector will trust.
 - `pg_connection_pool_min`: Optional. The minimum number of connections to keep open in the pool, lazily created when requested. Default is `5`.
 - `connection_pool_size`: Optional. The maximum number of connections created in the connection pool. Default is `10`.

website/docs/components/data-connectors/adbc.md

Lines changed: 1 addition & 0 deletions
@@ -112,6 +112,7 @@ The dataset name cannot be a [reserved keyword](../../reference/spicepod/keyword
 | `adbc_schema` | Optional. Sets the default schema for the connection. |
 | `connection_pool_size` | Optional. Maximum number of connections in the connection pool. Default: `5`. |
 | `connection_pool_min_idle` | Optional. Minimum number of idle connections in the pool. Default: `1`. |
+| `query_federation` | Optional. Controls whether queries are federated to the ADBC source. Values: `enabled`, `disabled`. Default: `enabled`. |
 
 :::warning[In-memory databases]
 In-memory database URIs (e.g., `:memory:` or URIs containing `mode=memory`) are not supported.

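As a rough illustration of the `query_federation` parameter added in this hunk, a dataset could turn federation off as sketched below. The dataset path and name are placeholders, the driver and connection parameters are omitted, and the placement under `params` is assumed from the parameter table rather than confirmed by this commit.

```yaml
# Hypothetical spicepod fragment; `my_table` is a placeholder, not taken from the commit.
datasets:
  - from: adbc:my_table
    name: my_table
    params:
      # Driver and connection parameters omitted.
      connection_pool_size: 5        # documented default
      connection_pool_min_idle: 1    # documented default
      query_federation: disabled     # documented values: enabled (default), disabled
```
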
website/docs/components/data-connectors/clickhouse.md

Lines changed: 2 additions & 0 deletions
@@ -89,7 +89,9 @@ The table below shows the ClickHouse data types supported, along with the type m
 | `FixedString` | `Utf8` |
 | `UUID` | `Utf8` |
 | `Date` | `Date32` |
+| `Date32` | `Date32` |
 | `DateTime` | `Timestamp(Second, None)` |
+| `DateTime64` | `Timestamp(Second, None)` |
 | `Nullable(T)` | Mapped inner type `T` |
 
 ## Examples

website/docs/components/data-connectors/databricks/index.md

Lines changed: 11 additions & 0 deletions
@@ -91,6 +91,17 @@ The following parameters apply only when `mode` is `sql_warehouse` and control c
 | `statement_max_retries` | Optional. Maximum number of poll retries when waiting for async statement completion. Default: `14`. |
 | `disable_on_permanent_error` | Optional. When `true`, non-retryable errors (401, 403, 404) permanently disable the connector. Default: `true`. |
 
+#### Rate control
+
+The Databricks connector supports per-dataset rate control parameters when `mode` is `spark_connect` or `sql_warehouse`. These override [`runtime.params`](../../reference/spicepod/runtime#runtimeparams) HTTP rate control defaults. When [`runtime.source_rate_control.state_location`](../../reference/spicepod/runtime#runtimesource_rate_control) is configured, rate limits are coordinated across the cluster.
+
+| Parameter Name | Description |
+| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `requests_per_second_limit` | Optional. Maximum HTTP requests per second to the Databricks endpoint. Overrides `runtime.params.http_requests_per_second_limit`. |
+| `requests_per_minute_limit` | Optional. Maximum HTTP requests per minute to the Databricks endpoint. Overrides `runtime.params.http_requests_per_minute_limit`. |
+| `rate_control_jitter_min` | Optional. Minimum random delay before HTTP requests when rate control is active. Defaults to `5ms` when a rate limit is configured. Accepts durations like `5ms`. |
+| `rate_control_jitter_max` | Optional. Maximum random delay before HTTP requests when rate control is active. Defaults to `10ms` when a rate limit is configured. Accepts durations like `10ms`. |
+
 ## Authentication
 
 ### Personal access token

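To make the new rate control table concrete, the sketch below shows how the per-dataset parameters might sit next to the runtime-wide defaults they override. The dataset path, dataset name, and all numeric limits are illustrative assumptions; connection and authentication parameters are omitted.

```yaml
# Hypothetical spicepod fragment; names and limit values are placeholders.
runtime:
  params:
    http_requests_per_second_limit: 20   # runtime-wide default (assumed value)

datasets:
  - from: databricks:my_catalog.my_schema.my_table
    name: my_table
    params:
      mode: sql_warehouse
      # Connection and authentication parameters omitted.
      requests_per_second_limit: 5       # overrides the runtime default for this dataset
      requests_per_minute_limit: 200
      rate_control_jitter_min: 5ms       # documented default when a limit is configured
      rate_control_jitter_max: 10ms      # documented default when a limit is configured
```

Per the added documentation, these limits are coordinated across the cluster only when `runtime.source_rate_control.state_location` is also configured.
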
website/docs/components/data-connectors/delta-lake/index.md

Lines changed: 1 addition & 0 deletions
@@ -181,6 +181,7 @@ The table below shows the Delta Lake data types supported, along with the type m
 | `Decimal` | `Decimal128` |
 | `Array` | `List` |
 | `Struct` | `Struct` |
+| `Variant` | `Struct` |
 | `Map` | `Map` |
 
 ## Limitations

website/docs/components/data-connectors/elasticsearch/deployment.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ Long-running search responses (very large `LIMIT`, deep pagination, or expensive
 ## Capacity & Sizing
 
 - **Throughput**: Bounded by the Elasticsearch cluster's request handling and (for kNN) HNSW search cost. Plan refresh intervals and concurrent query load to stay within the cluster's tested capacity.
-- **Result size**: The connector pages through results using point-in-time (PIT) queries with `search_after`, fetching up to 10,000 hits per batch (bounded by the Elasticsearch `index.max_result_window` setting). Queries with `LIMIT` fetch only the requested number of rows. Queries without `LIMIT` page through the entire index.
+- **Result size**: The connector issues a single `_search` request per query, returning at most 10,000 hits (bounded by the Elasticsearch `index.max_result_window` setting). Queries with `LIMIT N` fetch `min(N, 10000)` rows. For result sets larger than 10,000, accelerate the dataset.
 - **Mapping fetches**: At dataset registration the connector fetches the index mapping once via `GET /<index>/_mapping`. Mapping changes after registration are not picked up until the runtime restarts.
 
 ## Search Routing

website/docs/components/data-connectors/elasticsearch/index.md

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ TLS is enabled automatically for `https://` endpoints.
 - Nested object fields are exposed as JSON strings rather than structured columns.
 - `date` and `date_nanos` fields are preserved as strings because Elasticsearch accepts heterogeneous date formats; cast to a timestamp in SQL when numeric comparison is required.
 - `dense_vector` fields without a declared `dims` value fall back to `Utf8` and are not usable as a vector column.
-- The connector pages through results in batches of up to 10,000 hits using point-in-time (PIT) queries with `search_after`. Queries with `LIMIT` fetch only the requested number of rows; queries without `LIMIT` page through the entire index. Individual batch size is bounded by the Elasticsearch `index.max_result_window` setting (default 10,000).
+- The connector issues a single `_search` request per query. The result set is capped at 10,000 hits (the Elasticsearch `index.max_result_window` default). Queries with `LIMIT N` fetch `min(N, 10000)` rows; queries without `LIMIT` return at most 10,000 rows. For larger result sets, accelerate the dataset.
 - Pushdown of SQL predicates to Elasticsearch query DSL is limited; complex filter expressions are evaluated locally by DataFusion after fetching results.
 
 Elasticsearch can also be configured as a [Vector Engine](../vectors/elasticsearch) for datasets sourced from other connectors (storing Spice-managed embeddings in Elasticsearch rather than querying an existing index).

website/docs/components/data-connectors/github/index.md

Lines changed: 7 additions & 3 deletions
@@ -77,15 +77,19 @@ With GitHub App Installation authentication, the connector's functionality depen
 
 ### Rate Limiting
 
-When using multiple GitHub datasets sharing the same GitHub token or GitHub app credentials, it is possible to exceed GitHub's primary and secondary rate limits. To mitigate this, use the `github_max_concurrent_connections` runtime parameter. This connections limit applies per GitHub token and per GitHub app installation, following GitHub's rate limit policy.
+When using multiple GitHub datasets sharing the same GitHub token or GitHub app credentials, it is possible to exceed GitHub's primary and secondary rate limits. To mitigate this, use the `github_concurrent_connections_limit` setting under [`runtime.source_rate_control`](../../reference/spicepod/runtime#runtimesource_rate_control). This connections limit applies per GitHub token and per GitHub app installation, following GitHub's rate limit policy.
+
+:::warning[Deprecated]
+`runtime.params.github_max_concurrent_connections` is deprecated. Use `runtime.source_rate_control.github_concurrent_connections_limit` instead.
+:::
 
 Example Configuration:
 
 ```yaml
 # ... other configuration ...
 runtime:
-  params:
-    github_max_concurrent_connections: 5 # Defaults to 10
+  source_rate_control:
+    github_concurrent_connections_limit: 5 # Defaults to 10
 
 datasets:
   - from: github:github.com/spiceai/spiceai/files/v0.17.2-beta

website/docs/components/data-connectors/graphql/index.md

Lines changed: 9 additions & 9 deletions
@@ -44,15 +44,15 @@ The dataset name. This will be used as the table name within Spice. The dataset
 
 The GraphQL data connector can be configured by providing the following `params`. Use the [secret replacement syntax](../secret-stores) to load the password from a secret store, e.g. `${secrets:my_graphql_auth_token}`.
 
-| Parameter Name | Description |
-| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `unnest_depth` | Depth level to automatically unnest objects to. By default, disabled if unspecified or `0`. |
-| `graphql_auth_header` | A custom header name to use for authentication instead of the default `Authorization: Bearer` header. When set, the value of `graphql_auth_token` is sent as the value of this header. Useful for APIs that require authentication via a custom header (e.g. `X-Shopify-Access-Token`). |
-| `graphql_auth_token` | The authentication token to use to connect to the GraphQL server. Uses bearer authentication by default, or sent via the custom header specified by `graphql_auth_header`. |
-| `graphql_auth_user` | The username to use for basic auth. E.g. `graphql_auth_user: my_user` |
-| `graphql_auth_pass` | The password to use for basic auth. E.g. `graphql_auth_pass: ${secrets:my_graphql_auth_pass}` |
-| `graphql_query` | The GraphQL query to execute. See [examples](#examples) for a sample GraphQL query |
-| `json_pointer` | The [JSON pointer](https://datatracker.ietf.org/doc/html/rfc6901) into the response body. When `graphql_query` is [paginated](#pagination), the `json_pointer` can be inferred. |
+| Parameter Name | Description | Required | Default |
+| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- |
+| `graphql_query` | The GraphQL query to execute. See [examples](#examples) for a sample GraphQL query. | Yes | - |
+| `json_pointer` | The [JSON pointer](https://datatracker.ietf.org/doc/html/rfc6901) into the response body. When `graphql_query` is [paginated](#pagination), the `json_pointer` can be inferred. | No | - |
+| `unnest_depth` | Depth level to automatically unnest objects to. Disabled if unspecified or `0`. Maximum value is `50`. | No | `0` |
+| `graphql_auth_header` | A custom header name to use for authentication instead of the default `Authorization: Bearer` header. When set, the value of `graphql_auth_token` is sent as the value of this header. Useful for APIs that require authentication via a custom header (e.g. `X-Shopify-Access-Token`). | No | - |
+| `graphql_auth_token` | The authentication token to use to connect to the GraphQL server. Uses bearer authentication by default, or sent via the custom header specified by `graphql_auth_header`. | No | - |
+| `graphql_auth_user` | The username to use for basic auth. E.g. `graphql_auth_user: my_user` | No | - |
+| `graphql_auth_pass` | The password to use for basic auth. E.g. `graphql_auth_pass: ${secrets:my_graphql_auth_pass}` | No | - |
 
 #### GraphQL Query Example
