-|`upload_concurrency`| Parallel segment uploads during refresh / append commits. |
+|`cayenne_upload_concurrency`| Parallel segment uploads during refresh / append commits. |
 
 For S3 Express One Zone, 8–16 parallel uploads typically maximize throughput. For standard S3 across regions, higher concurrency helps hide per-request latency.
website/docs/components/data-accelerators/postgres/index.md
Lines changed: 0 additions & 1 deletion
@@ -37,7 +37,6 @@ The connection to PostgreSQL can be configured by providing the following `param
 - `require`: This mode requires a TLS connection.
 - `prefer`: (default) This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.
 - `disable`: This mode will not attempt to use a TLS connection, even if the server supports it.
-- `allow`: This mode will try a non-TLS connection first, then retry with TLS if the server requires it.
 - `pg_sslrootcert`: Optional. Path to a custom PEM certificate file that the connector will trust.
 - `pg_connection_pool_min`: Optional. The minimum number of connections to keep open in the pool, lazily created when requested. Default is `5`.
 - `connection_pool_size`: Optional. The maximum number of connections created in the connection pool. Default is `10`.
website/docs/components/data-connectors/databricks/index.md
Lines changed: 11 additions & 0 deletions
@@ -91,6 +91,17 @@ The following parameters apply only when `mode` is `sql_warehouse` and control c
 | `statement_max_retries` | Optional. Maximum number of poll retries when waiting for async statement completion. Default: `14`. |
 | `disable_on_permanent_error` | Optional. When `true`, non-retryable errors (401, 403, 404) permanently disable the connector. Default: `true`. |
 
+#### Rate control
+
+The Databricks connector supports per-dataset rate control parameters when `mode` is `spark_connect` or `sql_warehouse`. These override [`runtime.params`](../../reference/spicepod/runtime#runtimeparams) HTTP rate control defaults. When [`runtime.source_rate_control.state_location`](../../reference/spicepod/runtime#runtimesource_rate_control) is configured, rate limits are coordinated across the cluster.
+| `requests_per_second_limit` | Optional. Maximum HTTP requests per second to the Databricks endpoint. Overrides `runtime.params.http_requests_per_second_limit`. |
+| `requests_per_minute_limit` | Optional. Maximum HTTP requests per minute to the Databricks endpoint. Overrides `runtime.params.http_requests_per_minute_limit`. |
+| `rate_control_jitter_min` | Optional. Minimum random delay before HTTP requests when rate control is active. Defaults to `5ms` when a rate limit is configured. Accepts durations like `5ms`. |
+| `rate_control_jitter_max` | Optional. Maximum random delay before HTTP requests when rate control is active. Defaults to `10ms` when a rate limit is configured. Accepts durations like `10ms`. |
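As an illustration of the Databricks rate control parameters added above, a dataset entry might set them next to `mode`. This is a minimal sketch rather than an excerpt from the connector docs: the `from` table path, the dataset name, and the limit values are hypothetical, and the connection parameters the connector also needs are deliberately omitted.

```yaml
# Hypothetical spicepod fragment; table path and limit values are placeholders.
datasets:
  - from: databricks:my_catalog.my_schema.my_table # hypothetical table path
    name: my_table
    params:
      mode: sql_warehouse # rate control applies when mode is spark_connect or sql_warehouse
      # Connection parameters (endpoint, credentials, warehouse) omitted here.
      requests_per_second_limit: 5 # overrides runtime.params.http_requests_per_second_limit
      requests_per_minute_limit: 200 # overrides runtime.params.http_requests_per_minute_limit
      rate_control_jitter_min: 5ms # same as the default when a rate limit is set
      rate_control_jitter_max: 10ms # same as the default when a rate limit is set
```

Datasets that omit these parameters fall back to the `runtime.params` HTTP rate control defaults.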
website/docs/components/data-connectors/elasticsearch/deployment.md
Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ Long-running search responses (very large `LIMIT`, deep pagination, or expensive
 ## Capacity & Sizing
 
 - **Throughput**: Bounded by the Elasticsearch cluster's request handling and (for kNN) HNSW search cost. Plan refresh intervals and concurrent query load to stay within the cluster's tested capacity.
-- **Result size**: The connector pages through results using point-in-time (PIT) queries with `search_after`, fetching up to 10,000 hits per batch (bounded by the Elasticsearch `index.max_result_window` setting). Queries with `LIMIT` fetch only the requested number of rows. Queries without `LIMIT` page through the entire index.
+- **Result size**: The connector issues a single `_search` request per query, returning at most 10,000 hits (bounded by the Elasticsearch `index.max_result_window` setting). Queries with `LIMIT N` fetch `min(N, 10000)` rows. For result sets larger than 10,000, accelerate the dataset.
 - **Mapping fetches**: At dataset registration the connector fetches the index mapping once via `GET /<index>/_mapping`. Mapping changes after registration are not picked up until the runtime restarts.
website/docs/components/data-connectors/elasticsearch/index.md
Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ TLS is enabled automatically for `https://` endpoints.
 - Nested object fields are exposed as JSON strings rather than structured columns.
 - `date` and `date_nanos` fields are preserved as strings because Elasticsearch accepts heterogeneous date formats; cast to a timestamp in SQL when numeric comparison is required.
 - `dense_vector` fields without a declared `dims` value fall back to `Utf8` and are not usable as a vector column.
-- The connector pages through results in batches of up to 10,000 hits using point-in-time (PIT) queries with `search_after`. Queries with `LIMIT` fetch only the requested number of rows; queries without `LIMIT` page through the entire index. Individual batch size is bounded by the Elasticsearch `index.max_result_window` setting (default 10,000).
+- The connector issues a single `_search` request per query. The result set is capped at 10,000 hits (the Elasticsearch `index.max_result_window` default). Queries with `LIMIT N` fetch `min(N, 10000)` rows; queries without `LIMIT` return at most 10,000 rows. For larger result sets, accelerate the dataset.
 - Pushdown of SQL predicates to Elasticsearch query DSL is limited; complex filter expressions are evaluated locally by DataFusion after fetching results.
 
 Elasticsearch can also be configured as a [Vector Engine](../vectors/elasticsearch) for datasets sourced from other connectors (storing Spice-managed embeddings in Elasticsearch rather than querying an existing index).
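The updated Elasticsearch text recommends accelerating the dataset when a result set can exceed 10,000 rows. A minimal sketch of that in a spicepod follows; the `from` value and dataset name are assumptions, and the connection parameters are omitted, so confirm the connector's `from` format and required `params` before relying on it.

```yaml
# Hypothetical spicepod fragment: accelerate an Elasticsearch-backed dataset,
# as the docs recommend for result sets larger than 10,000 rows.
datasets:
  - from: elasticsearch:my_index # hypothetical; confirm the connector's `from` format
    name: my_index
    # Connection params (endpoint, credentials) omitted.
    acceleration:
      enabled: true
```

With acceleration enabled, queries run against the locally accelerated copy of the dataset instead of issuing a `_search` request per query.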
website/docs/components/data-connectors/github/index.md
Lines changed: 7 additions & 3 deletions
@@ -77,15 +77,19 @@ With GitHub App Installation authentication, the connector's functionality depen
 
 ### Rate Limiting
 
-When using multiple GitHub datasets sharing the same GitHub token or GitHub app credentials, it is possible to exceed GitHub's primary and secondary rate limits. To mitigate this, use the `github_max_concurrent_connections` runtime parameter. This connections limit applies per GitHub token and per GitHub app installation, following GitHub's rate limit policy.
+When using multiple GitHub datasets sharing the same GitHub token or GitHub app credentials, it is possible to exceed GitHub's primary and secondary rate limits. To mitigate this, use the `github_concurrent_connections_limit` setting under [`runtime.source_rate_control`](../../reference/spicepod/runtime#runtimesource_rate_control). This connections limit applies per GitHub token and per GitHub app installation, following GitHub's rate limit policy.
+
+:::warning[Deprecated]
+`runtime.params.github_max_concurrent_connections` is deprecated. Use `runtime.source_rate_control.github_concurrent_connections_limit` instead.
+:::
 
 Example Configuration:
 
 ```yaml
 # ... other configuration ...
 runtime:
-  params:
-    github_max_concurrent_connections: 5 # Defaults to 10
+  source_rate_control:
+    github_concurrent_connections_limit: 5 # Defaults to 10
website/docs/components/data-connectors/graphql/index.md
Lines changed: 9 additions & 9 deletions
@@ -44,15 +44,15 @@ The dataset name. This will be used as the table name within Spice. The dataset
 
 The GraphQL data connector can be configured by providing the following `params`. Use the [secret replacement syntax](../secret-stores) to load the password from a secret store, e.g. `${secrets:my_graphql_auth_token}`.
-| `unnest_depth` | Depth level to automatically unnest objects to. By default, disabled if unspecified or `0`. |
-| `graphql_auth_header` | A custom header name to use for authentication instead of the default `Authorization: Bearer` header. When set, the value of `graphql_auth_token` is sent as the value of this header. Useful for APIs that require authentication via a custom header (e.g. `X-Shopify-Access-Token`). |
-| `graphql_auth_token` | The authentication token to use to connect to the GraphQL server. Uses bearer authentication by default, or sent via the custom header specified by `graphql_auth_header`. |
-| `graphql_auth_user` | The username to use for basic auth. E.g. `graphql_auth_user: my_user` |
-| `graphql_auth_pass` | The password to use for basic auth. E.g. `graphql_auth_pass: ${secrets:my_graphql_auth_pass}` |
-| `graphql_query` | The GraphQL query to execute. See [examples](#examples) for a sample GraphQL query |
-| `json_pointer` | The [JSON pointer](https://datatracker.ietf.org/doc/html/rfc6901) into the response body. When `graphql_query` is [paginated](#pagination), the `json_pointer` can be inferred. |
+| Parameter Name | Description | Required | Default |
+| `graphql_query` | The GraphQL query to execute. See [examples](#examples) for a sample GraphQL query. | Yes | - |
+| `json_pointer` | The [JSON pointer](https://datatracker.ietf.org/doc/html/rfc6901) into the response body. When `graphql_query` is [paginated](#pagination), the `json_pointer` can be inferred. | No | - |
+| `unnest_depth` | Depth level to automatically unnest objects to. Disabled if unspecified or `0`. Maximum value is `50`. | No | `0` |
+| `graphql_auth_header` | A custom header name to use for authentication instead of the default `Authorization: Bearer` header. When set, the value of `graphql_auth_token` is sent as the value of this header. Useful for APIs that require authentication via a custom header (e.g. `X-Shopify-Access-Token`). | No | - |
+| `graphql_auth_token` | The authentication token to use to connect to the GraphQL server. Uses bearer authentication by default, or sent via the custom header specified by `graphql_auth_header`. | No | - |
+| `graphql_auth_user` | The username to use for basic auth. E.g. `graphql_auth_user: my_user` | No | - |
+| `graphql_auth_pass` | The password to use for basic auth. E.g. `graphql_auth_pass: ${secrets:my_graphql_auth_pass}` | No | - |
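To show how the reorganized GraphQL parameters fit together, here is a sketch of a dataset that authenticates through a custom header, the case the `graphql_auth_header` row describes. The endpoint URL, dataset name, query, field names, and JSON pointer are hypothetical; only the parameter names and the secret syntax come from the table above.

```yaml
# Hypothetical spicepod fragment; endpoint, query, and fields are placeholders.
datasets:
  - from: graphql:https://api.example.com/graphql # hypothetical endpoint
    name: products
    params:
      # Token is sent as "X-Shopify-Access-Token: <token>" instead of "Authorization: Bearer <token>".
      graphql_auth_header: X-Shopify-Access-Token
      graphql_auth_token: ${secrets:my_graphql_auth_token}
      # JSON pointer to the array of rows in the response body.
      json_pointer: /data/products/nodes
      graphql_query: |
        {
          products(first: 100) {
            nodes {
              id
              title
            }
          }
        }
```

`json_pointer` is set explicitly here; per the table, it can be inferred only when `graphql_query` is paginated.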