docs: Add runtime.source_rate_control and Databricks rate control params (#1704)
* docs: Document runtime.source_rate_control and connector rate control params
Add documentation for the new runtime.source_rate_control configuration
section that enables cluster-wide rate control state persistence through
object storage. Update GitHub connector docs to reference the new
github_concurrent_connections_limit setting (deprecating the old
runtime.params.github_max_concurrent_connections). Add per-dataset rate
control parameters to Databricks connector docs.
* fix: Use relative links for Docusaurus cross-page references
* fix: Correct relative link depth for index.md pages in Docusaurus
---------
Co-authored-by: claudespice <claudespice@users.noreply.github.com>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
website/docs/components/data-connectors/databricks/index.md
Lines changed: 11 additions & 0 deletions
@@ -91,6 +91,17 @@ The following parameters apply only when `mode` is `sql_warehouse` and control c
 | `statement_max_retries` | Optional. Maximum number of poll retries when waiting for async statement completion. Default: `14`. |
 | `disable_on_permanent_error` | Optional. When `true`, non-retryable errors (401, 403, 404) permanently disable the connector. Default: `true`. |

+#### Rate control
+
+The Databricks connector supports per-dataset rate control parameters when `mode` is `spark_connect` or `sql_warehouse`. These override [`runtime.params`](../../reference/spicepod/runtime#runtimeparams) HTTP rate control defaults. When [`runtime.source_rate_control.state_location`](../../reference/spicepod/runtime#runtimesource_rate_control) is configured, rate limits are coordinated across the cluster.
+
+| Parameter Name | Description |
+| --- | --- |
+| `requests_per_second_limit` | Optional. Maximum HTTP requests per second to the Databricks endpoint. Overrides `runtime.params.http_requests_per_second_limit`. |
+| `requests_per_minute_limit` | Optional. Maximum HTTP requests per minute to the Databricks endpoint. Overrides `runtime.params.http_requests_per_minute_limit`. |
+| `rate_control_jitter_min` | Optional. Minimum random delay before HTTP requests when rate control is active. Defaults to `5ms` when a rate limit is configured. Accepts durations like `5ms`. |
+| `rate_control_jitter_max` | Optional. Maximum random delay before HTTP requests when rate control is active. Defaults to `10ms` when a rate limit is configured. Accepts durations like `10ms`. |
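As a rough illustration of the Rate control parameters added in this hunk, a dataset definition might look like the sketch below. Only the four rate control keys and `mode` come from the diff; the table path, dataset name, endpoint, warehouse ID, secret name, and the three `databricks_*` connection keys are assumptions for illustration and should be checked against the connector's parameter tables.

```yaml
datasets:
  - from: databricks:my_catalog.my_schema.my_table # hypothetical table path
    name: my_table # hypothetical dataset name
    params:
      mode: sql_warehouse
      # Connection settings below are assumed placeholders, not values from this change.
      databricks_endpoint: dbc-example.cloud.databricks.com
      databricks_sql_warehouse_id: 0123456789abcdef
      databricks_token: ${ secrets:DATABRICKS_TOKEN }
      # Per-dataset rate control; these override the runtime.params HTTP defaults.
      requests_per_second_limit: 5
      requests_per_minute_limit: 200
      rate_control_jitter_min: 5ms
      rate_control_jitter_max: 10ms
```

Setting the limits on the dataset keeps the throttle scoped to this Databricks source, while other HTTP-based sources continue to use the `runtime.params` defaults.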
website/docs/components/data-connectors/github/index.md
Lines changed: 7 additions & 3 deletions
@@ -77,15 +77,19 @@ With GitHub App Installation authentication, the connector's functionality depen

 ### Rate Limiting

-When using multiple GitHub datasets sharing the same GitHub token or GitHub app credentials, it is possible to exceed GitHub's primary and secondary rate limits. To mitigate this, use the `github_max_concurrent_connections` runtime parameter. This connections limit applies per GitHub token and per GitHub app installation, following GitHub's rate limit policy.
+When using multiple GitHub datasets sharing the same GitHub token or GitHub app credentials, it is possible to exceed GitHub's primary and secondary rate limits. To mitigate this, use the `github_concurrent_connections_limit` setting under [`runtime.source_rate_control`](../../reference/spicepod/runtime#runtimesource_rate_control). This connections limit applies per GitHub token and per GitHub app installation, following GitHub's rate limit policy.
+
+:::warning[Deprecated]
+`runtime.params.github_max_concurrent_connections` is deprecated. Use `runtime.source_rate_control.github_concurrent_connections_limit` instead.
+:::

 Example Configuration:

 ```yaml
 # ... other configuration ...
 runtime:
-  params:
-    github_max_concurrent_connections: 5 # Defaults to 10
+  source_rate_control:
+    github_concurrent_connections_limit: 5 # Defaults to 10
website/docs/reference/spicepod/runtime.md
Lines changed: 26 additions & 15 deletions
@@ -149,28 +149,39 @@ runtime:
     http_requests_per_minute_limit: 200
 ```

-### CDC Pipeline Tuning
+## `runtime.source_rate_control`

-Datasets using `refresh_mode: changes` (CDC) support the following pipeline tunables. These control how change envelopes from CDC sources (e.g. PostgreSQL logical replication, Kafka, DynamoDB Streams) are buffered, coalesced, and committed.
+Optional. Configures how Spice limits outbound requests to upstream data sources, and optionally enables cluster-wide coordination through persisted state in object storage.

-| Parameter Name | Default | Range | Description |
-| --- | --- | --- | --- |
-| `cdc_prefetch_buffer` | `32` | `1`–`1024` | Channel depth between the CDC source-stream reader and the apply loop. Each slot holds one decoded change envelope. |
-| `cdc_max_coalesced_envelopes` | `64` | `1`–`4096` | Maximum number of change envelopes coalesced into a single accelerator write. Coalescing amortizes per-envelope planning cost. |
-| `cdc_max_coalesced_bytes` | `67108864` | `1`–`1073741824` | Byte budget for a coalesced burst (default 64 MiB, max 1 GiB). A single envelope may exceed this; otherwise the next envelope starts a new burst. |
-| `cdc_commit_timeout_ms` | `30000` | `1`–`3600000` | Maximum time in milliseconds to wait for a source-side commit before logging a stall warning (default 30s, max 1hr). |
-
-Out-of-range or unparseable values fall back to defaults with a warning. Environment variables (`SPICE_CDC_PREFETCH_BUFFER`, `SPICE_CDC_MAX_COALESCED_ENVELOPES`, `SPICE_CDC_MAX_COALESCED_BYTES`, `SPICE_CDC_COMMIT_TIMEOUT_MS`) are used as fallback when the corresponding `runtime.params` key is not set.
+Without `state_location`, rate limits are local to each Spice instance. When `state_location` is set, Spice instances coordinate through object storage so that a configured limit is shared across the cluster. For example, `requests_per_second_limit: 20` means approximately 20 RPS total across all replicas, not 20 RPS per replica.
+
+| Parameter | Optional | Default | Description |
+| --- | --- | --- | --- |
+| `state_location` | Yes | - | Root URI for globally persisted rate-control state (e.g. `s3://bucket/path/`). Enables cluster-wide rate control when set. Without this, limits are local to each Spice instance. |
+| `params` | Yes | - | Object-store authentication parameters for `state_location`. Supports the same keys as other object-store configurations (e.g. `s3_region`, `s3_key`, `s3_secret` for S3; `account`, `access_key` for Azure). Supports `${ secrets:NAME }` references. |
+| `refresh_interval` | Yes | `30s` | How often each instance refreshes and persists per-source rate-control state. Longer intervals reduce object-store writes but adapt more slowly to demand changes. |
+| `github_concurrent_connections_limit` | Yes | `10` | Maximum number of concurrent GitHub HTTP requests per authentication context. Replaces the deprecated `runtime.params.github_max_concurrent_connections`. |
+
+HTTP/API rate limits are configured through [`runtime.params`](#runtimeparams) (cluster defaults) and per-dataset overrides. Precedence is:
+
+When `state_location` is set, the configured RPS/RPM quota is converted into a token budget per lease window and distributed across replicas using a demand-weighted leased token-bucket model.
+

 ## `runtime.functions`

 Controls whether [functions](../../features/functions) declared in the top-level `functions:` section (and `tools:` entries with `as_sql: true`) are registered with the SQL engine. Defaults to disabled.
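To make the `runtime.source_rate_control` section added in this hunk more concrete, the following is a minimal sketch of cluster-wide rate control backed by S3. The keys come from the parameter table and the `runtime.params` references in this change; the bucket path, region, and secret names are hypothetical placeholders.

```yaml
runtime:
  params:
    # Cluster default; with state_location set this is roughly 20 RPS in total
    # across all replicas rather than 20 RPS per replica.
    http_requests_per_second_limit: 20
  source_rate_control:
    state_location: s3://my-bucket/spice/rate-control/ # hypothetical bucket and prefix
    params:
      s3_region: us-east-1 # hypothetical region
      s3_key: ${ secrets:AWS_ACCESS_KEY_ID } # hypothetical secret names
      s3_secret: ${ secrets:AWS_SECRET_ACCESS_KEY }
    refresh_interval: 30s # the documented default, shown explicitly
    github_concurrent_connections_limit: 5 # Defaults to 10
```

Omitting `state_location` (and its `params`) from this sketch would leave the same limits in force, but applied independently by each Spice instance.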