Commit 1d4cf2e

lukekim, kczimm, and phillipleblanc authored
Updates for v1.1.1 release (#934)
* add s3_session_token param (#933)
* Add docs for `runtime.results_cache.cache_key_type` (#932)
* Add docs for cache_key_type
* Upd docs
* Luke updates
* Fix spice sql
* Add docs for component metrics & new mysql connection parameters (#937)
* Update acknowledgments for 1.1.1 (#939)

---------

Co-authored-by: Kevin Zimmerman <4733573+kczimm@users.noreply.github.com>
Co-authored-by: Phillip LeBlanc <phillip@spiceai.io>
1 parent dd20c5c commit 1d4cf2e

8 files changed

Lines changed: 225 additions & 47 deletions


website/docs/acknowledgements/index.md

Lines changed: 4 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -333,7 +333,7 @@ gopkg.in/yaml.v3, https://github.com/go-yaml/yaml/blob/v3.0.1/LICENSE, MIT
333333
- moka 0.12.10, Apache-2.0 OR MIT
334334
<br/>https://github.com/moka-rs/moka
335335

336-
- mysql_async 0.34.2, Apache-2.0 OR MIT
336+
- mysql_async 0.35.1, Apache-2.0 OR MIT
337337
<br/>https://github.com/blackbeam/mysql_async
338338

339339
- ndarray 0.15.6, Apache-2.0 OR MIT
@@ -387,6 +387,9 @@ gopkg.in/yaml.v3, https://github.com/go-yaml/yaml/blob/v3.0.1/LICENSE, MIT
387387
- path-clean 1.0.1, Apache-2.0 OR MIT
388388
<br/>https://github.com/danreeves/path-clean
389389

390+
- pdf-extract 0.9.0, MIT
391+
<br/>https://github.com/jrmuizel/pdf-extract
392+
390393
- percent-encoding 2.3.1, Apache-2.0 OR MIT
391394
<br/>https://github.com/servo/rust-url/
392395

website/docs/components/data-connectors/mysql.md

Lines changed: 64 additions & 0 deletions
@@ -6,6 +6,7 @@ tags:
66
- data-connectors
77
- mysql
88
- relational
9+
- component-metrics
910
---
1011

1112
MySQL is an open-source relational database management system that uses structured query language (SQL) for managing and manipulating databases.
@@ -22,6 +23,8 @@ datasets:
2223
mysql_db: my_database
2324
mysql_user: my_user
2425
mysql_pass: ${secrets:mysql_pass}
26+
mysql_pool_min: 10
27+
mysql_pool_max: 100
2528
```
2629
2730
## Configuration
@@ -89,6 +92,51 @@ The MySQL data connector can be configured by providing the following `params`.
8992
| `mysql_pass` | The password to connect with. |
9093
| `mysql_sslmode` | Optional. Specifies the SSL/TLS behavior for the connection, supported values:<br /> <ul><li>`required`: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect.</li><li>`preferred`: This mode will try to establish a secure SSL connection if possible, but will connect insecurely if the server does not support SSL.</li><li>`disabled`: This mode will not attempt to use an SSL connection, even if the server supports it.</li></ul> |
9194
| `mysql_sslrootcert` | Optional parameter specifying the path to a custom PEM certificate that the connector will trust. |
95+
| `mysql_pool_min` | The minimum number of connections to keep open in the pool, lazily created when requested. Default: `10` |
96+
| `mysql_pool_max` | The maximum number of connections to allow in the pool. Default: `100` |
97+
98+
### `metrics`
99+
100+
The MySQL data connector supports the following optional [component metrics](/docs/features/observability/component_metrics):
101+
102+
| Metric Name | Type | Description |
103+
| ----------- | ---- | ----------- |
104+
| `connection_count` | Gauge | Gauge of active connections to the database server |
105+
| `connections_in_pool` | Gauge | Gauge of active connections that are idling in the pool |
106+
| `active_wait_requests` | Gauge | Gauge of requests that are waiting for a connection to be returned to the pool |
107+
| `create_failed` | Counter | Counter of connections that failed to be created |
108+
| `discarded_superfluous_connection` | Counter | Counter of connections that were closed because there were already enough idle connections in the pool |
109+
| `discarded_unestablished_connection` | Counter | Counter of connections that were closed because they could not be established |
110+
| `dirty_connection_return` | Counter | Counter of connections that were returned to the pool but were dirty (i.e., open transactions, pending queries, etc.) |
111+
| `discarded_expired_connection` | Counter | Counter of connections that were discarded because they were expired by the pool constraints (i.e. TTL expired) |
112+
| `resetting_connection` | Counter | Counter of connections that were reset |
113+
| `discarded_error_during_cleanup` | Counter | Counter of connections that were discarded because they returned an error during cleanup |
114+
| `connection_returned_to_pool` | Counter | Counter of connections that were returned to the pool |
115+
116+
These metrics are not enabled by default. Enable them by setting the `metrics` parameter:
117+
118+
```yaml
119+
datasets:
120+
- from: mysql:mytable
121+
name: my_dataset
122+
metrics:
123+
- name: connection_count
124+
- name: connections_in_pool
125+
- name: active_wait_requests
126+
- name: create_failed
127+
- name: discarded_superfluous_connection
128+
- name: discarded_unestablished_connection
129+
- name: dirty_connection_return
130+
- name: discarded_expired_connection
131+
- name: resetting_connection
132+
- name: discarded_error_during_cleanup
133+
- name: connection_returned_to_pool
134+
params: &params
135+
mysql_host: localhost
136+
mysql_tcp_port: 3306
137+
mysql_user: my_user
138+
mysql_pass: ${secrets:mysql_pass}
139+
```
92140

93141
## Types
94142

@@ -186,6 +234,22 @@ datasets:
186234
mysql_pass: ${secrets:mysql_pass}
187235
```
188236

237+
### With custom connection pool settings
238+
239+
```yaml
240+
datasets:
241+
- from: mysql:path.to.my_dataset
242+
name: my_dataset
243+
params:
244+
mysql_host: localhost
245+
mysql_tcp_port: 3306
246+
mysql_db: my_database
247+
mysql_user: my_user
248+
mysql_pass: ${secrets:mysql_pass}
249+
mysql_pool_min: 5
250+
mysql_pool_max: 10
251+
```
252+
189253
## Secrets
190254

191255
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/docs/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/docs/components/secret-stores#using-secrets).

website/docs/components/data-connectors/s3.md

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ SELECT COUNT(*) FROM cool_dataset;
6464
| `s3_auth` | Authentication type. Options: `public`, `key` and `iam_role`. Defaults to `public`. |
6565
| `s3_key` | Access key (e.g. `AWS_ACCESS_KEY_ID` for AWS) |
6666
| `s3_secret` | Secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS) |
67+
| `s3_session_token` | Session token (e.g. `AWS_SESSION_TOKEN` for AWS) for temporary credentials |
6768
| `allow_http` | Enables insecure HTTP connections to `s3_endpoint`. Defaults to `false`. |
6869
| `schema_source_path` | Specifies the URL used to infer the dataset schema. Defaults to the most recently modified file. |
6970

website/docs/features/caching/index.md

Lines changed: 36 additions & 29 deletions
@@ -11,11 +11,11 @@ tags:
1111
- cache control
1212
---
1313

14-
Spice supports in-memory caching of query results, which is enabled by default for both the HTTP (`/v1/sql`) and Arrow Flight APIs.
14+
Spice uses in-memory caching for query results, which is enabled by default for both the HTTP (`/v1/sql`) and Arrow Flight APIs.
1515

16-
Results caching can help improve performance for bursts of requests and for non-accelerated results such as refresh data returned [on zero results](/docs/features/data-acceleration/data-refresh.md#behavior-on-zero-results).
16+
Results caching improves performance for repeated requests and non-accelerated results, such as refresh data returned [on zero results](/docs/features/data-acceleration/data-refresh.md#behavior-on-zero-results).
1717

18-
Results caching employs a [least-recently-used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU) cache replacement policy with the ability to specify an item expiry duration, which defaults to 1-second.
18+
The cache uses a [least-recently-used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU) replacement policy. You can configure the cache to set an item expiration duration, which defaults to 1 second.
1919

2020
```yaml
2121
version: v1
@@ -26,29 +26,38 @@ runtime:
2626
results_cache:
2727
enabled: true
2828
cache_max_size: 128MiB
29-
eviction_policy: lru
3029
item_ttl: 1s
3130
```
3231
33-
| Parameter name | Optional | Description |
34-
| ----------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- |
35-
| `enabled` | Yes | `true` by default |
36-
| `cache_max_size` | Yes | Maximum cache size. Default is `128MiB` |
37-
| `eviction_policy` | Yes | Cache replacement policy when the cached data reaches the `cache_max_size`. Default and only currently supported value is `lru` |
38-
| `item_ttl` | Yes | Cache entry expiration duration (Time to Live), 1 second by default. |
32+
| Parameter name | Optional | Description |
33+
| ----------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
34+
| `enabled` | Yes | Defaults to `true`. |
35+
| `cache_max_size` | Yes | Maximum cache size. Defaults to `128MiB`. |
36+
| `eviction_policy` | Yes | Cache replacement policy when the cache reaches `cache_max_size`. Defaults to `lru`, which is currently the only supported value. |
37+
| `item_ttl` | Yes | Cache entry expiration duration (Time to Live). Defaults to 1 second. |
38+
| `cache_key_type` | Yes | Determines how cache keys are generated. Defaults to `plan`. `plan` uses the query's logical plan, while `sql` uses the raw SQL query string. |
3939

40-
## Cached responses
40+
### Choosing a `cache_key_type`
4141

42-
The response includes a `Results-Cache-Status` header indicating the cache status of the query:
42+
- **`plan` (Default):** Uses the query's logical plan as the cache key. Matches semantically equivalent queries but requires query parsing.
43+
- **`sql`:** Uses the raw SQL string as the cache key. Provides faster lookups but requires exact string matches. Queries with dynamic functions, such as `NOW()`, may produce unexpected results. Use `sql` only when results are predictable.
4344

44-
| Header value | Description |
45-
| ----------- | ----------- |
46-
| `HIT` | The query result was served from cache |
47-
| `MISS` | The cache was checked but the result was not found |
48-
| `BYPASS` | The cache was bypassed for this query. (e.g., when `cache-control: no-cache` is specified) |
49-
| _header not present_ | Results cache did not apply to this query. (e.g. when results cache is disabled or querying a system table) |
45+
Use `sql` for the lowest latency with identical queries that do not include dynamic functions. Use `plan` for greater flexibility.
5046

51-
Example cached response:
47+
## Cached Responses
48+
49+
The response includes a `Results-Cache-Status` header that indicates the cache status of the query:
50+
51+
| Header value | Description |
52+
| -------------------- | -------------------------------------------------------------------------------------------------- |
53+
| `HIT` | The query result was served from the cache. |
54+
| `MISS` | The cache was checked, but the result was not found. |
55+
| `BYPASS` | The cache was bypassed for this query (e.g., when `cache-control: no-cache` is specified). |
56+
| _header not present_ | The cache did not apply to this query (e.g., when caching is disabled or querying a system table). |
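
The header values in the table above can be read programmatically. The following is a minimal sketch, not part of the Spice tooling: `cache_status` is a hypothetical helper name, and the `absent` label for a missing header is an assumption made here for illustration. It reads a raw HTTP response (as printed by `curl -i`) on stdin and prints the value of `Results-Cache-Status`.

```bash
# Hypothetical helper: print the Results-Cache-Status value from a raw
# HTTP response on stdin, or "absent" when the header is not present.
cache_status() {
  # Strip carriage returns, then scan header lines only. The first empty
  # line marks the end of the headers, so the body is never matched.
  tr -d '\r' | awk -F': *' '
    NF == 0 { in_body = 1 }
    !in_body && tolower($1) == "results-cache-status" { print $2; found = 1 }
    END { if (!found) print "absent" }
  '
}

# Usage against a running runtime (endpoint as used in the examples in this page):
#   curl -s -i -XPOST http://localhost:8090/v1/sql -d 'select 1' | cache_status
```

The header name is compared case-insensitively because HTTP header names are not case-sensitive.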
57+
58+
### Examples
59+
60+
#### Cached Response
5261

5362
```bash
5463
$ curl -XPOST -i http://localhost:8090/v1/sql -d 'select * from taxi_trips limit 1;'
@@ -60,7 +69,7 @@ content-length: 416
6069
date: Thu, 13 Feb 2025 03:05:39 GMT
6170
```
6271

63-
Example uncached response:
72+
#### Uncached Response
6473

6574
```bash
6675
$ curl -XPOST -i http://localhost:8090/v1/sql -d 'select * from taxi_trips limit 1;'
@@ -72,7 +81,7 @@ content-length: 416
7281
date: Thu, 13 Feb 2025 03:13:19 GMT
7382
```
7483

75-
Example uncached response with `cache-control: no-cache`:
84+
#### Bypassed Cache with `cache-control: no-cache`
7685

7786
```bash
7887
$ curl -H "cache-control: no-cache" -XPOST -i http://localhost:8090/v1/sql -d 'select * from taxi_trips limit 1;'
@@ -86,18 +95,18 @@ date: Thu, 13 Feb 2025 03:14:00 GMT
8695

8796
## Cache Control
8897

89-
The results cache behavior can be controlled for specific queries through HTTP headers. The `Cache-Control` header can be used to skip the cache for a specific query, but still cache the results for subsequent queries.
98+
You can control caching behavior for specific queries using HTTP headers. The `Cache-Control` header can skip the cache for a specific query while still caching the results for subsequent queries.
9099

91100
### HTTP/Flight API
92101

93-
The SQL query API endpoints (both HTTP and Arrow Flight) understand the standard HTTP [`Cache-Control` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control), supporting the [`no-cache` directive](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#no-cache). When `no-cache` is specified, the cache is not used for the current query, but the query results are cached for subsequent queries.
102+
The SQL query API endpoints (HTTP and Arrow Flight) support the standard HTTP [`Cache-Control` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control). The [`no-cache` directive](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#no-cache) skips the cache for the current query but caches the results for future queries.
94103

95-
`Cache-Control` directives other than `no-cache` are not supported.
104+
Other `Cache-Control` directives are not supported.
96105

97106
#### HTTP Example
98107

99108
```bash
100-
# Default behavior (use cache)
109+
# Default behavior (uses cache)
101110
curl -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
102111
103112
# Skip cache for this query, but cache the results for future queries
@@ -106,7 +115,7 @@ curl -H "cache-control: no-cache" -XPOST http://localhost:8090/v1/sql -d 'SELECT
106115

107116
#### Arrow FlightSQL Example
108117

109-
The following example shows how to skip the cache for a specific query using FlightSQL in Rust:
118+
The following example skips the cache for a specific query using FlightSQL in Rust:
110119

111120
```rust
112121
let sql_command = arrow_flight::sql::CommandStatementQuery {
@@ -126,15 +135,13 @@ request
126135

127136
### `spice sql` CLI
128137

129-
The `spice sql` command accepts a `--cache-control` flag that follows the same behavior as the HTTP header:
138+
The `spice sql` command accepts a `--cache-control` flag that follows the same behavior as the HTTP `Cache-Control` header:
130139

131140
```bash
132141
# Default behavior (use cache if available)
133142
spice sql
134-
135143
# Same as above
136144
spice sql --cache-control cache
137-
138145
# Skip cache for this query, but cache the results for future queries
139146
spice sql --cache-control no-cache
140147
```
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
---
2+
title: 'Component Metrics'
3+
sidebar_label: 'Component Metrics'
4+
description: 'Learn how to enable optional component metrics.'
5+
sidebar_position: 1
6+
pagination_prev: null
7+
pagination_next: null
8+
tags:
9+
- component-metrics
10+
---
11+
12+
Component metrics provide detailed insights into the internal state and performance of individual components in Spice. Each component can expose its own set of metrics that can be enabled selectively to monitor specific aspects of its operation.
13+
14+
## Enabling Component Metrics
15+
16+
Component metrics are disabled by default and can be enabled by adding a `metrics` section to the component configuration. Each metric can be enabled individually by specifying its name in the metrics list.
17+
18+
### Example Configuration
19+
20+
```yaml
21+
datasets:
22+
- from: some_component:my_resource
23+
name: my_resource
24+
metrics:
25+
- name: metric_one
26+
enabled: true
27+
- name: metric_two
28+
enabled: true
29+
- name: metric_three
30+
enabled: false
31+
params:
32+
param_one: value_one
33+
param_two: value_two
34+
```
35+
36+
## Available Metrics
37+
38+
Each component defines its own set of available metrics. These metrics are exposed in the Prometheus format with the following naming convention:
39+
40+
```text
41+
{component_type}_{component_name}_{metric_name}
42+
```
43+
44+
For example, a MySQL dataset component's metrics would be prefixed with `dataset_mysql_`.
45+
46+
## Monitoring Component Metrics
47+
48+
Component metrics are exposed through the same Prometheus-compatible metrics endpoint as other Spice metrics. These metrics can be accessed using standard Prometheus tools or any monitoring system that supports Prometheus metrics.
49+
50+
To view the metrics, make a GET request to the metrics endpoint:
51+
52+
```bash
53+
curl http://localhost:9090/metrics
54+
```
55+
56+
The response will include all enabled component metrics in Prometheus format, with proper HELP and TYPE annotations.
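
To look at one component's metrics in isolation, the endpoint output can be filtered by prefix. This is a rough sketch under stated assumptions: `filter_component_metrics` is a hypothetical helper name, and the only prefix confirmed by this page is `dataset_mysql_`. It reads Prometheus text on stdin and keeps the sample lines plus their `# HELP` and `# TYPE` annotations.

```bash
# Hypothetical helper: keep only the metrics whose names start with the
# given prefix, along with their "# HELP" and "# TYPE" annotation lines.
filter_component_metrics() {
  awk -v p="$1" '
    # Annotation lines look like "# HELP <name> <text>"; the name is field 3.
    /^# (HELP|TYPE) / { if (index($3, p) == 1) print; next }
    # Sample lines start with the metric name itself.
    index($0, p) == 1 { print }
  '
}

# Usage against a running runtime (endpoint as shown above):
#   curl -s http://localhost:9090/metrics | filter_component_metrics dataset_mysql_
```

The match is a plain prefix comparison rather than a regular expression, so characters in the prefix are never interpreted as pattern syntax.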
57+
58+
## Component-Specific Metrics
59+
60+
For detailed information about metrics available for specific components, view all [components that expose metrics](/docs/tags/component-metrics).

website/docs/features/observability/index.md

Lines changed: 6 additions & 0 deletions
@@ -110,3 +110,9 @@ dataset_active_count{engine="duckdb"} 1
110110
| `tool_load_state`<br/>*(gauge)* | Status of the LLM tools. 1=Initializing, 2=Ready, 3=Disabled, 4=Error, 5=Refreshing. |
111111
| `view_load_errors`<br/>*(count)* | Number of errors loading the view. |
112112
| `view_load_state`<br/>*(gauge)* | Status of the views. 1=Initializing, 2=Ready, 3=Disabled, 4=Error, 5=Refreshing. |
113+
114+
:::note Component Metrics
115+
116+
In addition to these core metrics, individual components can expose their own metrics. For example, the MySQL data connector exposes [connection pool metrics](/docs/components/data-connectors/mysql/#metrics). See [Component Metrics](/docs/features/observability/component_metrics) for more information.
117+
118+
:::
