Merged
5 changes: 4 additions & 1 deletion website/docs/acknowledgements/index.md
@@ -333,7 +333,7 @@ gopkg.in/yaml.v3, https://github.com/go-yaml/yaml/blob/v3.0.1/LICENSE, MIT
- moka 0.12.10, Apache-2.0 OR MIT
<br/>https://github.com/moka-rs/moka

- mysql_async 0.34.2, Apache-2.0 OR MIT
- mysql_async 0.35.1, Apache-2.0 OR MIT
<br/>https://github.com/blackbeam/mysql_async

- ndarray 0.15.6, Apache-2.0 OR MIT
@@ -387,6 +387,9 @@ gopkg.in/yaml.v3, https://github.com/go-yaml/yaml/blob/v3.0.1/LICENSE, MIT
- path-clean 1.0.1, Apache-2.0 OR MIT
<br/>https://github.com/danreeves/path-clean

- pdf-extract 0.9.0, MIT
<br/>https://github.com/jrmuizel/pdf-extract

- percent-encoding 2.3.1, Apache-2.0 OR MIT
<br/>https://github.com/servo/rust-url/

64 changes: 64 additions & 0 deletions website/docs/components/data-connectors/mysql.md
@@ -6,6 +6,7 @@ tags:
- data-connectors
- mysql
- relational
- component-metrics
---

MySQL is an open-source relational database management system that uses structured query language (SQL) for managing and manipulating databases.
@@ -22,6 +23,8 @@ datasets:
mysql_db: my_database
mysql_user: my_user
mysql_pass: ${secrets:mysql_pass}
mysql_pool_min: 10
mysql_pool_max: 100
```

## Configuration
@@ -89,6 +92,51 @@ The MySQL data connector can be configured by providing the following `params`.
| `mysql_pass` | The password to connect with. |
| `mysql_sslmode` | Optional. Specifies the SSL/TLS behavior for the connection, supported values:<br /> <ul><li>`required`: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect.</li><li>`preferred`: This mode will try to establish a secure SSL connection if possible, but will connect insecurely if the server does not support SSL.</li><li>`disabled`: This mode will not attempt to use an SSL connection, even if the server supports it.</li></ul> |
| `mysql_sslrootcert` | Optional parameter specifying the path to a custom PEM certificate that the connector will trust. |
| `mysql_pool_min` | The minimum number of connections to keep open in the pool. Connections are created lazily, as they are requested. Default: `10` |
| `mysql_pool_max` | The maximum number of connections to allow in the pool. Default: `100` |

### `metrics`

The MySQL data connector supports the following optional [component metrics](/docs/features/observability/component_metrics):

| Metric Name | Type | Description |
| ----------- | ---- | ----------- |
| `connection_count` | Gauge | Gauge of active connections to the database server |
| `connections_in_pool` | Gauge | Gauge of active connections that are idling in the pool |
| `active_wait_requests` | Gauge | Gauge of requests that are waiting for a connection to be returned to the pool |
| `create_failed` | Counter | Counter of connections that failed to be created |
| `discarded_superfluous_connection` | Counter | Counter of connections that were closed because there were already enough idle connections in the pool |
| `discarded_unestablished_connection` | Counter | Counter of connections that were closed because they could not be established |
| `dirty_connection_return` | Counter | Counter of connections that were returned to the pool but were dirty (e.g., open transactions or pending queries) |
| `discarded_expired_connection` | Counter | Counter of connections that were discarded because they were expired by the pool constraints (i.e., the TTL expired) |
| `resetting_connection` | Counter | Counter of connections that were reset |
| `discarded_error_during_cleanup` | Counter | Counter of connections that were discarded because they returned an error during cleanup |
| `connection_returned_to_pool` | Counter | Counter of connections that were returned to the pool |

These metrics are not enabled by default. To enable them, set the `metrics` parameter:

```yaml
datasets:
  - from: mysql:mytable
    name: my_dataset
    metrics:
      - name: connection_count
      - name: connections_in_pool
      - name: active_wait_requests
      - name: create_failed
      - name: discarded_superfluous_connection
      - name: discarded_unestablished_connection
      - name: dirty_connection_return
      - name: discarded_expired_connection
      - name: resetting_connection
      - name: discarded_error_during_cleanup
      - name: connection_returned_to_pool
    params: &params
      mysql_host: localhost
      mysql_tcp_port: 3306
      mysql_user: my_user
      mysql_pass: ${secrets:mysql_pass}
```
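
Once enabled, the pool metrics should appear in the Prometheus scrape output. The following is a minimal sketch for picking them out of that output. It assumes the `dataset_mysql_` prefix and the `http://localhost:9090/metrics` endpoint described on the component metrics page; the `filter_mysql_metrics` helper is hypothetical and not part of Spice.

```bash
# Hypothetical helper (not part of Spice): keep only samples and HELP/TYPE
# annotations whose metric name starts with the assumed `dataset_mysql_` prefix.
# Reads Prometheus text format on stdin.
filter_mysql_metrics() {
  grep -E '^(# (HELP|TYPE) )?dataset_mysql_'
}

# Usage against a running instance (endpoint is an assumption):
#   curl -s http://localhost:9090/metrics | filter_mysql_metrics
```

Filtering on the prefix rather than on individual metric names keeps the helper working when more of the metrics listed above are enabled later.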

## Types

@@ -186,6 +234,22 @@ datasets:
mysql_pass: ${secrets:mysql_pass}
```

### With custom connection pool settings

```yaml
datasets:
  - from: mysql:path.to.my_dataset
    name: my_dataset
    params:
      mysql_host: localhost
      mysql_tcp_port: 3306
      mysql_db: my_database
      mysql_user: my_user
      mysql_pass: ${secrets:mysql_pass}
      mysql_pool_min: 5
      mysql_pool_max: 10
```
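
A pool configuration only makes sense when the minimum does not exceed the maximum. The sketch below is a hypothetical pre-flight check that could run before the values are written into a Spicepod. `check_pool_bounds` is not part of Spice, and the rule that `mysql_pool_min` must be less than or equal to `mysql_pool_max` is an assumption here, not documented connector behavior.

```bash
# Hypothetical pre-flight check (not part of Spice): succeed only when both
# arguments are non-negative integers and min <= max (assumed rule).
check_pool_bounds() {
  for value in "$1" "$2"; do
    case "$value" in
      ''|*[!0-9]*)
        echo "pool bounds must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  if [ "$1" -gt "$2" ]; then
    echo "mysql_pool_min ($1) exceeds mysql_pool_max ($2)" >&2
    return 1
  fi
}

# Usage with the values from the example above:
#   check_pool_bounds 5 10 && echo "pool bounds look consistent"
```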

## Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/docs/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/docs/components/secret-stores#using-secrets).
1 change: 1 addition & 0 deletions website/docs/components/data-connectors/s3.md
@@ -64,6 +64,7 @@ SELECT COUNT(*) FROM cool_dataset;
| `s3_auth` | Authentication type. Options: `public`, `key` and `iam_role`. Defaults to `public`. |
| `s3_key` | Access key (e.g. `AWS_ACCESS_KEY_ID` for AWS) |
| `s3_secret` | Secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS) |
| `s3_session_token` | Session token (e.g. `AWS_SESSION_TOKEN` for AWS) for temporary credentials |
| `allow_http` | Enables insecure HTTP connections to `s3_endpoint`. Defaults to `false`. |
| `schema_source_path` | Specifies the URL used to infer the dataset schema. Default to the most recently modified file |

65 changes: 36 additions & 29 deletions website/docs/features/caching/index.md
@@ -11,11 +11,11 @@ tags:
- cache control
---

Spice supports in-memory caching of query results, which is enabled by default for both the HTTP (`/v1/sql`) and Arrow Flight APIs.
Spice uses in-memory caching for query results, which is enabled by default for both the HTTP (`/v1/sql`) and Arrow Flight APIs.

Results caching can help improve performance for bursts of requests and for non-accelerated results such as refresh data returned [on zero results](/docs/features/data-acceleration/data-refresh.md#behavior-on-zero-results).
Results caching improves performance for repeated requests and non-accelerated results, such as refresh data returned [on zero results](/docs/features/data-acceleration/data-refresh.md#behavior-on-zero-results).

Results caching employs a [least-recently-used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU) cache replacement policy with the ability to specify an item expiry duration, which defaults to 1-second.
The cache uses a [least-recently-used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU) replacement policy. You can configure the cache to set an item expiration duration, which defaults to 1 second.

```yaml
version: v1
@@ -26,29 +26,38 @@ runtime:
results_cache:
enabled: true
cache_max_size: 128MiB
eviction_policy: lru
item_ttl: 1s
```

| Parameter name | Optional | Description |
| ----------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `enabled` | Yes | `true` by default |
| `cache_max_size` | Yes | Maximum cache size. Default is `128MiB` |
| `eviction_policy` | Yes | Cache replacement policy when the cached data reaches the `cache_max_size`. Default and only currently supported value is `lru` |
| `item_ttl` | Yes | Cache entry expiration duration (Time to Live), 1 second by default. |
| Parameter name | Optional | Description |
| ----------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled` | Yes | Defaults to `true`. |
| `cache_max_size` | Yes | Maximum cache size. Defaults to `128MiB`. |
| `eviction_policy` | Yes | Cache replacement policy when the cache reaches `cache_max_size`. Defaults to `lru`, which is currently the only supported value. |
| `item_ttl` | Yes | Cache entry expiration duration (Time to Live). Defaults to 1 second. |
| `cache_key_type` | Yes | Determines how cache keys are generated. Defaults to `plan`. `plan` uses the query's logical plan, while `sql` uses the raw SQL query string. |

## Cached responses
### Choosing a `cache_key_type`

The response includes a `Results-Cache-Status` header indicating the cache status of the query:
- **`plan` (Default):** Uses the query's logical plan as the cache key. Matches semantically equivalent queries but requires query parsing.
- **`sql`:** Uses the raw SQL string as the cache key. Provides faster lookups but requires exact string matches. Queries with dynamic functions, such as `NOW()`, may produce unexpected results. Use `sql` only when results are predictable.

| Header value | Description |
| ----------- | ----------- |
| `HIT` | The query result was served from cache |
| `MISS` | The cache was checked but the result was not found |
| `BYPASS` | The cache was bypassed for this query. (e.g., when `cache-control: no-cache` is specified) |
| _header not present_ | Results cache did not apply to this query. (e.g. when results cache is disabled or querying a system table) |
Use `sql` for the lowest latency with identical queries that do not include dynamic functions. Use `plan` for greater flexibility.
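
To make the trade-off concrete, the sketch below contrasts a raw-SQL key with a normalized key. The normalization (lowercasing, collapsing whitespace, dropping a trailing semicolon) is only a crude stand-in for logical-plan equivalence and is not how Spice derives `plan` keys; both helper names are hypothetical.

```bash
# Hypothetical illustration only; neither function is part of Spice.

# Raw-SQL key: the query text exactly as submitted.
sql_key() {
  printf '%s\n' "$1"
}

# Crude stand-in for a plan-based key (assumption): lowercase the text,
# collapse runs of whitespace, trim the ends, and drop a trailing semicolon.
# Real plan keys come from the query's logical plan, not from text rewriting.
plan_like_key() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -s '[:space:]' ' ' \
    | sed -e 's/^ //' -e 's/ $//' -e 's/ *;$//'
}

# Usage:
#   sql_key       'SELECT  *  FROM taxi_trips LIMIT 1;'
#   plan_like_key 'SELECT  *  FROM taxi_trips LIMIT 1;'
```

Two queries that differ only in letter case or spacing produce different `sql_key` values, so they would occupy separate cache entries, while the normalized key treats them as the same query.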

Example cached response:
## Cached Responses

The response includes a `Results-Cache-Status` header that indicates the cache status of the query:

| Header value | Description |
| -------------------- | -------------------------------------------------------------------------------------------------- |
| `HIT` | The query result was served from the cache. |
| `MISS` | The cache was checked, but the result was not found. |
| `BYPASS` | The cache was bypassed for this query (e.g., when `cache-control: no-cache` is specified). |
| _header not present_ | The cache did not apply to this query (e.g., when caching is disabled or querying a system table). |
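
When scripting against the API, it can help to reduce a response to its cache status alone. The following is a minimal sketch of a hypothetical `cache_status` helper that reads raw response headers and prints the header value, or `NONE` when the header is absent. The `NONE` label is an assumption of this sketch, not a value Spice returns.

```bash
# Hypothetical helper (not part of Spice): read raw HTTP response headers on
# stdin and print the Results-Cache-Status value, or NONE when it is absent.
# Header names are matched case-insensitively.
cache_status() {
  value=$(tr -d '\r' | awk -F': *' '
    tolower($1) == "results-cache-status" && !found { print $2; found = 1 }
  ')
  printf '%s\n' "${value:-NONE}"
}

# Usage against a running instance (dumps headers only, discards the body):
#   curl -s -o /dev/null -D - -XPOST http://localhost:8090/v1/sql \
#     -d 'select * from taxi_trips limit 1;' | cache_status
```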

### Examples

#### Cached Response

```bash
$ curl -XPOST -i http://localhost:8090/v1/sql -d 'select * from taxi_trips limit 1;'
@@ -60,7 +69,7 @@ content-length: 416
date: Thu, 13 Feb 2025 03:05:39 GMT
```

Example uncached response:
#### Uncached Response

```bash
$ curl -XPOST -i http://localhost:8090/v1/sql -d 'select * from taxi_trips limit 1;'
@@ -72,7 +81,7 @@ content-length: 416
date: Thu, 13 Feb 2025 03:13:19 GMT
```

Example uncached response with `cache-control: no-cache`:
#### Bypassed Cache with `cache-control: no-cache`

```bash
$ curl -H "cache-control: no-cache" -XPOST -i http://localhost:8090/v1/sql -d 'select * from taxi_trips limit 1;'
@@ -86,18 +95,18 @@ date: Thu, 13 Feb 2025 03:14:00 GMT

## Cache Control

The results cache behavior can be controlled for specific queries through HTTP headers. The `Cache-Control` header can be used to skip the cache for a specific query, but still cache the results for subsequent queries.
You can control caching behavior for specific queries using HTTP headers. Use the `Cache-Control` header to skip the cache for a query while still caching its results for subsequent queries.

### HTTP/Flight API

The SQL query API endpoints (both HTTP and Arrow Flight) understand the standard HTTP [`Cache-Control` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control), supporting the [`no-cache` directive](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#no-cache). When `no-cache` is specified, the cache is not used for the current query, but the query results are cached for subsequent queries.
The SQL query API endpoints (HTTP and Arrow Flight) support the standard HTTP [`Cache-Control` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control). The [`no-cache` directive](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#no-cache) skips the cache for the current query but caches the results for future queries.

`Cache-Control` directives other than `no-cache` are not supported.
Other `Cache-Control` directives are not supported.

#### HTTP Example

```bash
# Default behavior (use cache)
# Default behavior (uses cache)
curl -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'

# Skip cache for this query, but cache the results for future queries
@@ -106,7 +115,7 @@ curl -H "cache-control: no-cache" -XPOST http://localhost:8090/v1/sql -d 'SELECT

#### Arrow FlightSQL Example

The following example shows how to skip the cache for a specific query using FlightSQL in Rust:
The following example skips the cache for a specific query using FlightSQL in Rust:

```rust
let sql_command = arrow_flight::sql::CommandStatementQuery {
@@ -126,15 +135,13 @@ request

### `spice sql` CLI

The `spice sql` command accepts a `--cache-control` flag that follows the same behavior as the HTTP header:
The `spice sql` command accepts a `--cache-control` flag that follows the same behavior as the HTTP `Cache-Control` header:

```bash
# Default behavior (use cache if available)
spice sql

# Same as above
spice sql --cache-control cache

# Skip cache for this query, but cache the results for future queries
spice sql --cache-control no-cache
```
60 changes: 60 additions & 0 deletions website/docs/features/observability/component_metrics.md
@@ -0,0 +1,60 @@
---
title: 'Component Metrics'
sidebar_label: 'Component Metrics'
description: 'Learn how to enable optional component metrics.'
sidebar_position: 1
pagination_prev: null
pagination_next: null
tags:
- component-metrics
---

Component metrics provide detailed insights into the internal state and performance of individual components in Spice. Each component can expose its own set of metrics that can be enabled selectively to monitor specific aspects of its operation.

## Enabling Component Metrics

Component metrics are disabled by default and can be enabled by adding a `metrics` section to the component configuration. Each metric can be enabled individually by specifying its name in the metrics list.

### Example Configuration

```yaml
datasets:
  - from: some_component:my_resource
    name: my_resource
    metrics:
      - name: metric_one
        enabled: true
      - name: metric_two
        enabled: true
      - name: metric_three
        enabled: false
    params:
      param_one: value_one
      param_two: value_two
```

## Available Metrics

Each component defines its own set of available metrics. These metrics are exposed in the Prometheus format with the following naming convention:

```text
{component_type}_{component_name}_{metric_name}
```

For example, a MySQL dataset component's metrics would be prefixed with `dataset_mysql_`.
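
Going by the convention above, a full metric name can be assembled from its three parts. The sketch below uses a hypothetical `component_metric_name` helper for illustration; the resulting name is what the convention implies, not something verified against a running instance.

```bash
# Hypothetical helper (not part of Spice): join component type, component
# name, and metric name as {component_type}_{component_name}_{metric_name}.
component_metric_name() {
  printf '%s_%s_%s\n' "$1" "$2" "$3"
}

# Example: the MySQL connector's `connection_count` metric.
component_metric_name dataset mysql connection_count
# prints: dataset_mysql_connection_count
```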

## Monitoring Component Metrics

Component metrics are exposed through the same Prometheus-compatible metrics endpoint as other Spice metrics. These metrics can be accessed using standard Prometheus tools or any monitoring system that supports Prometheus metrics.

To view the metrics, make a GET request to the metrics endpoint:

```bash
curl http://localhost:9090/metrics
```

The response will include all enabled component metrics in Prometheus format, with proper HELP and TYPE annotations.
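
The `TYPE` annotations make it easy to list which metrics are currently exposed and whether each is a gauge or a counter. The following is a minimal sketch, assuming the standard Prometheus text format in which annotations look like `# TYPE <name> <type>`; the `list_metric_types` helper is hypothetical.

```bash
# Hypothetical helper (not part of Spice): read Prometheus text format on
# stdin and print "<metric name> <metric type>" for every TYPE annotation.
list_metric_types() {
  awk '$1 == "#" && $2 == "TYPE" { print $3, $4 }'
}

# Usage against a running instance:
#   curl -s http://localhost:9090/metrics | list_metric_types
```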

## Component-Specific Metrics

For detailed information about metrics available for specific components, view all [components that expose metrics](/docs/tags/component-metrics).
6 changes: 6 additions & 0 deletions website/docs/features/observability/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,3 +110,9 @@ dataset_active_count{engine="duckdb"} 1
| `tool_load_state`<br/>*(gauge)* | Status of the LLM tools. 1=Initializing, 2=Ready, 3=Disabled, 4=Error, 5=Refreshing. |
| `view_load_errors`<br/>*(count)* | Number of errors loading the view. |
| `view_load_state`<br/>*(gauge)* | Status of the views. 1=Initializing, 2=Ready, 3=Disabled, 4=Error, 5=Refreshing. |

:::note Component Metrics

In addition to these core metrics, individual components can expose their own metrics. For example, the MySQL data connector exposes [connection pool metrics](/docs/components/data-connectors/mysql/#metrics). See [Component Metrics](/docs/features/observability/component_metrics) for more information.

:::