v1.9.0 docs (#1237)

phillipleblanc · lukekim · peasee · web-flow · commit 0283a3c42f76 · 2025-11-19T22:45:42.000+09:00
* Results cache docs for zstd compression (#1234)

  * Results cache docs for zstd compression
  * Add limitation
  * docs: Fix reference to stale while revalidate ttl param

  ---------

  Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
  Co-authored-by: Phillip LeBlanc <phillip@spice.ai>

* Add docs for distributed query (#1236)

  * Add docs page for distributed query
  * Clarify the spicepod requirements
  * tweaks
  * Update website/docs/features/distributed-query/index.md

    Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>

  ---------

  Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
diff --git a/website/docs/features/caching/index.md b/website/docs/features/caching/index.md
diff --git a/website/docs/features/cdc/index.md b/website/docs/features/cdc/index.md
@@ -2,7 +2,7 @@
 title: 'Change Data Capture (CDC)'
 sidebar_label: 'Change Data Capture'
 description: 'Learn how to use Change Data Capture (CDC) in Spice.'
-sidebar_position: 4
+sidebar_position: 5
 pagination_prev: null
 pagination_next: null
 ---
diff --git a/website/docs/features/data-ingestion/index.md b/website/docs/features/data-ingestion/index.md
@@ -2,7 +2,7 @@
 title: 'Data Ingestion'
 sidebar_label: 'Data Ingestion'
 description: 'Learn how to ingest data in Spice.'
-sidebar_position: 5
+sidebar_position: 6
 pagination_prev: null
 pagination_next: null
 tags:
diff --git a/website/docs/features/distributed-query/index.md b/website/docs/features/distributed-query/index.md
@@ -0,0 +1,65 @@
+---
+title: 'Distributed Query'
+sidebar_label: 'Distributed Query'
+description: 'Learn how to run Spice in distributed mode for larger scale queries.'
+sidebar_position: 4
+pagination_prev: null
+pagination_next: null
+---
+
+Learn how to configure and run Spice in distributed mode to handle larger-scale queries across multiple nodes.
+
+:::info Preview
+Multi-node distributed query execution based on Apache Ballista is available as a preview feature in Spice `v1.9.0`.
+:::
+
+## Overview
+
+Spice integrates [Apache Ballista](https://github.com/apache/datafusion-ballista) to schedule and coordinate distributed queries across multiple executor nodes. This integration enables distributed execution when running large queries over partitioned data lake formats such as Parquet, Delta Lake, or Iceberg.
+
+## Architecture
+
+A distributed Spice cluster consists of two components:
+
+- **Scheduler** – Plans distributed queries and manages the work queue for the executor fleet. Each cluster runs a single scheduler instance.
+- **Executors** – One or more nodes responsible for executing physical query plans.
+
+The scheduler holds the cluster-wide configuration for a Spicepod, while executors connect to the scheduler to receive work.
+
+## Getting Started
+
+Cluster deployment typically starts with a scheduler instance, followed by one or more executors that register with the scheduler.
+
+### Start the Scheduler
+
+The scheduler is the only `spiced` process that requires configuration, i.e., a `spicepod.yaml` in its working directory. Override the Flight bind address when the scheduler must be reachable from outside `localhost`:
+
+```bash
+# Start scheduler
+spiced --cluster-mode scheduler --flight 0.0.0.0:50051
+```
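+
+The scheduler reads its datasets from the `spicepod.yaml` in its working directory. The following is a minimal sketch, assuming a partitioned Parquet dataset stored on S3. The Spicepod name, bucket, and path are hypothetical placeholders, and credential or region parameters are omitted:
+
+```yaml
+version: v1
+kind: Spicepod
+name: distributed_demo # hypothetical Spicepod name
+
+datasets:
+  # Hypothetical S3 prefix holding partitioned Parquet files
+  - from: s3://my-bucket/data/my_dataset/
+    name: my_dataset
+    params:
+      file_format: parquet
+    # No `acceleration` block: accelerated datasets are not yet
+    # supported in distributed mode.
+```
+
+Executors do not need a copy of this file; they receive the configuration from the scheduler when they register.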
+
+### Start Executors
+
+Executors need the scheduler's Flight URI to register and pull work. They do not require a `spicepod.yaml`; each executor fetches its configuration from the scheduler. If the default port is unavailable, an executor automatically selects a free one:
+
+```bash
+# Start executor
+spiced --cluster-mode executor --scheduler-url spiced://localhost:50051
+```
+
+## Query Execution
+
+Submit queries to the scheduler endpoint. To confirm that distributed planning is active, run `EXPLAIN`; the output includes a `distributed_plan` section that shows how stages are split across executors:
+
+```sql
+EXPLAIN SELECT count(id) FROM my_dataset;
+```
+
+:::warning[Limitations]
+
+- Accelerated datasets are not yet supported; distributed query currently targets partitioned data lake sources. Accelerator support is planned for future releases; follow the release notes for updates.
+- Because distributed query is a preview feature, clusters may encounter stability or performance issues.
+
+:::
diff --git a/website/docs/features/embeddings/index.md b/website/docs/features/embeddings/index.md
@@ -2,7 +2,7 @@
 title: 'Embedding Datasets'
 sidebar_label: 'Embedding Datasets'
 description: 'Learn how to define, or augment existing datasets with embedding column(s).'
-sidebar_position: 7
+sidebar_position: 9
 pagination_prev: null
 pagination_next: null
 ---
diff --git a/website/docs/features/large-language-models/index.md b/website/docs/features/large-language-models/index.md
@@ -2,7 +2,7 @@
 title: 'Large Language Models'
 sidebar_label: 'Large Language Models'
 description: 'Learn how to configure large language models (LLMs)'
-sidebar_position: 5
+sidebar_position: 7
 pagination_prev: null
 pagination_next: null
 tags:
diff --git a/website/docs/features/machine-learning-models/index.md b/website/docs/features/machine-learning-models/index.md
@@ -1,7 +1,7 @@
 ---
 title: 'Machine Learning Models'
 sidebar_label: 'Machine Learning Models'
-sidebar_position: 6
+sidebar_position: 8
 pagination_prev: null
 pagination_next: null
 tags:
diff --git a/website/docs/features/observability/index.md b/website/docs/features/observability/index.md
@@ -2,7 +2,7 @@
 title: 'Observability & Monitoring'
 sidebar_label: 'Observability'
 description: 'Learn how to use Spice telemetry.'
-sidebar_position: 10
+sidebar_position: 12
 pagination_prev: null
 pagination_next: null
 ---
diff --git a/website/docs/features/search/index.md b/website/docs/features/search/index.md
@@ -2,7 +2,7 @@
 title: 'Search Functionality'
 sidebar_label: 'Search'
 description: 'Learn how Spice can search across datasets using database-native and vector-search methods.'
-sidebar_position: 8
+sidebar_position: 10
 pagination_prev: null
 pagination_next: null
 tags:
diff --git a/website/docs/features/semantic-model/index.md b/website/docs/features/semantic-model/index.md
@@ -2,7 +2,7 @@
 title: 'Semantic Model'
 sidebar_label: 'Semantic Model'
 description: 'Learn how to define and use semantic data models with Spice.'
-sidebar_position: 9
+sidebar_position: 11
 pagination_prev: null
 pagination_next: null
 ---
diff --git a/website/docs/features/web-search/index.md b/website/docs/features/web-search/index.md
@@ -2,6 +2,7 @@
 title: 'Web Search'
 sidebar_label: 'Web Search'
 description: 'Learn how Spice can perform web search'
+sidebar_position: 13
 tags:
   - search
   - models
diff --git a/website/docs/reference/spicepod/runtime.md b/website/docs/reference/spicepod/runtime.md
@@ -96,9 +96,11 @@ runtime:
 
 In addition to the common cache configuration parameters, `sql_results` also supports the following parameters:
 
-| Parameter name   | Optional | Description                                                                                                                                   |
-| ---------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
-| `cache_key_type` | Yes      | Determines how cache keys are generated. Defaults to `plan`. `plan` uses the query's logical plan, while `sql` uses the raw SQL query string. |
+| Parameter name               | Optional | Default | Description                                                                                                                    |
+| ---------------------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `cache_key_type`             | Yes      | `plan`  | Determines how cache keys are generated. `plan` uses the query's logical plan, while `sql` uses the raw SQL query string.      |
+| `encoding`                   | Yes      | `none`  | Compression algorithm for cached results. Supports `none` or `zstd`.                                                           |
+| `stale_while_revalidate_ttl` | Yes      | `0s`    | Duration for which expired cache entries continue to be served while a background refresh occurs. `0s` disables this behavior. |
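+
+As a sketch of how these parameters combine, the following enables `zstd` compression and a 10-second stale-while-revalidate window for the SQL results cache. It assumes the `runtime.caching.sql_results` nesting and the common `enabled` and `item_ttl` parameters; check the common cache configuration parameters for the exact names and duration formats:
+
+```yaml
+runtime:
+  caching:
+    sql_results:
+      # Common cache parameters (names assumed; see the common parameters table)
+      enabled: true
+      item_ttl: 30s
+      # Parameters specific to sql_results
+      cache_key_type: plan
+      encoding: zstd
+      stale_while_revalidate_ttl: 10s
+```
+
+With these values, an entry older than `item_ttl` can still be returned for up to 10 seconds while Spice refreshes it in the background; `zstd` trades some CPU time for a smaller cache footprint.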
 
 :::info
 
@@ -284,6 +286,7 @@ For detailed memory information, see [Memory](/docs/reference/memory.md).
 The `spill_compression` parameter configures compression for spill files generated during large query execution in the Spice runtime.
 
 **Supported values:**
+
 - `zstd` (default): Enables high compression ratios for spill files, reducing disk usage but with moderate (de)compression speed.
 - `lz4_frame`: Provides faster (de)compression, resulting in larger spill files and potentially higher disk usage.
 - `uncompressed`: Disables compression. Spill files will be the largest, but with no (de)compression overhead.
@@ -293,6 +296,7 @@ runtime:
   query:
     spill_compression: lz4_frame
 ```
+
 This option allows you to balance disk space usage and query performance for large-scale analytics workloads.
 
 ## `runtime.query.temp_directory`