diff --git a/website/docs/deployment/architectures/hybrid.md b/website/docs/deployment/architectures/cluster-sidecar.md
similarity index 75%
rename from website/docs/deployment/architectures/hybrid.md
rename to website/docs/deployment/architectures/cluster-sidecar.md
index e437717d3..ccfec8afe 100644
--- a/website/docs/deployment/architectures/hybrid.md
+++ b/website/docs/deployment/architectures/cluster-sidecar.md
@@ -1,12 +1,12 @@
---
-title: 'Hybrid Deployment'
-sidebar_label: 'Hybrid'
+title: 'Cluster-Sidecar Deployment'
+sidebar_label: 'Cluster-Sidecar'
description: 'Deploying Spice with sidecar caching backed by a centralized cluster for acceleration, distributed query, and ingestion.'
sidebar_position: 4
pagination_prev: null
pagination_next: null
---
-import Content from '@site/src/partials/deployment/architectures/_hybrid.mdx';
+import Content from '@site/src/partials/deployment/architectures/_cluster-sidecar.mdx';
diff --git a/website/docs/deployment/architectures/index.md b/website/docs/deployment/architectures/index.md
index 4a042edba..0d7ae4fd4 100644
--- a/website/docs/deployment/architectures/index.md
+++ b/website/docs/deployment/architectures/index.md
@@ -14,7 +14,7 @@ Spice supports multiple deployment architectures:
- [Sidecar Deployment](architectures/sidecar) - Deploy alongside applications
- [Microservice Deployment (Single or Multiple Replicas)](architectures/microservice) - Standalone service deployment
- [Tiered Deployment](architectures/tiered) - Edge, application, and cloud tiers
-- [Hybrid Deployment](architectures/hybrid) - Sidecar caching backed by a centralized cluster
+- [Cluster-Sidecar Deployment](architectures/cluster-sidecar) - Sidecar caching backed by a centralized cluster
- [Cloud-Hosted in the Spice Cloud Platform](architectures/hosted) - Managed cloud deployment
- [Sharded Deployment](architectures/sharded) - Horizontal data partitioning
- [Cluster Deployment (Spice.ai Enterprise)](architectures/cluster) - Distributed cluster architecture
diff --git a/website/docs/deployment/ci-cd/index.md b/website/docs/deployment/ci-cd/index.md
new file mode 100644
index 000000000..ec47db669
--- /dev/null
+++ b/website/docs/deployment/ci-cd/index.md
@@ -0,0 +1,239 @@
+---
+title: 'CI/CD Deployment'
+sidebar_label: 'CI/CD'
+sidebar_position: 7
+description: 'Deploy Spice.ai applications using continuous integration and delivery pipelines, including Helm, Kubernetes GitOps with Argo CD or Flux, GitHub Actions, and the Spice Cloud deploy action.'
+keywords:
+ [
+ spice.ai,
+ deployment,
+ ci/cd,
+ cicd,
+ helm,
+ kubernetes,
+ github actions,
+ gitops,
+ argo cd,
+ flux,
+ spicepod,
+ spice cloud,
+ ]
+tags:
+ - deployment
+ - helm
+ - kubernetes
+ - github
+ - gitops
+---
+
+Spice deployments can be automated through continuous integration and delivery (CI/CD) pipelines. The recommended approach for self-hosted, open-source deployments is the [Spice Helm chart](https://github.com/spiceai/helm-charts), driven either directly from a pipeline runner or declaratively through a GitOps controller. Container and cloud-VM workflows are also supported, as is a managed deploy action for the Spice Cloud Platform.
+
+The sections below cover, in order:
+
+- [Helm in CI pipelines](#helm-in-ci-pipelines) — push-based deployment from GitHub Actions, GitLab CI, or any runner.
+- [Kubernetes GitOps](#kubernetes-gitops) — pull-based reconciliation with Argo CD or Flux.
+- [Containers and cloud VMs](#containers-and-cloud-vms) — Docker, AWS, and Azure pipelines.
+- [Spice Cloud Platform](#spice-cloud-platform) — Connect Repository from the portal, or the `spicehq/spice-cloud-deploy-action` GitHub Action.
+
+:::tip Self-hosted enterprise deployments
+For production self-hosted deployments, the [Spice.ai Enterprise Kubernetes Operator](https://docs.spice.ai/docs/enterprise/kubernetes-operator/kubernetes) is the recommended approach. The operator provides per-replica StatefulSets, automatic PVC resizing, configurable update strategies, crashloop protection, and distributed query execution through `SpicepodSet` and `SpicepodCluster` custom resources, all reconcilable from Git through the same GitOps tooling described below.
+:::
+
+## Helm in CI pipelines
+
+The [Spice Helm chart](https://github.com/spiceai/helm-charts) is the primary deployment artifact for self-hosted clusters. Any CI runner with `kubectl` and `helm` installed can roll out a release by checking out the repository, authenticating to the target cluster, and running `helm upgrade --install`.
+
+The chart loads the Spicepod from a `spicepod` key in the values file. A typical layout keeps a single `values.yaml` that contains both chart configuration and the Spicepod definition:
+
+```yaml
+# values.yaml
+image:
+  repository: spiceai/spiceai
+  tag: '1.10.0'
+
+spicepod:
+  name: cayenne
+  version: v1
+  kind: Spicepod
+  datasets:
+    - from: s3://spiceai-demo-datasets/taxi_trips/2024/
+      name: taxi_trips
+      params:
+        file_format: parquet
+      acceleration:
+        enabled: true
+        engine: duckdb
+```
+
+For details on chart values, see the [Helm deployment guide](https://spiceai.org/docs/deployment/kubernetes/helm).
+
+### GitHub Actions example
+
+The workflow below deploys the chart to a Kubernetes cluster on every push to `main`. Cluster credentials are provided through a base64-encoded kubeconfig stored in the `KUBE_CONFIG` repository secret.
+
+```yaml
+name: Deploy Spice
+on:
+  push:
+    branches: [main]
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: azure/setup-helm@v4
+        with:
+          version: v3.14.0
+
+      - name: Configure kubectl
+        run: |
+          mkdir -p "$HOME/.kube"
+          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > "$HOME/.kube/config"
+
+      - name: Deploy Spice
+        run: |
+          helm repo add spiceai https://helm.spiceai.org
+          helm repo update
+          helm upgrade --install spiceai spiceai/spiceai \
+            --namespace spiceai \
+            --create-namespace \
+            --values values.yaml \
+            --atomic \
+            --wait \
+            --timeout 5m
+```
+
+`--atomic` rolls back on failure, and `--wait` blocks until the release is healthy, so a failed deploy fails the pipeline.
+
+### GitLab CI example
+
+The same pattern works in GitLab CI. The job uses the community-maintained `alpine/helm` image and reads the base64-encoded kubeconfig from the `KUBE_CONFIG` CI/CD variable.
+
+```yaml
+deploy:
+  image:
+    name: alpine/helm:3.14.0
+    # The image's entrypoint is `helm`; clear it so GitLab can run shell commands.
+    entrypoint: ['']
+  stage: deploy
+  before_script:
+    - apk add --no-cache curl
+    - curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
+    - install -m 0755 kubectl /usr/local/bin/kubectl
+    - mkdir -p ~/.kube && echo "$KUBE_CONFIG" | base64 -d > ~/.kube/config
+  script:
+    - helm repo add spiceai https://helm.spiceai.org
+    - helm repo update
+    - helm upgrade --install spiceai spiceai/spiceai
+      --namespace spiceai --create-namespace
+      --values values.yaml
+      --atomic --wait --timeout 5m
+  only:
+    - main
+```
+
+### Pinning the chart and runtime versions
+
+Production pipelines should pin both the chart and the Spice runtime image to specific versions. Pass `--version` to `helm upgrade` to pin the chart, and set `image.tag` in `values.yaml` to pin the runtime image:
+
+```bash
+helm upgrade --install spiceai spiceai/spiceai \
+ --version 1.10.0 \
+ --values values.yaml
+```
+
+Available chart versions are listed in the [helm-charts repository](https://github.com/spiceai/helm-charts/releases). Runtime image tags are published on [GitHub Container Registry](https://github.com/spiceai/spiceai/pkgs/container/spiceai).
+
+### Promoting across environments
+
+To promote the same artifact across environments, keep a base `values.yaml` and add per-environment overlays such as `values.staging.yaml` and `values.prod.yaml`. Helm merges multiple `--values` (`-f`) flags in order, so values in later files override those in earlier ones:
+
+```bash
+helm upgrade --install spiceai spiceai/spiceai \
+ --values values.yaml \
+ --values values.prod.yaml
+```
+
+Each environment can target a different cluster, namespace, or image tag while sharing the same Spicepod definition.
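+
+As a rough sketch of such an overlay, the hypothetical `values.prod.yaml` below lists only the keys that differ from the base file. The file name and the tag are illustrative assumptions; `image.tag` is the same key used in the base `values.yaml` shown earlier. Helm deep-merges maps from later files onto earlier ones and replaces lists wholesale, so an overlay that redefines `spicepod.datasets` would need to repeat the full list.
+
+```yaml
+# values.prod.yaml (hypothetical overlay; contains only production-specific overrides)
+image:
+  tag: '1.10.0' # tag promoted to production; staging may pin a different one
+```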
+
+## Kubernetes GitOps
+
+GitOps controllers reconcile cluster state from a Git repository, removing the need for the pipeline to hold cluster credentials. The controller runs inside the cluster and pulls changes as they are committed.
+
+- [Argo CD](https://spiceai.org/docs/deployment/kubernetes/argocd) — `Application` manifests reconciled by the Argo CD controller.
+- [Flux](https://spiceai.org/docs/deployment/kubernetes/flux) — `HelmRelease` resources reconciled by the Flux toolkit.
+
+Both guides include end-to-end manifests targeting the official chart, including upgrade and rollback patterns. GitOps is the recommended approach for multi-cluster or multi-environment deployments.
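+
+For orientation only, a minimal Argo CD `Application` pointing at the chart could look like the sketch below. The resource name, the `argocd` namespace, the `default` project, and the pinned versions are assumptions for illustration; the linked guides remain the reference for complete manifests.
+
+```yaml
+# Sketch of an Argo CD Application (names, namespaces, and versions are assumptions)
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: spiceai
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://helm.spiceai.org # Helm repository used elsewhere on this page
+    chart: spiceai
+    targetRevision: 1.10.0 # chart version to pin
+    helm:
+      values: |
+        image:
+          tag: '1.10.0'
+  destination:
+    server: https://kubernetes.default.svc # the cluster Argo CD runs in
+    namespace: spiceai
+  syncPolicy:
+    automated:
+      prune: true # remove resources deleted from Git
+      selfHeal: true # revert manual drift
+    syncOptions:
+      - CreateNamespace=true
+```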
+
+## Containers and cloud VMs
+
+For deployments that target a container runtime or a cloud VM rather than Kubernetes, invoke the standard provider tooling from any pipeline runner:
+
+- [Docker](https://spiceai.org/docs/deployment/docker) — build, push, and run the `spiceai/spiceai` image. Pipelines typically run `docker build` and `docker push` against a registry, then `docker compose up -d` or `docker run` on the target host.
+- [AWS](https://spiceai.org/docs/deployment/aws) — deploy the published CloudFormation template through the AWS CLI or any CloudFormation-aware action.
+- [Azure](https://spiceai.org/docs/deployment/azure) — deploy through ARM/Bicep templates or the Azure CLI.
+
+Each provider guide includes the deployment artifact (image, template, or script) that the pipeline invokes.
+
+## Spice Cloud Platform
+
+Deployments targeting the [Spice Cloud Platform](https://spiceai.org/docs/deployment/cloud) can be automated in two ways:
+
+- **Connect Repository** — link a GitHub repository to a Spice Cloud app from the portal. The app redeploys automatically on each push to the connected branch, with no pipeline configuration required. See [Connect GitHub](https://docs.spice.ai/docs/portal/apps/connect-github).
+- **GitHub Actions** — use the [`spicehq/spice-cloud-deploy-action`](https://github.com/spicehq/spice-cloud-deploy-action) to deploy from a custom workflow. Use this when the pipeline needs to run tests, build artifacts, or set secrets and tags before deploying.
+
+### GitHub Actions
+
+The `spicehq/spice-cloud-deploy-action` deploys a Spicepod manifest to a Spice Cloud app on each pipeline run.
+
+#### Prerequisites
+
+- A [Spice Cloud account](https://spice.ai/login).
+- An OAuth client created from the Spice Cloud Portal. Two repository secrets — `SPICE_CLIENT_ID` and `SPICE_CLIENT_SECRET` — store its credentials.
+- A `spicepod.yaml` checked into the repository.
+
+#### Minimal workflow
+
+```yaml
+name: Deploy Spicepod
+on:
+  push:
+    branches: [main]
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: spicehq/spice-cloud-deploy-action@v1
+        with:
+          client-id: ${{ secrets.SPICE_CLIENT_ID }}
+          client-secret: ${{ secrets.SPICE_CLIENT_SECRET }}
+          app-name: my-app
+          spicepod: spicepod.yaml
+```
+
+#### Common options
+
+| Input | Purpose |
+| -------------------------------------- | -------------------------------------------------------------------------------------- |
+| `app-name` or `app-id` | Target Spice Cloud app. One is required. |
+| `spicepod` | Path to the Spicepod manifest. Defaults to `spicepod.yaml`. |
+| `region` | Required when `create-app-if-missing` provisions a new app (for example, `us-east-1`). |
+| `create-app-if-missing` | Boolean. Creates the app on first deploy. |
+| `secrets` | YAML or JSON map of app-level secrets to set on the deployment. |
+| `tags` | YAML or JSON map of metadata labels. |
+| `test-sql`, `test-chat`, `test-search` | Post-deploy smoke checks against the deployed app. |
+| `wait-for-completion` | Poll until the deployment finishes. Defaults to `true`. |
+| `timeout-seconds` | Maximum time to wait when polling. Defaults to `600`. |
+
+The action emits `app-id`, `app-url`, `deployment-id`, `deployment-status`, and `test-results` outputs that downstream steps can consume. For the full input and output reference, see the [action's README](https://github.com/spicehq/spice-cloud-deploy-action).
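+
+A downstream step can read these outputs once the deploy step has an `id`. The fragment below is a sketch of the `steps` list only; the step id `deploy` and the reporting step are illustrative assumptions, while the output names come from the list above.
+
+```yaml
+# Fragment of jobs.<job>.steps (step id and reporting step are assumptions)
+- uses: spicehq/spice-cloud-deploy-action@v1
+  id: deploy
+  with:
+    client-id: ${{ secrets.SPICE_CLIENT_ID }}
+    client-secret: ${{ secrets.SPICE_CLIENT_SECRET }}
+    app-name: my-app
+
+- name: Report deployment
+  run: |
+    echo "Status: ${{ steps.deploy.outputs.deployment-status }}"
+    echo "URL: ${{ steps.deploy.outputs.app-url }}"
+```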
+
+## Related
+
+- [Helm Deployment Guide](https://spiceai.org/docs/deployment/kubernetes/helm)
+- [Kubernetes Deployment Guide](https://spiceai.org/docs/deployment/kubernetes)
+- [Argo CD Guide](https://spiceai.org/docs/deployment/kubernetes/argocd)
+- [Flux Guide](https://spiceai.org/docs/deployment/kubernetes/flux)
+- [Spice Cloud Platform Deployment](https://spiceai.org/docs/deployment/cloud)
+- [Spice Helm Chart](https://github.com/spiceai/helm-charts)
+- [Spice Cloud Deploy Action](https://github.com/spicehq/spice-cloud-deploy-action)
diff --git a/website/docs/deployment/cloud/index.md b/website/docs/deployment/cloud/index.md
index 820e4d806..ef78a8b41 100644
--- a/website/docs/deployment/cloud/index.md
+++ b/website/docs/deployment/cloud/index.md
@@ -2,7 +2,7 @@
title: 'Spice Cloud Platform Deployment'
description: 'Guide to deploying data and AI applications using the managed Spice Cloud Platform'
sidebar_label: 'Spice Cloud Platform'
-sidebar_position: 5
+sidebar_position: 6
---
The Spice Cloud Platform is a managed, cloud-hosted solution designed for deploying data and AI applications and agents. It provides a secure and efficient compute environment powered by Spice.ai OSS, offering building blocks including high-speed SQL queries, LLM inference, vector search, and retrieval-augmented generation (RAG).
diff --git a/website/docs/deployment/docker/index.md b/website/docs/deployment/docker/index.md
index 20bdcf780..092cd5521 100644
--- a/website/docs/deployment/docker/index.md
+++ b/website/docs/deployment/docker/index.md
@@ -2,7 +2,7 @@
title: 'Docker'
description: 'Run Spice.ai as a Docker container.'
sidebar_label: 'Docker'
-sidebar_position: 3
+sidebar_position: 4
tags:
- deployment
- docker
diff --git a/website/docs/deployment/gcp/index.md b/website/docs/deployment/gcp/index.md
new file mode 100644
index 000000000..0e333cf8d
--- /dev/null
+++ b/website/docs/deployment/gcp/index.md
@@ -0,0 +1,242 @@
+---
+title: 'Google Cloud Deployment Options'
+description: 'Guide to deploying Spice.ai applications on Google Cloud Platform (GCP).'
+sidebar_label: 'GCP'
+sidebar_position: 3
+pagination_next: null
+keywords:
+ [
+ spice.ai,
+ gcp,
+ google cloud,
+ gke,
+ cloud run,
+ compute engine,
+ workload identity,
+ bigquery,
+ cloud sql,
+ alloydb,
+ terraform
+ ]
+---
+
+Spice.ai runs on Google Cloud Platform (GCP) on Kubernetes, serverless containers, or virtual machines. The container image and Helm chart are the same artefacts used in every other environment, so the choice of GCP service is a matter of operational fit rather than packaging.
+
+For a complete list of GCP-compatible data connectors, AI models, and supported services, see [GCP Integrations](gcp/integrations).
+
+## Benefits of deploying on GCP
+
+- **Scalability**: Scale Spice with [GKE node auto-provisioning](https://cloud.google.com/kubernetes-engine/docs/concepts/node-auto-provisioning), [GKE Autopilot](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview), and [Cloud Run](https://cloud.google.com/run).
+- **Global reach**: Deploy across [GCP regions](https://cloud.google.com/about/locations) for low-latency access close to data sources.
+- **Integration**: Connect to [BigQuery](https://cloud.google.com/bigquery), [Cloud Storage](https://cloud.google.com/storage), [Cloud SQL](https://cloud.google.com/sql), [AlloyDB](https://cloud.google.com/alloydb), and [Secret Manager](https://cloud.google.com/secret-manager).
+- **Cost control**: Choose from [machine types](https://cloud.google.com/compute/docs/machine-resource), [committed use discounts](https://cloud.google.com/compute/docs/instances/signing-up-committed-use-discounts), and [Spot VMs](https://cloud.google.com/compute/docs/instances/spot).
+- **Security**: Run inside a VPC with [Private Google Access](https://cloud.google.com/vpc/docs/private-google-access), [VPC Service Controls](https://cloud.google.com/vpc-service-controls), and short-lived credentials via [Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation).
+
+## Deployment options
+
+### Google Kubernetes Engine (GKE)
+
+Run Spice on [GKE](https://cloud.google.com/kubernetes-engine) when the workload benefits from Kubernetes orchestration, multi-replica scale, or shared cluster tenancy. GKE pairs with the [Spice Helm chart](https://spiceai.org/docs/deployment/kubernetes/helm) and the [Argo CD](https://spiceai.org/docs/deployment/kubernetes/argocd) or [Flux](https://spiceai.org/docs/deployment/kubernetes/flux) GitOps workflows.
+
+#### 1. Provision the cluster
+
+The fastest path is `gcloud`. The example below creates a regional Standard cluster with [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity) enabled — required for federated credentials to GCP services.
+
+```bash
+PROJECT=my-project
+REGION=us-central1
+CLUSTER=spiceai-prod
+
+gcloud container clusters create $CLUSTER \
+ --project $PROJECT \
+ --region $REGION \
+ --release-channel regular \
+ --machine-type e2-standard-4 \
+ --num-nodes 1 \
+ --enable-autoscaling --min-nodes 2 --max-nodes 6 \
+ --workload-pool ${PROJECT}.svc.id.goog \
+ --enable-ip-alias
+
+gcloud container clusters get-credentials $CLUSTER --region $REGION --project $PROJECT
+```
+
+For burst or low-utilization workloads, use [GKE Autopilot](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview) — Google manages the nodes, billing is per-pod, and Workload Identity is enabled by default. For production, prefer Terraform for repeatable provisioning. The [`terraform-google-modules/kubernetes-engine`](https://github.com/terraform-google-modules/terraform-google-kubernetes-engine) module is a common starting point.
+
+#### 2. Configure Workload Identity for GCP access
+
+Most Spice connectors (Cloud Storage via the [S3 connector](../components/data-connectors/s3) with HMAC, BigQuery via [ADBC](../components/data-connectors/adbc), Cloud SQL via [PostgreSQL](../components/data-connectors/postgres) or [MySQL](../components/data-connectors/mysql)) accept GCP credentials from [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials). Use Workload Identity so pods receive scoped, short-lived tokens without static keys:
+
+```bash
+# 1. Create a Google service account and grant it the roles the Spicepod needs
+gcloud iam service-accounts create spiceai-runtime --project $PROJECT
+
+gcloud projects add-iam-policy-binding $PROJECT \
+ --member "serviceAccount:spiceai-runtime@${PROJECT}.iam.gserviceaccount.com" \
+ --role roles/storage.objectViewer
+
+gcloud projects add-iam-policy-binding $PROJECT \
+ --member "serviceAccount:spiceai-runtime@${PROJECT}.iam.gserviceaccount.com" \
+ --role roles/bigquery.dataViewer
+
+# 2. Bind the Google service account to a Kubernetes ServiceAccount
+gcloud iam service-accounts add-iam-policy-binding \
+ spiceai-runtime@${PROJECT}.iam.gserviceaccount.com \
+ --role roles/iam.workloadIdentityUser \
+ --member "serviceAccount:${PROJECT}.svc.id.goog[spiceai/spiceai]"
+```
+
+Reference the service account from the Helm release so pods inherit federated tokens via the standard ADC chain:
+
+```yaml
+# values.yaml
+serviceAccount:
+  create: true
+  name: spiceai
+  annotations:
+    iam.gke.io/gcp-service-account: spiceai-runtime@my-project.iam.gserviceaccount.com
+```
+
+#### 3. Install Spice.ai
+
+```bash
+helm repo add spiceai https://helm.spiceai.org
+helm repo update
+
+helm upgrade --install spiceai spiceai/spiceai \
+ --namespace spiceai --create-namespace \
+ --version 1.11.5 \
+ -f values.yaml
+```
+
+For declarative GitOps, swap this command for an Argo CD `Application` or a Flux `HelmRelease` pointing at the same chart. See the [Argo CD](https://spiceai.org/docs/deployment/kubernetes/argocd) or [Flux](https://spiceai.org/docs/deployment/kubernetes/flux) guides for full manifests.
+
+#### 4. Storage and ingress
+
+For stateful acceleration (DuckDB, SQLite, Cayenne):
+
+- **Local SSD (recommended)** — Spice acceleration is latency- and IOPS-sensitive, so the lowest-latency option is a node-local NVMe SSD on a machine type with attached [Local SSD](https://cloud.google.com/compute/docs/disks/local-ssd) (`n2-standard-*-lssd`, `c3-standard-*-lssd`, `z3` series). Expose Local SSDs through GKE's [Local SSD raw block / ephemeral storage](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd) provisioner. Local SSDs do not survive node replacement, so pair with a refresh strategy or a re-hydration source.
+- **Hyperdisk Extreme / Balanced** — when shared, replica-attachable persistence is required, [Hyperdisk](https://cloud.google.com/compute/docs/disks/hyperdisk) provides high IOPS and configurable throughput. Use the [Compute Engine persistent disk CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver) with a custom StorageClass (`type: hyperdisk-balanced` or `hyperdisk-extreme`).
+- **Persistent Disk SSD (`pd-ssd`, `premium-rwo`)** — use the built-in `premium-rwo` storage class only when Hyperdisk is unavailable in a region.
+- **Filestore (`filestore-csi`) — not recommended for acceleration** — use only for stateless shared artefacts that need `ReadWriteMany`. NFS latency negates the benefit of using a local accelerator.
+- Set `stateful.enabled: true` in `values.yaml`, and set `stateful.storageClass` to the name of the chosen StorageClass.
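+
+For the Hyperdisk option, a custom StorageClass could be sketched as follows. The class name `spice-hyperdisk-balanced` is a hypothetical choice, and Hyperdisk availability depends on the machine series and region, so check compatibility before relying on it. The name would then be the value given to `stateful.storageClass`.
+
+```yaml
+# Sketch of a Hyperdisk-backed StorageClass (the name is an assumption)
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: spice-hyperdisk-balanced
+provisioner: pd.csi.storage.gke.io # Compute Engine persistent disk CSI driver
+parameters:
+  type: hyperdisk-balanced
+volumeBindingMode: WaitForFirstConsumer # provision the disk in the zone where the pod is scheduled
+allowVolumeExpansion: true
+```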
+
+:::tip[Spice.ai Enterprise]
+For production stateful workloads, the [Spice.ai Enterprise](https://spice.ai) Operator's [`SpicepodSet`](https://docs.spice.ai/docs/enterprise/kubernetes-operator/spicepodset) provides per-replica `StatefulSet`s with automatic PVC resizing, Workload-Identity-aware ServiceAccount annotations, and configurable update strategies. For distributed query execution across scheduler/executor tiers backed by Cloud Storage, see [`SpicepodCluster`](https://docs.spice.ai/docs/enterprise/kubernetes-operator/spicepodcluster).
+:::
+
+To expose Spice externally, install the [GKE Gateway controller](https://cloud.google.com/kubernetes-engine/docs/concepts/gateway-api) or use a [Cloud Load Balancer Service](https://cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps):
+
+```yaml
+# values.yaml
+service:
+  type: LoadBalancer
+  additionalAnnotations:
+    networking.gke.io/load-balancer-type: 'Internal' # internal only
+```
+
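+The `networking.gke.io/load-balancer-type: 'Internal'` annotation provisions an internal load balancer that is reachable only from within the VPC. Omit the annotation to expose the Service on a public IP.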
+
+#### 5. Observability
+
+The Spice Helm chart ships a `PodMonitor` resource for the [Prometheus Operator](https://prometheus-operator.dev/); set `monitoring.podMonitor.enabled: true` to enable it. On GKE, [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus) is the common target. Its [managed collection](https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed) scrapes targets declared through its own `PodMonitoring` custom resource rather than the Prometheus Operator `PodMonitor`, so either define an equivalent `PodMonitoring` resource or use self-deployed collection with the Prometheus Operator. Import the [Spice Grafana dashboard](../monitoring/grafana) into [Cloud Monitoring](https://cloud.google.com/monitoring) or self-managed Grafana.
+
+For comprehensive guidance, refer to the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs), [GKE security best practices](https://cloud.google.com/kubernetes-engine/docs/concepts/security-overview), and the [Spice.ai Kubernetes Deployment Guide](https://spiceai.org/docs/deployment/kubernetes).
+
+### Cloud Run
+
+[Cloud Run](https://cloud.google.com/run) is a serverless container platform suitable for HTTP-driven Spice.ai workloads that benefit from scale-to-zero and request-based autoscaling. Use it when a single managed container is sufficient and operating Kubernetes is not desired.
+
+#### 1. Configure a service account
+
+Create a service account with the IAM roles the Spicepod requires. Cloud Run attaches it to the service so the runtime authenticates via [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials) without static keys:
+
+```bash
+gcloud iam service-accounts create spiceai-runtime --project $PROJECT
+
+gcloud projects add-iam-policy-binding $PROJECT \
+ --member "serviceAccount:spiceai-runtime@${PROJECT}.iam.gserviceaccount.com" \
+ --role roles/storage.objectViewer
+
+gcloud projects add-iam-policy-binding $PROJECT \
+ --member "serviceAccount:spiceai-runtime@${PROJECT}.iam.gserviceaccount.com" \
+ --role roles/secretmanager.secretAccessor
+```
+
+#### 2. Deploy Spice.ai
+
+Cloud Run pulls the [Spice.ai container image](https://hub.docker.com/r/spiceai/spiceai) directly. Mount secrets from [Secret Manager](https://cloud.google.com/run/docs/configuring/secrets) and configure HTTP ingress on port `8090`:
+
+```bash
+gcloud run deploy spiceai \
+ --project $PROJECT \
+ --region $REGION \
+ --image spiceai/spiceai:1.11.5-models \
+ --port 8090 \
+ --service-account spiceai-runtime@${PROJECT}.iam.gserviceaccount.com \
+ --min-instances 1 --max-instances 5 \
+ --cpu 1 --memory 2Gi \
+ --set-env-vars SPICED_LOG=INFO \
+ --set-secrets SPICEAI_API_KEY=spiceai-api-key:latest
+```
+
+To run multiple replicas with shared file-based acceleration, mount [Cloud Storage with FUSE](https://cloud.google.com/run/docs/configuring/services/cloud-storage-volume-mounts) and point file accelerators at the mount path (for example, `duckdb_file: /data/taxi_trips.db`). Cloud Storage volume latency is significantly higher than local SSD, so prefer GKE for latency-sensitive accelerated workloads.
+
+#### 3. Scaling rules
+
+Cloud Run scales by concurrent requests per instance ([default 80](https://cloud.google.com/run/docs/about-concurrency)). For background workloads (refresh schedules, ingestion) that should not scale to zero, set `--min-instances 1`. For workloads with long-running connections (Arrow Flight, streaming refresh), set `--no-cpu-throttling` and tune `--concurrency` to match the runtime's request profile.
+
+#### 4. Health probes and revisions
+
+Cloud Run uses [startup and liveness probes](https://cloud.google.com/run/docs/configuring/healthchecks) — point them at `/health` and `/v1/ready`. Each `gcloud run deploy` creates a new [revision](https://cloud.google.com/run/docs/managing/revisions); use [traffic splitting](https://cloud.google.com/run/docs/rollouts-rollbacks-traffic-migration) for canary upgrades:
+
+```bash
+gcloud run services update-traffic spiceai \
+ --region $REGION \
+ --to-revisions spiceai-00009-xyz=90,spiceai-00010-abc=10
+```
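+
+The probes themselves can be declared in a Cloud Run service YAML and applied with `gcloud run services replace`. The sketch below assumes `/v1/ready` is used for the startup probe and `/health` for the liveness probe; that mapping and the timing values are illustrative assumptions, not settings taken from the Spice documentation.
+
+```yaml
+# service.yaml sketch (probe-to-endpoint mapping and timings are assumptions)
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: spiceai
+spec:
+  template:
+    spec:
+      containers:
+        - image: spiceai/spiceai:1.11.5-models
+          ports:
+            - containerPort: 8090
+          startupProbe:
+            httpGet:
+              path: /v1/ready # wait until the runtime reports ready
+              port: 8090
+            periodSeconds: 10
+            failureThreshold: 12
+          livenessProbe:
+            httpGet:
+              path: /health # restart the instance if the runtime stops responding
+              port: 8090
+            periodSeconds: 30
+```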
+
+For more details, see the [Cloud Run documentation](https://cloud.google.com/run/docs) and the [Spice.ai Docker Deployment Guide](https://spiceai.org/docs/deployment/docker).
+
+### Compute Engine
+
+Deploy Spice directly on [Compute Engine](https://cloud.google.com/compute) for maximum control over the environment, GPU access, or large-memory machine types.
+
+1. **Manual VM deployment**:
+   - Provision a Linux VM (Ubuntu, Debian, or Container-Optimized OS) with an appropriate [machine type](https://cloud.google.com/compute/docs/machine-resource).
+   - Install [Docker Engine](https://docs.docker.com/engine/install/) and run [Spice.ai as a Docker container](https://spiceai.org/docs/deployment/docker), or install the `spice` binary directly. See the [installation guide](https://spiceai.org/docs/installation).
+   - Attach a [service account](https://cloud.google.com/compute/docs/access/service-accounts) so Spice can read from Cloud Storage, BigQuery, and Secret Manager without static credentials.
+
+2. **Automated deployment with Terraform or Deployment Manager**:
+   - Define infrastructure in a [Terraform configuration](https://registry.terraform.io/providers/hashicorp/google/latest), including the VM, network, firewall rules, and service account.
+   - Use [startup scripts](https://cloud.google.com/compute/docs/instances/startup-scripts/linux) or [Container-Optimized OS with `cloud-init`](https://cloud.google.com/container-optimized-os/docs/how-to/run-container-instance) to install Docker, pull the [Spice.ai image](https://hub.docker.com/r/spiceai/spiceai), retrieve secrets from [Secret Manager](https://cloud.google.com/secret-manager), and start the runtime.
+   - Use [managed instance groups](https://cloud.google.com/compute/docs/instance-groups) for horizontally scaled deployments fronted by an [external HTTP(S) load balancer](https://cloud.google.com/load-balancing/docs/https) or [internal load balancer](https://cloud.google.com/load-balancing/docs/internal).
+
+For detailed guidance, refer to the [Compute Engine documentation](https://cloud.google.com/compute/docs), the [Container-Optimized OS guide](https://cloud.google.com/container-optimized-os/docs), and the [Google provider for Terraform](https://registry.terraform.io/providers/hashicorp/google/latest/docs).
+
+## Authentication
+
+Most GCP services that Spice connects to accept explicit credentials through component parameters (for example, `iceberg_gcs_credentials` on the [Iceberg connector](../components/data-connectors/iceberg)). When explicit credentials are not provided, Spice follows the standard [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials) chain:
+
+1. **`GOOGLE_APPLICATION_CREDENTIALS`**: path to a credential file. This is either a service account JSON key (common in local development; not recommended for production) or a [Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation) credential configuration for workloads running outside GCP (other clouds, on-premises, GitHub Actions).
+2. **`gcloud` CLI credentials**: cached credentials from `gcloud auth application-default login`. Common during development.
+3. **Attached service account**: the credential of the runtime environment, obtained from the metadata server:
+   - Compute Engine, Cloud Run, GKE node default service account.
+   - GKE pods configured with [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity): federated tokens scoped to a namespaced Kubernetes ServiceAccount, with no static keys on the node.
+
+For services with explicit parameters (Cloud Storage HMAC, BigQuery service account JSON), prefer named credentials or Workload Identity over `GOOGLE_APPLICATION_CREDENTIALS` files in production.
+
+:::note[IAM role bindings]
+Regardless of the credential source, the principal must have the appropriate IAM role bindings (for example, `roles/storage.objectViewer` on a bucket, or `roles/bigquery.dataViewer` on a BigQuery dataset). When a Spicepod connects to multiple GCP services, the principal must have permissions across all of them.
+:::
+
+## Resources
+
+### Documentation
+
+- [GCP Integrations](gcp/integrations) — complete list of GCP data connectors, AI models, and supported services.
+- [Spice.ai Kubernetes Deployment Guide](../deployment/kubernetes) — Helm, Argo CD, and Flux options for GKE.
+
+### Google Cloud Marketplace
+
+Spice.ai is not yet published to [Google Cloud Marketplace](https://console.cloud.google.com/marketplace) (coming soon). In the meantime, deploy using the [`spiceai/spiceai`](https://hub.docker.com/r/spiceai/spiceai) container image or the [Spice Helm chart](https://helm.spiceai.org).
diff --git a/website/docs/deployment/gcp/integrations.md b/website/docs/deployment/gcp/integrations.md
new file mode 100644
index 000000000..9f3d4b45e
--- /dev/null
+++ b/website/docs/deployment/gcp/integrations.md
@@ -0,0 +1,168 @@
+---
+title: 'GCP Integrations'
+description: 'Spice.ai integrations with Google Cloud Platform, including data connectors, AI models, embeddings, and authentication.'
+sidebar_label: 'Integrations'
+sidebar_position: 2
+pagination_next: null
+keywords:
+ [
+ spice.ai,
+ gcp,
+ google cloud,
+ bigquery,
+ cloud storage,
+ cloud sql,
+ alloydb,
+ gemini,
+ vertex ai,
+ workload identity
+ ]
+---
+
+Spice.ai integrates with Google Cloud Platform (GCP) for data federation, AI inference, embeddings, and authentication. This page consolidates GCP-compatible components and links to the relevant configuration guides.
+
+## Data Connectors
+
+Data connectors federate SQL queries across GCP data sources without data movement.
+
+| Connector | Description | Documentation |
+| ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
+| **BigQuery** (via ADBC) | Query [BigQuery](https://cloud.google.com/bigquery) tables using the [BigQuery ADBC driver](https://docs.adbc-drivers.org/drivers/bigquery/index.html). Includes built-in SQL dialect support for federated queries. | [ADBC Data Connector](../../components/data-connectors/adbc) |
+| **Cloud Storage** (S3-compat) | Query Parquet, CSV, and JSON objects in [Cloud Storage](https://cloud.google.com/storage) using the S3 connector with HMAC keys against the GCS [interoperability endpoint](https://cloud.google.com/storage/docs/aws-simple-migration). | [S3 Data Connector](../../components/data-connectors/s3) |
+| **Cloud SQL for PostgreSQL** | Connect to [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) directly or through the [Cloud SQL Auth Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy). | [PostgreSQL Data Connector](../../components/data-connectors/postgres) |
+| **Cloud SQL for MySQL** | Connect to [Cloud SQL for MySQL](https://cloud.google.com/sql/docs/mysql) directly or through the Cloud SQL Auth Proxy. | [MySQL Data Connector](../../components/data-connectors/mysql) |
+| **Cloud SQL for SQL Server** | Connect to [Cloud SQL for SQL Server](https://cloud.google.com/sql/docs/sqlserver). | [MSSQL Data Connector](../../components/data-connectors/mssql) |
+| **AlloyDB for PostgreSQL** | Connect to [AlloyDB](https://cloud.google.com/alloydb) using the PostgreSQL wire protocol. | [PostgreSQL Data Connector](../../components/data-connectors/postgres) |
+| **Apache Iceberg (GCS)** | Query Iceberg tables stored in Cloud Storage with REST or Hive metadata. Native GCS authentication via service account credentials or OAuth tokens. | [Iceberg Data Connector](../../components/data-connectors/iceberg) |
+| **Delta Lake (GCS)** | Query Delta Lake tables stored in Cloud Storage. | [Delta Lake Data Connector](../../components/data-connectors/delta-lake) |
+| **GCP databases via ODBC** | Connect through ODBC drivers for additional GCP-compatible data sources. | [ODBC Data Connector](../../components/data-connectors/odbc) |
+
+### Example: BigQuery via ADBC
+
+```yaml
+datasets:
+  - from: adbc:my_dataset.orders
+    name: orders
+    params:
+      adbc_driver: bigquery
+      adbc_uri: 'bigquery:///my-gcp-project'
+      adbc_driver_options: |
+        adbc.bigquery.sql.dataset_id=my_dataset
+        adbc.bigquery.sql.auth_type=adbc.bigquery.sql.auth_type.json_credential_file
+        adbc.bigquery.sql.auth_credentials=/var/run/secrets/gcp/key.json
+```
+
+When the runtime uses Workload Identity, omit the `adbc.bigquery.sql.auth_type` and `adbc.bigquery.sql.auth_credentials` options; the BigQuery driver then picks up Application Default Credentials automatically.
+
+### Example: Cloud Storage via S3 connector
+
+Cloud Storage exposes an [S3-compatible interoperability endpoint](https://cloud.google.com/storage/docs/aws-simple-migration). Generate an [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) tied to a service account, then point the S3 connector at `storage.googleapis.com`:
+
+```yaml
+datasets:
+  - from: s3://my-bucket/path/to/data/
+    name: events
+    params:
+      file_format: parquet
+      s3_endpoint: https://storage.googleapis.com
+      s3_auth: key
+      s3_key: ${ secrets:GCS_HMAC_ACCESS_ID }
+      s3_secret: ${ secrets:GCS_HMAC_SECRET }
+```
+
+For Iceberg or Delta tables stored in GCS, use the connector's native GCS parameters instead, which support service-account credentials and ADC directly.
+
+### Example: Cloud SQL for PostgreSQL
+
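+Run the [Cloud SQL Auth Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy) as a sidecar (or as a `127.0.0.1` listener on Compute Engine) and connect over the loopback interface. The proxy encrypts traffic between itself and the Cloud SQL instance, so `pg_sslmode: disable` applies only to the loopback hop: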
+
+```yaml
+datasets:
+  - from: postgres:public.orders
+    name: orders
+    params:
+      pg_host: 127.0.0.1
+      pg_port: '5432'
+      pg_db: app
+      pg_user: ${ secrets:CLOUDSQL_USER }
+      pg_pass: ${ secrets:CLOUDSQL_PASSWORD }
+      pg_sslmode: disable
+```
+
+For [Postgres replication-based CDC](../../features/cdc/postgres-replication), Cloud SQL requires the `cloudsql.logical_decoding = on` flag.
+
+## AI Models (Google AI)
+
+Spice integrates with [Google AI Studio](https://aistudio.google.com) for chat completion and reasoning models, including the Gemini family.
+
+| Provider | Supported Models | Documentation |
+| ------------- | ----------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
+| **Google AI** | Gemini 2.0/2.5/Pro, Gemini Flash, and other models from the [Gemini API](https://ai.google.dev/gemini-api). | [Google AI Models](../../components/models/google) |
+
+### Example: Gemini Chat Model
+
+```yaml
+models:
+  - from: google:gemini-2.0-flash-exp
+    name: gemini
+    params:
+      google_api_key: ${ secrets:GEMINI_API_KEY }
+```
+
+See [Google AI Models](https://ai.google.dev/gemini-api/docs/models/gemini) for the full list of supported model names.
+
+## Embeddings (Google AI)
+
+Generate vector embeddings using Gemini embedding models for semantic search and retrieval-augmented generation (RAG).
+
+| Provider | Supported Models | Documentation |
+| ------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
+| **Google AI** | `text-embedding-004` and other models from [Gemini API embeddings](https://ai.google.dev/gemini-api/docs/models/gemini#text-embedding). | [Google AI Embeddings](../../components/embeddings/google) |
+
+### Example: Google AI Embeddings
+
+```yaml
+embeddings:
+  - from: google:text-embedding-004
+    name: gemini_embeddings
+    params:
+      google_api_key: ${ secrets:GEMINI_API_KEY }
+```
+
+## Snapshots and shared state
+
+[Snapshots](../../features/data-acceleration/snapshots) and the [distributed query](../../features/distributed-query) state location can use Cloud Storage as the shared object store. Configure with the `gs://` scheme:
+
+```yaml
+snapshots:
+  location: gs://my-bucket/spiceai/snapshots
+```
+
+When no explicit credentials are supplied, Spice reads `GOOGLE_APPLICATION_CREDENTIALS` and the Workload Identity-federated token, in that order.
+
+## Authentication
+
+All GCP integrations support the standard [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials) chain. When credentials are not explicitly configured, Spice attempts the following in order:
+
+1. **`GOOGLE_APPLICATION_CREDENTIALS`** — path to a service account JSON key file.
+2. **Attached service account** — Compute Engine, Cloud Run, or GKE node default service account.
+3. **GKE Workload Identity** — federated tokens for pods bound to a Google service account via the Kubernetes ServiceAccount. See [Workload Identity for GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity).
+4. **`gcloud` CLI** — cached credentials from `gcloud auth application-default login`.
+5. **Workload Identity Federation** — federated identity for workloads running outside GCP (other clouds, on-premises, GitHub Actions). See [Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation).
+
+For a deployment-side overview of these mechanisms, see the [Authentication](../gcp#authentication) section of the GCP deployment guide.
+
+### IAM role bindings
+
+Each principal must have the appropriate IAM role for the services it accesses:
+
+| Service | Common role(s) |
+| ------------------------ | ----------------------------------------------------------- |
+| Cloud Storage | `roles/storage.objectViewer` or `roles/storage.objectAdmin` |
+| BigQuery | `roles/bigquery.dataViewer` and `roles/bigquery.jobUser` |
+| Cloud SQL | `roles/cloudsql.client` (proxy) plus database-level grants |
+| Secret Manager | `roles/secretmanager.secretAccessor` |
+| Artifact Registry | `roles/artifactregistry.reader` for image pulls |
+| Cloud Logging/Monitoring | `roles/logging.logWriter`, `roles/monitoring.metricWriter` |
+
+When a Spicepod connects to multiple GCP services, ensure roles are granted on every resource the runtime touches.
diff --git a/website/docs/deployment/index.md b/website/docs/deployment/index.md
index 05b90015c..905c787e2 100644
--- a/website/docs/deployment/index.md
+++ b/website/docs/deployment/index.md
@@ -2,28 +2,62 @@
title: 'Spice.ai Deployment Guide'
sidebar_label: 'Deployment'
description: 'Deploy Spice.ai in your environment using Docker, Kubernetes, AWS, Azure, or the Spice Cloud Platform. Learn about sidecar, microservice, tiered, and cluster deployment architectures.'
-keywords: [spice.ai, deployment, docker, kubernetes, aws, azure, sidecar, microservice, cluster, helm, cloud platform]
+keywords:
+ [
+ spice.ai,
+ deployment,
+ docker,
+ kubernetes,
+ aws,
+ azure,
+ sidecar,
+ microservice,
+ cluster,
+ helm,
+ cloud platform
+ ]
image: /img/og/spiceai.png
sidebar_position: 11
---
-Spice supports flexible deployment options ranging from a single binary to fully managed cloud deployments. Choose the architecture that best fits your application's latency, scale, and operational requirements.
+Spice runs as a single binary, a container, a Kubernetes workload, or a fully managed app on the Spice Cloud Platform. This guide helps choose a target environment and a deployment architecture to match an application's latency, scale, and operational requirements.
-## Deployment Architectures
+## Choose a deployment target
-- [Overview](deployment/architectures)
-- [Sidecar Deployment](deployment/architectures/sidecar)
-- [Microservice Deployment (Single or Multiple Replicas)](deployment/architectures/microservice)
-- [Tiered Deployment](deployment/architectures/tiered)
-- [Hybrid Deployment](deployment/architectures/hybrid)
-- [Cloud-Hosted in the Spice Cloud Platform](deployment/architectures/hosted)
-- [Sharded Deployment](deployment/architectures/sharded)
-- [Cluster Deployment (Spice.ai Enterprise)](deployment/architectures/cluster)
+Most users fall into one of three groups:
-## Deployment Guides
+- **Run Spice next to an application** — start with [Docker](deployment/docker) for a local container, or follow [Getting Started](getting-started) to run the binary directly.
+- **Operate Spice in production on Kubernetes** — use the [Spice Helm chart](deployment/kubernetes/helm). For automated rollouts, see the [CI/CD guide](deployment/ci-cd) for Helm pipelines and GitOps with [Argo CD](deployment/kubernetes/argocd) or [Flux](deployment/kubernetes/flux).
+- **Use a managed service** — deploy a Spicepod to the [Spice Cloud Platform](deployment/cloud) and connect a [GitHub repository](https://docs.spice.ai/docs/portal/apps/connect-github) for continuous delivery.
-- [Kubernetes](deployment/kubernetes) — Helm, Argo CD, and Flux
-- [Docker](deployment/docker)
-- [Spice Cloud](deployment/cloud)
-- [AWS](deployment/aws)
-- [Azure](deployment/azure)
+:::tip Self-hosted enterprise deployments
+For production self-hosted clusters, the [Spice.ai Enterprise Kubernetes Operator](https://docs.spice.ai/docs/enterprise/kubernetes-operator/kubernetes) provides per-replica StatefulSets, automatic PVC resizing, configurable update strategies, crashloop protection, and distributed query execution through `SpicepodSet` and `SpicepodCluster` custom resources.
+:::
+
+## Deployment architectures
+
+Architecture refers to where Spice runs in relation to the application and data sources, and how it scales. Pick an architecture before choosing a guide; the same target environment can host any of these patterns.
+
+- [Overview](deployment/architectures) — when to choose each architecture.
+- [Sidecar](deployment/architectures/sidecar) — Spice runs alongside the application for the lowest latency.
+- [Microservice](deployment/architectures/microservice) — single or multiple replicas behind a load balancer.
+- [Tiered](deployment/architectures/tiered) — edge, application, and cloud tiers.
+- [Cluster-Sidecar](deployment/architectures/cluster-sidecar) — sidecar caching backed by a centralized cluster.
+- [Hosted](deployment/architectures/hosted) — managed on the Spice Cloud Platform.
+- [Sharded](deployment/architectures/sharded) — partition data across multiple Spice instances.
+- [Cluster](deployment/architectures/cluster) — distributed query execution with Spice.ai Enterprise.
+
+## Deployment guides
+
+Step-by-step instructions for each target environment.
+
+| Guide | When to use |
+| --------------------------------------------------------- | -------------------------------------------------------------------------- |
+| [Kubernetes](deployment/kubernetes) | Self-hosted production deployments. Covers Helm, Argo CD, and Flux. |
+| [Docker](deployment/docker) | Local development, single-host deployments, and container-based pipelines. |
+| [Spice Cloud](deployment/cloud) | Fully managed deployments without operating infrastructure. |
+| [AWS](deployment/aws) | Deployments on AWS using the published CloudFormation template. |
+| [Azure](deployment/azure) | Deployments on Azure using ARM/Bicep templates. |
+| [GCP](deployment/gcp) | Deployments on Google Cloud using GKE, Cloud Run, or Compute Engine. |
+| [CI/CD](deployment/ci-cd) | Automating any of the above through pipelines or GitOps. |
+| [Read/Write Separation](deployment/read-write-separation) | Production pattern that splits ingest from reads using shared snapshots. |
diff --git a/website/docs/deployment/kubernetes/index.md b/website/docs/deployment/kubernetes/index.md
index 39d356776..1c02c8579 100644
--- a/website/docs/deployment/kubernetes/index.md
+++ b/website/docs/deployment/kubernetes/index.md
@@ -1,7 +1,7 @@
---
title: 'Kubernetes Deployment'
sidebar_label: 'Kubernetes'
-sidebar_position: 4
+sidebar_position: 5
description: 'Deploy Spice.ai on Kubernetes using Helm, Argo CD, or Flux.'
tags:
- deployment
diff --git a/website/docs/deployment/read-write-separation.md b/website/docs/deployment/read-write-separation.md
new file mode 100644
index 000000000..fa06c29a1
--- /dev/null
+++ b/website/docs/deployment/read-write-separation.md
@@ -0,0 +1,370 @@
+---
+title: 'Read/Write Separation'
+sidebar_label: 'Read/Write Separation'
+description: 'Separate write/ingest workloads (cluster) from read workloads (application sidecars, agents) using shared snapshots and live query delegation.'
+sidebar_position: 8
+pagination_next: null
+keywords:
+ [
+ spice.ai,
+ deployment,
+ architecture,
+ read write separation,
+ cluster sidecar,
+ snapshots,
+ bootstrap,
+ ingest,
+ cqrs,
+ sidecar,
+ agent
+ ]
+tags:
+ - deployment
+ - kubernetes
+ - helm
+---
+
+Production data and AI applications typically have two very different workloads on the same data:
+
+- **Writes / ingest** — pulling from source systems, normalizing, accelerating, indexing, and refreshing. CPU-, network-, and memory-heavy. Bursty. Doesn't need to be co-located with the application.
+- **Reads** — answering user-facing requests, feeding context to AI agents, serving dashboards. Latency-sensitive. Often horizontally scaled with the application itself.
+
+Running both on the same Spice instance forces a single hardware shape, refresh schedule, and failure domain on workloads that have nothing in common. Read/write separation splits them into two tiers: a centralized **write/ingest cluster** that owns refresh and acceleration, and one or more lightweight **read instances** (typically [sidecars](architectures/sidecar) next to the application) that serve queries from a local materialized copy.
+
+The two tiers communicate through two channels:
+
+1. **Snapshots** in object storage — the cluster periodically writes a compact acceleration file (DuckDB or SQLite) to S3, GCS, or ADLS. Read instances bootstrap from the latest snapshot on startup and (optionally) refresh from snapshots on a schedule. No live network dependency on the cluster.
+2. **Live query delegation** — when a read instance needs data outside its materialized working set (a historical query, a cross-dataset join, a broad search), it transparently delegates to the cluster over Arrow Flight. See [Cluster-Sidecar Architecture](architectures/cluster-sidecar).
+
+Most production deployments use both: snapshots for the steady-state working set, and live delegation for the long tail.
+
+```mermaid
+flowchart LR
+ subgraph Sources["Data Sources"]
+ S3[("S3 / Iceberg")]
+ PG[("PostgreSQL")]
+ DB[("Databricks")]
+ end
+
+ subgraph Cluster["Write/Ingest Cluster"]
+ direction TB
+ I1["Spice Node 1"]
+ I2["Spice Node 2"]
+ I3["Spice Node 3"]
+ end
+
+ Snapshots[("Object Storage<br/>(snapshots)")]
+
+ subgraph App1["App Pod"]
+ A1["Application"] <-->|"loopback"| R1["Spice Read Instance"]
+ end
+ subgraph App2["App Pod"]
+ A2["Agent"] <-->|"loopback"| R2["Spice Read Instance"]
+ end
+ subgraph App3["App Pod"]
+ A3["Service"] <-->|"loopback"| R3["Spice Read Instance"]
+ end
+
+ Sources --> Cluster
+ Cluster -->|"write snapshots"| Snapshots
+ Snapshots -->|"bootstrap / refresh"| R1 & R2 & R3
+ R1 & R2 & R3 -.->|"live delegation<br/>(Arrow Flight)"| Cluster
+```
+
+## When to use read/write separation
+
+Use this pattern when:
+
+- Application instances need **sub-millisecond reads** but data refresh, ingestion, or acceleration would saturate them.
+- The same datasets are read by **many replicas**, each currently re-ingesting from source systems.
+- Read instances need to **start fast** — autoscaling, scale-to-many agent containers, or ephemeral Cloud Run / Knative workloads where cold-starting from source is too slow.
+- Upstream data sources have **rate or cost limits** that prevent every replica from connecting directly.
+- Read instances run **outside the cluster's network** — at the edge, in another VPC, on a developer laptop — and cannot maintain a permanent dependency on the source system.
+
+It is overkill when one Spice instance is sufficient (start with [Sidecar](architectures/sidecar)) or when the workload is purely batch/analytical with relaxed latency (use [Microservice](architectures/microservice)).
+
+## How it works
+
+### The cluster (write tier)
+
+The cluster owns every refresh, acceleration, and search index for the datasets in scope. It runs as a standalone Spice deployment — typically a Kubernetes [`Deployment`](./kubernetes/helm) or [`StatefulSet`](https://docs.spice.ai/docs/enterprise/kubernetes-operator/spicepodset), or a managed [Spice Cloud](./cloud) app — and holds the only credentials to the source systems.
+
+Cluster Spicepod responsibilities:
+
+- Connect to every source: object stores, OLTP databases, lakehouses, search indices, message queues.
+- Run all refresh schedules, CDC, and stream ingest.
+- Accelerate to file-mode engines (DuckDB or SQLite) so the materialization can be exported as a snapshot.
+- Write [snapshots](../features/data-acceleration/snapshots) to a shared object store after each refresh.
+
+```yaml
+# cluster spicepod.yaml
+snapshots:
+  enabled: true
+  location: s3://spiceai-snapshots/prod/
+  params:
+    s3_auth: iam_role
+
+datasets:
+  - from: s3://my-lake/orders/
+    name: orders
+    params:
+      file_format: parquet
+    acceleration:
+      enabled: true
+      engine: duckdb
+      mode: file
+      refresh_check_interval: 5m
+      snapshots: enabled # write a new snapshot after every refresh
+      snapshots_trigger: refresh_complete
+      snapshots_compaction: enabled
+      params:
+        duckdb_file: /data/orders.db
+
+  - from: postgres:public.customers
+    name: customers
+    params:
+      pg_host: postgres.internal
+      pg_user: ${ secrets:PG_USER }
+      pg_pass: ${ secrets:PG_PASS }
+    acceleration:
+      enabled: true
+      engine: duckdb
+      mode: file
+      refresh_mode: changes # CDC
+      snapshots: enabled
+      snapshots_trigger: time_interval
+      snapshots_trigger_threshold: 10m
+      params:
+        duckdb_file: /data/customers.db
+```
+
+Snapshots are partitioned by date and dataset (`month=YYYY-MM/day=YYYY-MM-DD/dataset=<name>/...`), so retention can be handled with an ordinary object-store lifecycle rule. See [Snapshots](../features/data-acceleration/snapshots) for the full configuration reference.
+
+### The read instances (read tier)
+
+Read instances run alongside applications, typically as Kubernetes pod sidecars, although the same configuration works in Cloud Run, on bare metal, or on a developer laptop. They never connect to source systems; their only external dependencies are the snapshot bucket and (optionally) the cluster's Arrow Flight endpoint.
+
+Read Spicepod responsibilities:
+
+- Bootstrap each accelerated dataset from the latest snapshot on startup. No source connection required.
+- Optionally refresh from newer snapshots on a schedule (`bootstrap_only` mode polls for new snapshots without writing them).
+- Optionally delegate queries that fall outside the materialized working set to the cluster over Arrow Flight.
+
+```yaml
+# read instance spicepod.yaml
+snapshots:
+  enabled: true
+  location: s3://spiceai-snapshots/prod/
+  bootstrap_on_failure_behavior: fallback # try older snapshots if the newest fails
+  params:
+    s3_auth: iam_role
+
+datasets:
+  - from: s3://my-lake/orders/ # same source URL, but never used at runtime
+    name: orders
+    params:
+      file_format: parquet
+    acceleration:
+      enabled: true
+      engine: duckdb
+      mode: file
+      snapshots: bootstrap_only # download only; never write back
+      params:
+        duckdb_file: /local/orders.db
+
+  - from: postgres:public.customers
+    name: customers
+    acceleration:
+      enabled: true
+      engine: duckdb
+      mode: file
+      snapshots: bootstrap_only
+      params:
+        duckdb_file: /local/customers.db
+```
+
+`snapshots: bootstrap_only` is the key setting — read instances **read** snapshots but never **write** them, so multiple replicas don't race to upload. Combine with a periodic refresh trigger to pick up new snapshots without re-querying the source.
+
+### Live delegation for the long tail
+
+Snapshots cover the working set. For queries that span beyond it — historical analytics, cross-dataset joins, distributed search — read instances delegate to the cluster using a [`spiceai` connector](../components/data-connectors/spiceai) entry pointing at the cluster's Arrow Flight endpoint.
+
+```yaml
+# read instance spicepod.yaml (continued)
+datasets:
+  - from: spiceai:orders_history
+    name: orders_history
+    params:
+      endpoint: grpcs://cluster.spice.svc.cluster.local:50051
+      api_key: ${ secrets:CLUSTER_API_KEY }
+```
+
+The application sees a single SQL surface — accelerated tables and delegated tables compose normally in joins and CTEs. See [Cluster-Sidecar Architecture](architectures/cluster-sidecar) for the conceptual model.
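+
+From the application's side, querying the sidecar might look like the sketch below. It assumes the sidecar's HTTP API accepts plain-text SQL via `POST /v1/sql` on port 8090 and returns JSON rows; confirm this against the HTTP API reference for the deployed version. The helper names `build_sql_request` and `run_sql` are hypothetical.
+
+```python
+import json
+import urllib.request
+
+
+def build_sql_request(sql: str, base_url: str = "http://127.0.0.1:8090") -> urllib.request.Request:
+    """Build a POST request carrying the SQL text (assumed endpoint: /v1/sql)."""
+    return urllib.request.Request(
+        url=f"{base_url}/v1/sql",
+        data=sql.encode("utf-8"),
+        headers={"Content-Type": "text/plain"},
+        method="POST",
+    )
+
+
+def run_sql(sql: str, base_url: str = "http://127.0.0.1:8090"):
+    """Send the query to the local sidecar and decode the JSON response."""
+    with urllib.request.urlopen(build_sql_request(sql, base_url), timeout=10) as response:
+        return json.loads(response.read())
+
+
+if __name__ == "__main__":
+    # `orders` is accelerated locally; `orders_history` is delegated to the cluster.
+    print(run_sql("SELECT COUNT(*) AS n FROM orders"))
+    print(run_sql("SELECT COUNT(*) AS n FROM orders_history"))
+```
+
+The application code is the same for both tables; only the sidecar's Spicepod decides which queries stay local.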
+
+## Operational model
+
+### Bootstrap and refresh on read instances
+
+When a read instance starts:
+
+1. For each accelerated dataset, Spice checks for the local file (`duckdb_file` / `sqlite_file`).
+2. If the file is absent and snapshots are enabled, Spice lists the snapshot prefix and downloads the newest snapshot for that dataset. The dataset becomes ready once the snapshot is loaded, without contacting the source.
+3. If no snapshot is found, or the newest snapshot fails to load, `bootstrap_on_failure_behavior` governs what happens next:
+   - `warn` (default): boot empty and refresh from the source. Avoid in read-tier instances that should not have source access.
+   - `fallback`: try older snapshots until one loads.
+   - `retry`: keep retrying the newest snapshot.
+
+For zero-source-credentials read instances, set `bootstrap_on_failure_behavior: fallback` or `retry` and ensure the dataset is **never** configured with usable source credentials.
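+
+The startup steps above can be summarized as a small decision function. This is a sketch of the documented behavior, not Spice's implementation; `choose_bootstrap_action`, its arguments, and the returned labels are hypothetical, and the `no-usable-snapshot` outcome is an assumption about what happens when every fallback candidate fails.
+
+```python
+from typing import Callable, Sequence
+
+
+def choose_bootstrap_action(
+    local_file_exists: bool,
+    snapshots_newest_first: Sequence[str],
+    loads_ok: Callable[[str], bool],
+    on_failure: str = "warn",
+) -> str:
+    """Return a label for what a read instance would do at startup."""
+    if local_file_exists:
+        return "use-local-file"
+    # Happy path: the newest snapshot exists and loads.
+    if snapshots_newest_first and loads_ok(snapshots_newest_first[0]):
+        return f"bootstrap:{snapshots_newest_first[0]}"
+    if on_failure == "fallback":
+        # Walk backwards through older snapshots until one loads.
+        for key in snapshots_newest_first[1:]:
+            if loads_ok(key):
+                return f"bootstrap:{key}"
+        return "no-usable-snapshot"  # assumed outcome, not documented
+    if on_failure == "retry":
+        return "retry-newest"
+    # "warn": start empty and refresh from the source system.
+    return "boot-empty-and-refresh-from-source"
+```
+
+The last branch is the one to avoid on read instances that hold no source credentials, which is why `fallback` or `retry` is recommended there.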
+
+Steady-state refresh on read instances is configured per dataset:
+
+```yaml
+acceleration:
+  refresh_check_interval: 1m # check the snapshot bucket every minute
+  snapshots: bootstrap_only
+```
+
+When a newer snapshot is available, the dataset hot-swaps without restarting the pod.
+
+### Snapshot retention and storage
+
+Snapshots are written to Hive-partitioned paths so retention is straightforward:
+
+```text
+s3://spiceai-snapshots/prod/
+ month=2026-05/day=2026-05-01/dataset=orders/orders_20260501T120000Z.db
+ month=2026-05/day=2026-05-02/dataset=orders/orders_20260502T120000Z.db
+```
+
+Apply an object-store lifecycle rule (S3 lifecycle, GCS Object Lifecycle Management, ADLS Lifecycle) to expire old partitions. Most deployments keep 24–72 hours of refresh-triggered snapshots and a daily archive beyond that.
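+
+Before enabling a lifecycle rule, it can help to audit which objects it would expire. The sketch below parses keys in the layout shown above and lists those older than a retention window. The file-name pattern (`<dataset>_<timestamp>.db`) is inferred from the example paths, and the choice to always keep the newest snapshot per dataset is an extra safeguard of this sketch that a plain lifecycle rule does not provide. `parse_snapshot_key` and `expired_keys` are hypothetical helpers.
+
+```python
+import re
+from datetime import datetime, timedelta, timezone
+
+KEY_PATTERN = re.compile(
+    r"month=\d{4}-\d{2}/day=\d{4}-\d{2}-\d{2}/dataset=(?P<dataset>[^/]+)/"
+    r"[^/]+_(?P<ts>\d{8}T\d{6}Z)\.db$"
+)
+
+
+def parse_snapshot_key(key: str) -> tuple[str, datetime]:
+    """Extract (dataset, write time in UTC) from a snapshot object key."""
+    match = KEY_PATTERN.search(key)
+    if match is None:
+        raise ValueError(f"not a snapshot key: {key}")
+    written_at = datetime.strptime(match["ts"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
+    return match["dataset"], written_at
+
+
+def expired_keys(keys, now: datetime, keep: timedelta = timedelta(hours=72)) -> list[str]:
+    """Keys older than `keep`, excluding the newest snapshot of each dataset."""
+    parsed = [(key, *parse_snapshot_key(key)) for key in keys]
+    newest: dict[str, tuple[str, datetime]] = {}
+    for key, dataset, written_at in parsed:
+        if dataset not in newest or written_at > newest[dataset][1]:
+            newest[dataset] = (key, written_at)
+    protected = {key for key, _ in newest.values()}
+    return [
+        key
+        for key, _, written_at in parsed
+        if key not in protected and now - written_at > keep
+    ]
+```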
+
+The snapshot bucket is the only shared dependency between the tiers, so keep it in the same region as the read instances and apply VPC endpoints / Private Google Access to keep traffic on the private network.
+
+### Versioning the Spicepod
+
+The cluster and the read instances share dataset _names_ but not full Spicepods. Two patterns work well:
+
+- **Fork two Spicepods from a common base.** Keep `datasets:` definitions in a shared file and merge the cluster-only and read-only fields at deploy time (Helm value overlays, Kustomize, Jsonnet).
+- **Single Spicepod, role-based behavior.** Use [Spicepod includes](https://docs.spice.ai/docs/reference/spicepod/dependencies) and environment-specific values to switch `snapshots: enabled` (cluster) vs `snapshots: bootstrap_only` (reads) per role.
+
+Whichever approach is chosen, keep schema changes backward-compatible by default, because read instances may still be serving snapshots written by a previous cluster version during a rollout.
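+
+As an illustration of the first pattern, a deploy-time script could render both roles from one shared dataset definition. This is a minimal sketch, assuming the only per-role difference is the `acceleration.snapshots` value; `BASE_DATASET` and `render_dataset` are hypothetical, and a real pipeline would more likely use Helm overlays or Kustomize as noted above.
+
+```python
+import copy
+
+# Shared definition: everything both tiers agree on.
+BASE_DATASET = {
+    "from": "s3://my-lake/orders/",
+    "name": "orders",
+    "acceleration": {"enabled": True, "engine": "duckdb", "mode": "file"},
+}
+
+
+def render_dataset(base: dict, role: str) -> dict:
+    """Return a copy of `base` with the snapshot mode set for the given role."""
+    if role not in ("cluster", "read"):
+        raise ValueError(f"unknown role: {role}")
+    dataset = copy.deepcopy(base)  # never mutate the shared base
+    dataset["acceleration"]["snapshots"] = (
+        "enabled" if role == "cluster" else "bootstrap_only"
+    )
+    return dataset
+```
+
+Keeping the base immutable means a new dataset only has to be declared once, and both tiers pick it up on the next deploy.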
+
+## Deploy on Kubernetes
+
+The reference topology runs the cluster as a `StatefulSet` (or [`SpicepodSet`](https://docs.spice.ai/docs/enterprise/kubernetes-operator/spicepodset) on Spice.ai Enterprise) and the read instances as sidecars in application pods. Both use the same [Spice Helm chart](./kubernetes/helm).
+
+### Cluster release
+
+```yaml
+# cluster-values.yaml
+replicaCount: 3
+stateful:
+  enabled: true
+  storageClass: gp3 # or hyperdisk-balanced, managed-csi-premium
+  size: 100Gi
+
+serviceAccount:
+  create: true
+  name: spiceai-cluster
+  annotations:
+    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/SpiceAIClusterRole
+
+spicepod:
+  # full Spicepod with sources, refresh schedules, and snapshots: enabled
+  ...
+```
+
+```bash
+helm upgrade --install spiceai-cluster spiceai/spiceai \
+ -n spiceai-cluster --create-namespace \
+ -f cluster-values.yaml
+```
+
+### Read instance sidecars
+
+Read instances are deployed as a sidecar container in application pods, configured via a `ConfigMap` that holds the read-tier Spicepod. The application points at `127.0.0.1:8090` (HTTP) or `127.0.0.1:50051` (Arrow Flight) — no service discovery needed.
+
+```yaml
+# application Deployment
+spec:
+  template:
+    spec:
+      serviceAccountName: spiceai-read # IRSA / Workload Identity for snapshot bucket
+      volumes:
+        - name: spicepod
+          configMap:
+            name: spiceai-read-spicepod
+        - name: accel
+          emptyDir: {} # ephemeral; bootstrapped from snapshots
+      containers:
+        - name: app
+          image: my-app:1.2.3
+          env:
+            - name: SPICEAI_HTTP_URL
+              value: http://127.0.0.1:8090
+
+        - name: spiceai
+          image: spiceai/spiceai:1.11.5
+          args: ['--http', '0.0.0.0:8090', '--flight', '0.0.0.0:50051']
+          volumeMounts:
+            - name: spicepod
+              mountPath: /spicepod
+              readOnly: true
+            - name: accel
+              mountPath: /local
+          readinessProbe:
+            httpGet: { path: /v1/ready, port: 8090 }
+          livenessProbe:
+            httpGet: { path: /health, port: 8090 }
+```
+
+The read sidecar's `ServiceAccount` only needs read access to the snapshot bucket. It should **not** be granted source-system credentials — that's what makes the read tier safe to scale to many replicas.
+
+### Spice.ai Enterprise
+
+For production, the [Spice.ai Enterprise Kubernetes Operator](https://docs.spice.ai/docs/enterprise/kubernetes-operator/kubernetes) manages both tiers as custom resources:
+
+- [`SpicepodSet`](https://docs.spice.ai/docs/enterprise/kubernetes-operator/spicepodset) — per-replica `StatefulSet`s for the cluster, with automatic PVC resizing, configurable update strategies, and crashloop protection.
+- [`SpicepodCluster`](https://docs.spice.ai/docs/enterprise/kubernetes-operator/spicepodcluster) — distributed scheduler/executor tiers when the cluster itself is large enough to need its own internal split.
+- Sidecar injection via webhook, so application teams add a single annotation to opt in.
+
+## Capacity sizing
+
+Rough first-pass sizing rules:
+
+| Tier | Typical shape |
+| ---------------- | ---------------------------------------------------------------------------------------------------- |
+| Cluster (writer) | 3+ replicas. Memory sized for the largest accelerated dataset. Network bandwidth for source ingest. |
+| Read instance | 1 replica per application pod. 0.5–2 vCPU, 512Mi–4Gi memory, 10–50Gi local SSD. |
+| Snapshot bucket | Standard tier, same region. Lifecycle rule sized to refresh frequency × number of datasets × 24–72h. |
+
+Read-tier memory is dominated by the working set of the file-mode acceleration engine. DuckDB compaction (`snapshots_compaction: enabled`) typically reduces snapshot size by 30–60%.
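+
+The snapshot bucket rule in the table (refresh frequency × number of datasets × retention window) can be turned into a quick estimate. The inputs below are illustrative assumptions, not measurements, and `snapshot_bucket_bytes` is a hypothetical helper.
+
+```python
+MIB = 1024 ** 2
+GIB = 1024 ** 3
+
+
+def snapshot_bucket_bytes(
+    snapshots_per_hour: float,
+    datasets: int,
+    retention_hours: float,
+    avg_snapshot_bytes: int,
+) -> int:
+    """Steady-state storage: snapshots retained at once times their average size."""
+    retained = snapshots_per_hour * datasets * retention_hours
+    return int(retained * avg_snapshot_bytes)
+
+
+if __name__ == "__main__":
+    # Assumed: 5-minute refresh (12 per hour), 2 datasets, 48 h retention, 200 MiB each.
+    total = snapshot_bucket_bytes(12, 2, 48, 200 * MIB)
+    print(f"{total / GIB:.0f} GiB")
+```
+
+With these assumed inputs the bucket holds 1,152 snapshots, or about 225 GiB, which shows how quickly a short refresh interval dominates the estimate.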
+
+## Security model
+
+The split simplifies the credential surface area:
+
+- **Cluster** — holds source credentials, snapshot **write** credentials, and cluster-internal mTLS. Runs in a private subnet; no public ingress.
+- **Read instances** — hold snapshot **read** credentials and a per-instance Arrow Flight token to the cluster (for live delegation). No source credentials.
+- **Application** — talks to its sidecar over loopback. No outbound credentials at all.
+
+Compromising a read instance exposes only read access to the snapshot bucket and the delegated query surface, never the source systems.
+
+## Observability
+
+Both tiers expose the same metrics and tracing endpoints. Practical splits:
+
+- **Cluster dashboards** — refresh duration, snapshot upload size and latency, source connector errors, ingest queue depth.
+- **Read dashboards** — bootstrap duration, snapshot age (write-time vs current-time), query latency p50/p99, delegation rate (queries served locally vs forwarded to the cluster).
+
+A high delegation rate is a signal to expand the materialized working set. A growing snapshot age is a signal that the cluster is falling behind on refresh.
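+
+Both read-tier signals are simple ratios and differences, sketched below. The counter names and the idea of deriving them from query logs or metrics are assumptions; `delegation_rate` and `snapshot_age_seconds` are hypothetical helpers rather than built-in metrics.
+
+```python
+from datetime import datetime, timezone
+
+
+def delegation_rate(local_queries: int, delegated_queries: int) -> float:
+    """Fraction of queries forwarded to the cluster (0.0 when there is no traffic)."""
+    total = local_queries + delegated_queries
+    return delegated_queries / total if total else 0.0
+
+
+def snapshot_age_seconds(written_at: datetime, now: datetime) -> float:
+    """Seconds between the snapshot's write time and the current time."""
+    return (now - written_at).total_seconds()
+
+
+if __name__ == "__main__":
+    print(delegation_rate(local_queries=900, delegated_queries=100))
+    written = datetime(2026, 5, 2, 12, 0, tzinfo=timezone.utc)
+    print(snapshot_age_seconds(written, datetime.now(timezone.utc)))
+```
+
+Alert thresholds for either value depend on the refresh interval and the workload, so they are best set after observing a baseline.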
+
+## Related
+
+- [Cluster-Sidecar Architecture](architectures/cluster-sidecar) — the conceptual model and live-delegation pattern.
+- [Snapshots](../features/data-acceleration/snapshots) — full reference for snapshot configuration, triggers, and modes.
+- [Sidecar Architecture](architectures/sidecar) — single-instance precursor to this pattern.
+- [Cluster Architecture](architectures/cluster) — internal scheduler/executor split for the cluster tier (Spice.ai Enterprise).
+- [Kubernetes Deployment Guide](./kubernetes) — Helm, Argo CD, and Flux options for the cluster.
+- [CI/CD](./ci-cd) — automating cluster and read-instance rollouts.
+- [Spice.ai Enterprise Kubernetes Operator](https://docs.spice.ai/docs/enterprise/kubernetes-operator/kubernetes) — recommended for production self-hosted deployments.
diff --git a/website/docs/features/data-acceleration/snapshots.md b/website/docs/features/data-acceleration/snapshots.md
index 0f92a4b2b..a80291a25 100644
--- a/website/docs/features/data-acceleration/snapshots.md
+++ b/website/docs/features/data-acceleration/snapshots.md
@@ -61,19 +61,19 @@ Snapshots are controlled with a top-level `snapshots` block in the Spicepod. The
snapshots:
enabled: true
location: s3://some_bucket/some_folder/ # Folder where snapshots are written
- bootstrap_on_failure_behavior: warn # retry | fallback | warn
+ bootstrap_on_failure_behavior: warn # retry | fallback | warn
params:
- s3_auth: iam_role # Defaults to iam_role for snapshots
+ s3_auth: iam_role # Defaults to iam_role for snapshots
```
### Supported storage backends
-| Backend | URL scheme | Environment variables |
-| --- | --- | --- |
-| Amazon S3 | `s3://` | Standard AWS credentials (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc.) |
-| Azure ADLS Gen2 | `abfss://`, `abfs://` | `AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_CLIENT_ID`/`AZURE_TENANT_ID`/`AZURE_FEDERATED_TOKEN_FILE` |
-| Google Cloud Storage | `gs://` | `GOOGLE_APPLICATION_CREDENTIALS`, Workload Identity |
-| Local filesystem | Absolute or relative path | N/A |
+| Backend | URL scheme | Environment variables |
+| -------------------- | ------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
+| Amazon S3 | `s3://` | Standard AWS credentials (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc.) |
+| Azure ADLS Gen2 | `abfss://`, `abfs://` | `AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_CLIENT_ID`/`AZURE_TENANT_ID`/`AZURE_FEDERATED_TOKEN_FILE` |
+| Google Cloud Storage | `gs://` | `GOOGLE_APPLICATION_CREDENTIALS`, Workload Identity |
+| Local filesystem | Absolute or relative path | N/A |
When the location is an S3 bucket, the configuration accepts any [S3 dataset parameters](../../components/data-connectors/s3) under `params`. Azure and GCS locations also accept their respective connector parameters under `params` for explicit credential overrides. When no explicit credentials are supplied, Spice reads standard environment variables for each cloud provider.
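
As a rough illustration of the table above, the sketch below maps a snapshot `location` to its backend and to the environment variables listed for it. It is a hypothetical pre-deployment helper, not part of Spice: `classify_location` and `BACKENDS` are invented names, and the sketch only transcribes the table rather than reproducing how Spice resolves credentials internally.

```python
from urllib.parse import urlparse

# URL scheme -> (backend, environment variables), transcribed from the table above.
AZURE_VARS = [
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
]
BACKENDS = {
    "s3": ("Amazon S3", ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]),
    "abfss": ("Azure ADLS Gen2", AZURE_VARS),
    "abfs": ("Azure ADLS Gen2", AZURE_VARS),
    "gs": ("Google Cloud Storage", ["GOOGLE_APPLICATION_CREDENTIALS"]),
}


def classify_location(location):
    """Return (backend, env_vars) for a snapshot location string."""
    scheme = urlparse(location).scheme.lower()
    if scheme in BACKENDS:
        return BACKENDS[scheme]
    if scheme == "":
        # No scheme: treated as an absolute or relative filesystem path.
        return ("Local filesystem", [])
    raise ValueError(f"unsupported snapshot location scheme: {scheme!r}")


if __name__ == "__main__":
    backend, env_vars = classify_location("s3://some_bucket/some_folder/")
    print(backend, env_vars)
```

A check like this can run in CI to catch a mistyped scheme before a Spicepod is deployed. Paths that include a drive letter (for example on Windows) would need extra handling, since `urlparse` reads the drive letter as a scheme.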
@@ -95,13 +95,14 @@ Each dataset opts into snapshotting through the `acceleration.snapshots` field.
- `disabled` – disable snapshot usage for this dataset. (Default.)
Complete configuration:
+
```yaml
acceleration:
- snapshots: enabled | disabled # default: disabled
- snapshots_trigger: # see trigger modes below
- snapshots_trigger_threshold: # threshold for time_interval or stream_batches
- snapshots_compaction: enabled | disabled # default: disabled (DuckDB only)
- snapshots_reset_expiry_on_load: enabled | disabled # default: disabled (DuckDB only with Caching refresh mode)
+ snapshots: enabled | disabled # default: disabled
+ snapshots_trigger: # see trigger modes below
+ snapshots_trigger_threshold: # threshold for time_interval or stream_batches
+ snapshots_compaction: enabled | disabled # default: disabled (DuckDB only)
+ snapshots_reset_expiry_on_load: enabled | disabled # default: disabled (DuckDB only with Caching refresh mode)
```
### Snapshot triggers
@@ -287,6 +288,8 @@ Append-mode accelerations that define a `time_column` wait to report ready until
For the full reference, see [`snapshots` in the Spicepod specification](../../reference/spicepod#snapshots) and [`acceleration.snapshots`](../../reference/spicepod/datasets#accelerationsnapshots).
+For the production deployment pattern that uses snapshots to separate ingest from read workloads, see [Read/Write Separation](../../deployment/read-write-separation).
+
:::warning[Limitations]
- Only datasets are supported for snapshots. Views are not supported.
diff --git a/website/src/partials/deployment/architectures/_cluster-sidecar.mdx b/website/src/partials/deployment/architectures/_cluster-sidecar.mdx
new file mode 100644
index 000000000..e560245f1
--- /dev/null
+++ b/website/src/partials/deployment/architectures/_cluster-sidecar.mdx
@@ -0,0 +1,75 @@
+Modern applications have two fundamentally different data access patterns, and no single deployment model serves both well. Large analytical queries — scanning terabytes of Iceberg data, joining Delta Lake tables, running cross-dataset aggregations — need distributed execution across many nodes. Hot operational queries — serving the working set a microservice actually uses, answering user-facing requests in under 5 milliseconds, feeding fresh context to an AI agent — need data materialized right next to the application, with no network hop.
+
+The [cluster-sidecar architecture](https://spice.ai/blog/cluster-sidecar-architecture) addresses both patterns in a single platform. Application sidecars handle the hot path with locally accelerated data, while a centralized Spice [cluster](./cluster) (or the [Spice Cloud Platform](./hosted)) provides distributed compute for heavy queries, data ingestion, acceleration, and refresh. When a sidecar needs to reach beyond its materialized working set — a historical query, a cross-dataset join, a broad search — it transparently delegates to the cluster, which executes the query and returns results. The sidecar can then cache those results for future use.
+
+From the application's perspective, everything is localhost. From an infrastructure perspective, the system delivers the throughput of a distributed query engine and the latency of an embedded database, with no ETL pipelines, sync jobs, or consistency gaps between the two.
+
+Think of it as a CDN for your data: the cluster is the origin server, the sidecars are the edge nodes, and Spice handles the caching, invalidation, and routing.
+
+```mermaid
+flowchart LR
+ subgraph Node1["Node / Pod"]
+ direction LR
+ A1["App"] <-->|"loopback"| SC1["Spice Sidecar (cache)"]
+ end
+
+ subgraph Node2["Node / Pod"]
+ direction LR
+ A2["App"] <-->|"loopback"| SC2["Spice Sidecar (cache)"]
+ end
+
+ SC1 & SC2 -->|"Arrow Flight (gRPC)"| Cluster
+
+ subgraph Cluster["Spice Cluster or Spice Cloud"]
+ direction LR
+ S1["Spice Node 1"]
+ S2["Spice Node 2"]
+ S3["Spice Node 3"]
+ end
+
+ Cluster --> Sources["Data Sources (S3, PostgreSQL, Databricks, ...)"]
+```
+
+Each sidecar is configured declaratively via a `spicepod.yaml` — the datasets, views, acceleration engines, search indices, and AI models it manages. Sidecars start in seconds, consume minimal resources, and scale horizontally with application pods: scale a deployment from 5 to 50 replicas and 50 sidecars come up automatically, each caching and materializing the right data.
+
+**Benefits**
+
+- **Kubernetes-native** — designed to run on Kubernetes, leveraging pod-level sidecars with cluster-level orchestration.
+- Sub-millisecond reads via sidecar caching on loopback, with centralized data management in the cluster.
+- Transparent query delegation — sidecars automatically route queries beyond their cached working set to the cluster.
+- Sidecars remain lightweight — only caching, no ingestion or acceleration overhead.
+- Cluster (or Spice Cloud) handles complex operations: data ingestion, [Spice Cayenne](../../components/data-accelerators/cayenne) acceleration, distributed query, hybrid search, and refresh from sources.
+- Works with both self-managed Spice clusters and the managed [Spice Cloud Platform](./hosted) as the centralized backend. The Spice Cloud cluster-sidecar model is the most common production topology.
+- Sidecars can run anywhere — in your VPC, on-premises, at the edge, or in any Kubernetes cluster — while connecting securely to the managed cluster.
+- Horizontal scalability — add sidecars without increasing load on data sources.
+- Resilience — sidecars serve cached data even if the cluster is temporarily unavailable.
+- Secure by default — mTLS encryption across all sidecar-to-cluster communication, with data encrypted at rest and in transit.
+
+**Considerations**
+
+- More complex deployment structure requiring both sidecar and cluster infrastructure. [Spice Cloud](./hosted) reduces this burden by managing the cluster.
+- Cache coherency — sidecars must be configured with appropriate refresh intervals or TTLs to balance freshness with performance.
+- Requires a Spice cluster deployment or [Spice Cloud Platform](./hosted) subscription ([Spice.ai Enterprise](https://spice.ai/enterprise) for self-managed clustering with SSO, RBAC, and audit logs).
+- Network connectivity between sidecars and the cluster must be reliable for cache refreshes and query delegation.
+
+**Use This Approach When**
+
+- Applications require sub-millisecond reads but data ingestion and acceleration should be centralized.
+- Multiple application instances need fast access to the same datasets without each independently querying data sources.
+- Reducing load on upstream data sources is a priority — the cluster ingests once, sidecars cache locally.
+- The system benefits from separating the caching tier (sidecars) from the data processing tier (cluster).
+- Workloads span both real-time operational queries and large-scale analytical queries on the same data (e.g., an operational data lakehouse on S3/Iceberg).
+
+**Not Ideal When**
+
+- The application is simple with a single instance — the overhead of both sidecar and cluster infrastructure isn't justified. Consider [Sidecar](./sidecar) or [Microservice](./microservice).
+- All queries are batch or analytical with relaxed latency requirements — a [Microservice](./microservice) deployment is simpler and sufficient.
+- Network connectivity between sidecars and the cluster is unreliable — query delegation and cache refreshes will fail, leading to stale data. Consider standalone [Sidecar](./sidecar) deployments with direct source access.
+
+**Example Use Case**
+
+A multi-tenant SaaS platform where each tenant's application pod includes a Spice sidecar caching frequently queried datasets. The sidecars pull from a shared Spice cluster (or Spice Cloud) that handles ingestion from PostgreSQL, S3, and Databricks, runs Cayenne acceleration and refresh schedules, and serves distributed queries. Tenants get sub-millisecond reads from their local sidecar while the cluster manages data freshness and heavy query workloads centrally. When a tenant's application issues a query that spans beyond the sidecar's cached working set — such as a historical analysis or cross-dataset join — the sidecar transparently delegates to the cluster and caches the results.
+
+**See also**
+
+- [Read/Write Separation](https://spiceai.org/docs/deployment/read-write-separation) — production guide for splitting ingest from reads using shared acceleration snapshots, including Spicepod and Helm reference configurations.
diff --git a/website/src/partials/deployment/architectures/_hybrid.mdx b/website/src/partials/deployment/architectures/_hybrid.mdx
index ed7ae8828..4ed574dff 100644
--- a/website/src/partials/deployment/architectures/_hybrid.mdx
+++ b/website/src/partials/deployment/architectures/_hybrid.mdx
@@ -1,71 +1,3 @@
-Modern applications have two fundamentally different data access patterns, and no single deployment model serves both well. Large analytical queries — scanning terabytes of Iceberg data, joining Delta Lake tables, running cross-dataset aggregations — need distributed execution across many nodes. Hot operational queries — serving the working set a microservice actually uses, answering user-facing requests in under 5 milliseconds, feeding fresh context to an AI agent — need data materialized right next to the application, with no network hop.
+import Content from './_cluster-sidecar.mdx'
-The hybrid cluster-sidecar architecture addresses both patterns in a single platform. Application sidecars handle the hot path with locally accelerated data, while a centralized Spice [cluster](./cluster) (or the [Spice Cloud Platform](./hosted)) provides distributed compute for heavy queries, data ingestion, acceleration, and refresh. When a sidecar needs to reach beyond its materialized working set — a historical query, a cross-dataset join, a broad search — it transparently delegates to the cluster, which executes the query and returns results. The sidecar can then cache those results for future use.
-
-From the application's perspective, everything is localhost. From an infrastructure perspective, the system delivers the throughput of a distributed query engine and the latency of an embedded database — without ETL between them, sync jobs, or consistency gaps.
-
-Think of it as a CDN for your data: the cluster is the origin server, the sidecars are the edge nodes, and Spice handles the caching, invalidation, and routing.
-
-```mermaid
-flowchart LR
- subgraph Node1["Node / Pod"]
- direction LR
- A1["App"] <-->|"loopback"| SC1["Spice Sidecar (cache)"]
- end
-
- subgraph Node2["Node / Pod"]
- direction LR
- A2["App"] <-->|"loopback"| SC2["Spice Sidecar (cache)"]
- end
-
- SC1 & SC2 -->|"Arrow Flight (gRPC)"| Cluster
-
- subgraph Cluster["Spice Cluster or Spice Cloud"]
- direction LR
- S1["Spice Node 1"]
- S2["Spice Node 2"]
- S3["Spice Node 3"]
- end
-
- Cluster --> Sources["Data Sources (S3, PostgreSQL, Databricks, ...)"]
-```
-
-Each sidecar is configured declaratively via a `spicepod.yaml` — the datasets, views, acceleration engines, search indices, and AI models it manages. Sidecars start in seconds, consume minimal resources, and scale horizontally with application pods: scale a deployment from 5 to 50 replicas and 50 sidecars come up automatically, each caching and materializing the right data.
-
-**Benefits**
-
-- **Kubernetes-native** — designed to run on Kubernetes, leveraging pod-level sidecars with cluster-level orchestration.
-- Sub-millisecond reads via sidecar caching on loopback, with centralized data management in the cluster.
-- Transparent query delegation — sidecars automatically route queries beyond their cached working set to the cluster.
-- Sidecars remain lightweight — only caching, no ingestion or acceleration overhead.
-- Cluster (or Spice Cloud) handles complex operations: data ingestion, [Spice Cayenne](../../components/data-accelerators/cayenne) acceleration, distributed query, hybrid search, and refresh from sources.
-- Works with both self-managed Spice clusters and the managed [Spice Cloud Platform](./hosted) as the centralized backend. The Spice Cloud hybrid model is the most common production topology.
-- Sidecars can run anywhere — in your VPC, on-premises, at the edge, or in any Kubernetes cluster — while connecting securely to the managed cluster.
-- Horizontal scalability — add sidecars without increasing load on data sources.
-- Resilience — sidecars serve cached data even if the cluster is temporarily unavailable.
-- Secure by default — mTLS encryption across all sidecar-to-cluster communication, with data encrypted at rest and in transit.
-
-**Considerations**
-
-- More complex deployment structure requiring both sidecar and cluster infrastructure. [Spice Cloud](./hosted) reduces this burden by managing the cluster.
-- Cache coherency — sidecars must be configured with appropriate refresh intervals or TTLs to balance freshness with performance.
-- Requires a Spice cluster deployment or [Spice Cloud Platform](./hosted) subscription ([Spice.ai Enterprise](https://spice.ai/enterprise) for self-managed clustering with SSO, RBAC, and audit logs).
-- Network connectivity between sidecars and the cluster must be reliable for cache refreshes and query delegation.
-
-**Use This Approach When**
-
-- Applications require sub-millisecond reads but data ingestion and acceleration should be centralized.
-- Multiple application instances need fast access to the same datasets without each independently querying data sources.
-- Reducing load on upstream data sources is a priority — the cluster ingests once, sidecars cache locally.
-- The system benefits from separating the caching tier (sidecars) from the data processing tier (cluster).
-- Workloads span both real-time operational queries and large-scale analytical queries on the same data (e.g., an operational data lakehouse on S3/Iceberg).
-
-**Not Ideal When**
-
-- The application is simple with a single instance — the overhead of both sidecar and cluster infrastructure isn't justified. Consider [Sidecar](./sidecar) or [Microservice](./microservice).
-- All queries are batch or analytical with relaxed latency requirements — a [Microservice](./microservice) deployment is simpler and sufficient.
-- Network connectivity between sidecars and the cluster is unreliable — query delegation and cache refreshes will fail, leading to stale data. Consider standalone [Sidecar](./sidecar) deployments with direct source access.
-
-**Example Use Case**
-
-A multi-tenant SaaS platform where each tenant's application pod includes a Spice sidecar caching frequently queried datasets. The sidecars pull from a shared Spice cluster (or Spice Cloud) that handles ingestion from PostgreSQL, S3, and Databricks, runs Cayenne acceleration and refresh schedules, and serves distributed queries. Tenants get sub-millisecond reads from their local sidecar while the cluster manages data freshness and heavy query workloads centrally. When a tenant's application issues a query that spans beyond the sidecar's cached working set — such as a historical analysis or cross-dataset join — the sidecar transparently delegates to the cluster and caches the results.
+