From 6612ff01acfddea5a654137aaf7899056ed66bc6 Mon Sep 17 00:00:00 2001 From: jeadie Date: Mon, 18 May 2026 12:17:08 +1000 Subject: [PATCH 1/5] docs for rc5 --- .../data-accelerators/arrow/index.md | 27 ++++++++++++++ .../data-accelerators/cayenne/index.md | 16 +++++++++ .../components/data-connectors/mongodb.md | 35 +++++++++++++++++++ website/docs/features/cdc/index.md | 14 ++++++++ website/docs/features/observability/index.md | 14 ++++++++ 5 files changed, 106 insertions(+) diff --git a/website/docs/components/data-accelerators/arrow/index.md b/website/docs/components/data-accelerators/arrow/index.md index 39ee637c3..fedbc040d 100644 --- a/website/docs/components/data-accelerators/arrow/index.md +++ b/website/docs/components/data-accelerators/arrow/index.md @@ -53,6 +53,33 @@ datasets: See [Hash Index](../../features/data-acceleration/hash-index) for configuration details, supported data types, and performance characteristics. +## Native Upserts with Primary Key Matching + +Spice supports efficient upsert (update-or-insert) operations on Arrow-accelerated tables using primary key matching. When a dataset is accelerated with Arrow and a `primary_key` is specified, incoming rows with matching primary key values will update existing records; otherwise, new records are inserted. + +### Example Upsert Configuration + +```yaml +datasets: + - from: s3://bucket/orders.parquet + name: orders + acceleration: + engine: arrow + primary_key: order_id +``` + +- When you insert or load data, if a row's `order_id` matches an existing record, the record is updated in-place. +- If the `order_id` is new, a new record is inserted. + +This enables efficient update-or-insert semantics for in-memory datasets, ideal for CDC, streaming, and real-time analytics workloads. + +### Notes +- Upsert support requires a defined `primary_key`. +- Upserts are performed in-memory and are not persisted after runtime shutdown. +- For persistent upserts, use a persistent accelerator (e.g., DuckDB, Cayenne). + +--- + ## Limitations - The In-Memory Arrow Data Accelerator does not support persistent storage. Data is stored in-memory and will be lost when the Spice runtime is stopped. diff --git a/website/docs/components/data-accelerators/cayenne/index.md b/website/docs/components/data-accelerators/cayenne/index.md index 574351ba0..56737ccc6 100644 --- a/website/docs/components/data-accelerators/cayenne/index.md +++ b/website/docs/components/data-accelerators/cayenne/index.md @@ -46,6 +46,22 @@ For optimal performance, store Cayenne data files on NVMe storage. NVMe provides Use [S3 Express One Zone](#aws-s3-express-one-zone-storage) when persistence of accelerations across restarts is required. S3 Express One Zone adds network latency compared to local NVMe but provides durability. Sharing accelerated data across multiple Spice instances is planned for a future release. +## Advanced Internals + +### Sequence-based Upserts and Deletes +Cayenne uses Iceberg-style sequence numbers to enable upsert and delete semantics. Each row is tagged with a sequence number, allowing efficient handling of row-level changes without rewriting entire files. Deletes are tracked as tombstones, and upserts are resolved at query time. + +### Metadata Management +Cayenne maintains in-process metadata for fast query planning. Metadata includes file listings, statistics, and sequence maps. This enables fast discovery and pruning of data files during query execution. + +### Persistent Acceleration +Cayenne stores acceleration data on NVMe or S3 Express One Zone. Acceleration state is durable across restarts, and future releases will support sharing acceleration across multiple Spice instances. + +### Vortex Format +Cayenne leverages the Vortex columnar format for zero-copy Arrow compatibility, fast random access, and extensible encoding. + +--- + ## Configuration To use Spice Cayenne as the data accelerator, specify `cayenne` as the `engine` for acceleration. Spice Cayenne supports `mode: file`, `mode: file_create`, and `mode: file_update` and stores data on disk. diff --git a/website/docs/components/data-connectors/mongodb.md b/website/docs/components/data-connectors/mongodb.md index ec188e532..204fcd13f 100644 --- a/website/docs/components/data-connectors/mongodb.md +++ b/website/docs/components/data-connectors/mongodb.md @@ -27,6 +27,41 @@ datasets: ## Configuration +### Real-Time Change Data Capture (CDC) with MongoDB Change Streams + +Spice supports real-time Change Data Capture (CDC) from MongoDB using native [MongoDB Change Streams](https://www.mongodb.com/docs/manual/changeStreams/). This enables streaming inserts, updates, and deletes from your MongoDB collections directly into Spice accelerators, without requiring Debezium or Kafka. + +#### Enabling CDC with `refresh_mode: changes` + +To enable real-time CDC, set `refresh_mode: changes` in your dataset configuration: + +```yaml +datasets: + - from: mongodb:my_collection + name: my_collection + params: + host: my-cluster.mongodb.net + db: mydb + acceleration: + enabled: true + engine: duckdb + refresh_mode: changes +``` + +- `refresh_mode: changes` tells Spice to use MongoDB Change Streams for this dataset. +- No Debezium or Kafka is required—Spice connects directly to MongoDB. +- Changes are streamed in real time into the configured accelerator (e.g., DuckDB, Arrow). + +#### Use Cases +- Real-time analytics on operational data +- Low-latency dashboards and event-driven pipelines + +#### Notes +- Requires MongoDB 4.0+ and a replica set or sharded cluster. +- Ensure your MongoDB user has `changeStream` privileges. + +--- + ### `from` The `from` field takes the form `mongodb:{table_name}` where `table_name` is the table identifer in the MongoDB server to read from. diff --git a/website/docs/features/cdc/index.md b/website/docs/features/cdc/index.md index fc6e9df24..1d4426685 100644 --- a/website/docs/features/cdc/index.md +++ b/website/docs/features/cdc/index.md @@ -29,6 +29,20 @@ It is recommended to use CDC-accelerated datasets with persistent data accelerat ::: +## Kafka CDC Offset Persistence + +Spice now persists Kafka CDC offsets in sidecar tables, enabling durable and resumable CDC streams. When consuming from Kafka topics, Spice records the last committed offset for each partition in a dedicated sidecar table. On restart or failover, Spice resumes from the last committed offset, ensuring no data loss or duplicate processing. + +### Benefits +- Durable CDC: Survives restarts and failover without replaying the entire topic. +- Fast recovery: Resumes from the last processed event, not the earliest available. +- No external offset store required. + +### Example +No special configuration is required—offset persistence is automatic for all Kafka CDC datasets. + +--- + ## Supported Data Connectors Enabling CDC by setting `refresh_mode: changes` in the acceleration settings requires support from the data connector to provide a stream of row-level changes. diff --git a/website/docs/features/observability/index.md b/website/docs/features/observability/index.md index 01eeebdf4..b44b5fb46 100644 --- a/website/docs/features/observability/index.md +++ b/website/docs/features/observability/index.md @@ -22,6 +22,20 @@ Spice provides monitoring and observability through three mechanisms: - [New Relic](../monitoring/new-relic) - [Zipkin](../monitoring/zipkin) +## HTTP Rate-Control Persistence + +Spice persists HTTP rate-control (rate-limiting) state in object storage, ensuring that per-endpoint throttle counters survive restarts and are consistent across replicas. This enables reliable rate-limiting for all HTTP endpoints, including `/metrics`, even in distributed or containerized deployments. + +### Key Features +- Persistent rate-limiting: Throttle state is saved to object storage and restored on restart. +- Consistent across replicas: All instances share the same rate-limit state. +- `/metrics` endpoint is independently rate-limited to prevent scraping from impacting query serving. + +### Usage +No special configuration is required—rate-control persistence is enabled by default when object storage is configured for the runtime. + +--- + ## Prometheus Metrics Endpoint Spice exposes a Prometheus-compatible metrics endpoint that monitoring systems can scrape. The endpoint serves metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/), which is supported by most enterprise monitoring platforms including Datadog, New Relic, Chronosphere, Grafana Cloud, and others. From a0a7f89487843e5ec1d68e9752e48d156354b8e4 Mon Sep 17 00:00:00 2001 From: jeadie Date: Mon, 18 May 2026 13:35:43 +1000 Subject: [PATCH 2/5] docs improve --- .../data-accelerators/cayenne/index.md | 1 + website/docs/reference/spicepod/runtime.md | 21 +++++++++++-------- website/docs/reference/sql/dml.md | 4 ++-- 3 files changed, 15 insertions(+), 11 deletions(-) diff --git a/website/docs/components/data-accelerators/cayenne/index.md b/website/docs/components/data-accelerators/cayenne/index.md index 56737ccc6..db9c8f88c 100644 --- a/website/docs/components/data-accelerators/cayenne/index.md +++ b/website/docs/components/data-accelerators/cayenne/index.md @@ -48,6 +48,7 @@ Use [S3 Express One Zone](#aws-s3-express-one-zone-storage) when persistence of ## Advanced Internals + ### Sequence-based Upserts and Deletes Cayenne uses Iceberg-style sequence numbers to enable upsert and delete semantics. Each row is tagged with a sequence number, allowing efficient handling of row-level changes without rewriting entire files. Deletes are tracked as tombstones, and upserts are resolved at query time. diff --git a/website/docs/reference/spicepod/runtime.md b/website/docs/reference/spicepod/runtime.md index fcf2f6c0c..81c9189df 100644 --- a/website/docs/reference/spicepod/runtime.md +++ b/website/docs/reference/spicepod/runtime.md @@ -215,6 +215,15 @@ The TLS section specifies the configuration for enabling Transport Layer Securit In addition to configuring TLS via the manifest, TLS can also be configured via `spiced` command line arguments using the `--tls-enabled true` flag along with `--tls-certificate`/`--tls-certificate-file` and `--tls-key`/`--tls-key-file`. +### Certificate Hot-Reload + +Spice can hot-reload TLS certificates and client CA files for runtime endpoints. Update the certificate, key, or CA file on disk, then send `SIGHUP` to the Spice process to reload without restart. Only file-based certificates/keys/CA are hot-reloaded (not inline PEM). Existing connections are not interrupted; only new connections use the updated files. If reload fails, the previous certificate remains active and a warning is logged. + +**Steps:** +1. Replace the certificate/key/CA file on disk. +2. Send `SIGHUP` to the Spice process (e.g., `kill -SIGHUP `). +3. Check logs for reload confirmation or errors. + ### `runtime.tls.enabled` Enables or disables TLS for the runtime endpoints. @@ -233,10 +242,9 @@ The TLS certificate to use for securing the runtime endpoints. The certificate c ```yaml runtime: tls: - ... certificate: | -----BEGIN CERTIFICATE----- - ... + (certificate contents) -----END CERTIFICATE----- ``` @@ -254,7 +262,6 @@ The path to the TLS PEM-encoded certificate file. Only one of `certificate` or ` ```yaml runtime: tls: - ... certificate_file: /path/to/cert.pem ``` @@ -265,10 +272,9 @@ The TLS key to use for securing the runtime endpoints. The key can also come fro ```yaml runtime: tls: - ... key: | -----BEGIN PRIVATE KEY----- - ... + (private key contents) -----END PRIVATE KEY----- ``` @@ -286,7 +292,6 @@ The path to the TLS PEM-encoded key file. Only one of `key` or `key_file` must b ```yaml runtime: tls: - ... key_file: /path/to/key.pem ``` @@ -323,7 +328,6 @@ Path to a PEM-encoded CA bundle used to verify client certificates. The file is ```yaml runtime: tls: - ... client_auth_ca_file: /path/to/client-ca.pem ``` @@ -334,10 +338,9 @@ Inline PEM (or `${ secrets:... }`) form of the client CA bundle. Mutually exclus ```yaml runtime: tls: - ... client_auth_ca: | -----BEGIN CERTIFICATE----- - ... + (CA certificate contents) -----END CERTIFICATE----- ``` diff --git a/website/docs/reference/sql/dml.md b/website/docs/reference/sql/dml.md index 3c1968cf3..40b0d9e6d 100644 --- a/website/docs/reference/sql/dml.md +++ b/website/docs/reference/sql/dml.md @@ -7,8 +7,8 @@ sidebar_position: 30 Data Manipulation Language (DML) statements are used to insert, update, and delete data in tables. Spice supports DML operations on [write-capable data connectors](../../tags/write) configured with `access: read_write`. -:::warning[Supported Operations] -Spice supports `INSERT` for write-capable connectors and `MERGE INTO` for [Spice Cayenne](../../components/data-accelerators/cayenne) catalog tables. `UPDATE` and `DELETE` statements are not yet supported as standalone operations. For data modifications, use `MERGE INTO` or the source database directly. +:::info[Supported Operations] +Spice supports `INSERT`, `UPDATE`, and `DELETE` for write-capable connectors that support these operations. As of v2.0.0-rc.5, the [Snowflake](../../components/data-connectors/snowflake) connector supports all three DML operations when `access: read_write` is set. For Cayenne, use `MERGE INTO` for updates and deletes. ::: :::info From 673b56bbe8b7d10e965d3283d0a5723179152fdf Mon Sep 17 00:00:00 2001 From: jeadie Date: Mon, 18 May 2026 13:59:23 +1000 Subject: [PATCH 3/5] proofread docs --- .../data-accelerators/cayenne/index.md | 17 ----------------- .../components/data-connectors/snowflake.md | 3 +++ website/docs/features/cdc/index.md | 14 -------------- website/docs/features/observability/index.md | 13 ------------- website/docs/reference/spicepod/runtime.md | 4 ++-- website/docs/reference/sql/dml.md | 5 ++--- 6 files changed, 7 insertions(+), 49 deletions(-) diff --git a/website/docs/components/data-accelerators/cayenne/index.md b/website/docs/components/data-accelerators/cayenne/index.md index db9c8f88c..574351ba0 100644 --- a/website/docs/components/data-accelerators/cayenne/index.md +++ b/website/docs/components/data-accelerators/cayenne/index.md @@ -46,23 +46,6 @@ For optimal performance, store Cayenne data files on NVMe storage. NVMe provides Use [S3 Express One Zone](#aws-s3-express-one-zone-storage) when persistence of accelerations across restarts is required. S3 Express One Zone adds network latency compared to local NVMe but provides durability. Sharing accelerated data across multiple Spice instances is planned for a future release. -## Advanced Internals - - -### Sequence-based Upserts and Deletes -Cayenne uses Iceberg-style sequence numbers to enable upsert and delete semantics. Each row is tagged with a sequence number, allowing efficient handling of row-level changes without rewriting entire files. Deletes are tracked as tombstones, and upserts are resolved at query time. - -### Metadata Management -Cayenne maintains in-process metadata for fast query planning. Metadata includes file listings, statistics, and sequence maps. This enables fast discovery and pruning of data files during query execution. - -### Persistent Acceleration -Cayenne stores acceleration data on NVMe or S3 Express One Zone. Acceleration state is durable across restarts, and future releases will support sharing acceleration across multiple Spice instances. - -### Vortex Format -Cayenne leverages the Vortex columnar format for zero-copy Arrow compatibility, fast random access, and extensible encoding. - ---- - ## Configuration To use Spice Cayenne as the data accelerator, specify `cayenne` as the `engine` for acceleration. Spice Cayenne supports `mode: file`, `mode: file_create`, and `mode: file_update` and stores data on disk. diff --git a/website/docs/components/data-connectors/snowflake.md b/website/docs/components/data-connectors/snowflake.md index 2fb302a72..aa7fa3a85 100644 --- a/website/docs/components/data-connectors/snowflake.md +++ b/website/docs/components/data-connectors/snowflake.md @@ -3,6 +3,9 @@ title: 'Snowflake Data Connector' sidebar_label: 'Snowflake Data Connector' description: 'Snowflake Data Connector Documentation' pagination_prev: null +tags: + - data-connectors + - write --- import Tabs from '@theme/Tabs'; diff --git a/website/docs/features/cdc/index.md b/website/docs/features/cdc/index.md index 1d4426685..fc6e9df24 100644 --- a/website/docs/features/cdc/index.md +++ b/website/docs/features/cdc/index.md @@ -29,20 +29,6 @@ It is recommended to use CDC-accelerated datasets with persistent data accelerat ::: -## Kafka CDC Offset Persistence - -Spice now persists Kafka CDC offsets in sidecar tables, enabling durable and resumable CDC streams. When consuming from Kafka topics, Spice records the last committed offset for each partition in a dedicated sidecar table. On restart or failover, Spice resumes from the last committed offset, ensuring no data loss or duplicate processing. - -### Benefits -- Durable CDC: Survives restarts and failover without replaying the entire topic. -- Fast recovery: Resumes from the last processed event, not the earliest available. -- No external offset store required. - -### Example -No special configuration is required—offset persistence is automatic for all Kafka CDC datasets. - ---- - ## Supported Data Connectors Enabling CDC by setting `refresh_mode: changes` in the acceleration settings requires support from the data connector to provide a stream of row-level changes. diff --git a/website/docs/features/observability/index.md b/website/docs/features/observability/index.md index b44b5fb46..0f579074d 100644 --- a/website/docs/features/observability/index.md +++ b/website/docs/features/observability/index.md @@ -22,19 +22,6 @@ Spice provides monitoring and observability through three mechanisms: - [New Relic](../monitoring/new-relic) - [Zipkin](../monitoring/zipkin) -## HTTP Rate-Control Persistence - -Spice persists HTTP rate-control (rate-limiting) state in object storage, ensuring that per-endpoint throttle counters survive restarts and are consistent across replicas. This enables reliable rate-limiting for all HTTP endpoints, including `/metrics`, even in distributed or containerized deployments. - -### Key Features -- Persistent rate-limiting: Throttle state is saved to object storage and restored on restart. -- Consistent across replicas: All instances share the same rate-limit state. -- `/metrics` endpoint is independently rate-limited to prevent scraping from impacting query serving. - -### Usage -No special configuration is required—rate-control persistence is enabled by default when object storage is configured for the runtime. - ---- ## Prometheus Metrics Endpoint diff --git a/website/docs/reference/spicepod/runtime.md b/website/docs/reference/spicepod/runtime.md index 81c9189df..7117ff29a 100644 --- a/website/docs/reference/spicepod/runtime.md +++ b/website/docs/reference/spicepod/runtime.md @@ -244,7 +244,7 @@ runtime: tls: certificate: | -----BEGIN CERTIFICATE----- - (certificate contents) + ... -----END CERTIFICATE----- ``` @@ -340,7 +340,7 @@ runtime: tls: client_auth_ca: | -----BEGIN CERTIFICATE----- - (CA certificate contents) + ... -----END CERTIFICATE----- ``` diff --git a/website/docs/reference/sql/dml.md b/website/docs/reference/sql/dml.md index 40b0d9e6d..f7daa0df6 100644 --- a/website/docs/reference/sql/dml.md +++ b/website/docs/reference/sql/dml.md @@ -7,9 +7,8 @@ sidebar_position: 30 Data Manipulation Language (DML) statements are used to insert, update, and delete data in tables. Spice supports DML operations on [write-capable data connectors](../../tags/write) configured with `access: read_write`. -:::info[Supported Operations] -Spice supports `INSERT`, `UPDATE`, and `DELETE` for write-capable connectors that support these operations. As of v2.0.0-rc.5, the [Snowflake](../../components/data-connectors/snowflake) connector supports all three DML operations when `access: read_write` is set. For Cayenne, use `MERGE INTO` for updates and deletes. -::: +:::warning[Supported Operations] +Spice supports `INSERT` for write-capable connectors and `MERGE INTO` for [Spice Cayenne](../../components/data-accelerators/cayenne) catalog tables. `UPDATE` and `DELETE` statements are not yet supported as standalone operations. For data modifications, use `MERGE INTO` or the source database directly. :::info Spice is built on [Apache DataFusion](https://datafusion.apache.org/) and uses the PostgreSQL dialect, even when querying datasources with different SQL dialects. From 2d5a04391aba315ae8717407cabf5e4729eec6d0 Mon Sep 17 00:00:00 2001 From: jeadie Date: Mon, 18 May 2026 14:04:39 +1000 Subject: [PATCH 4/5] again --- .../components/data-connectors/mongodb.md | 61 ++++++++----------- website/docs/features/observability/index.md | 1 - website/docs/reference/sql/dml.md | 1 + 3 files changed, 27 insertions(+), 36 deletions(-) diff --git a/website/docs/components/data-connectors/mongodb.md b/website/docs/components/data-connectors/mongodb.md index 204fcd13f..50403ee21 100644 --- a/website/docs/components/data-connectors/mongodb.md +++ b/website/docs/components/data-connectors/mongodb.md @@ -27,41 +27,6 @@ datasets: ## Configuration -### Real-Time Change Data Capture (CDC) with MongoDB Change Streams - -Spice supports real-time Change Data Capture (CDC) from MongoDB using native [MongoDB Change Streams](https://www.mongodb.com/docs/manual/changeStreams/). This enables streaming inserts, updates, and deletes from your MongoDB collections directly into Spice accelerators, without requiring Debezium or Kafka. - -#### Enabling CDC with `refresh_mode: changes` - -To enable real-time CDC, set `refresh_mode: changes` in your dataset configuration: - -```yaml -datasets: - - from: mongodb:my_collection - name: my_collection - params: - host: my-cluster.mongodb.net - db: mydb - acceleration: - enabled: true - engine: duckdb - refresh_mode: changes -``` - -- `refresh_mode: changes` tells Spice to use MongoDB Change Streams for this dataset. -- No Debezium or Kafka is required—Spice connects directly to MongoDB. -- Changes are streamed in real time into the configured accelerator (e.g., DuckDB, Arrow). - -#### Use Cases -- Real-time analytics on operational data -- Low-latency dashboards and event-driven pipelines - -#### Notes -- Requires MongoDB 4.0+ and a replica set or sharded cluster. -- Ensure your MongoDB user has `changeStream` privileges. - ---- - ### `from` The `from` field takes the form `mongodb:{table_name}` where `table_name` is the table identifer in the MongoDB server to read from. @@ -300,6 +265,32 @@ datasets: mongodb_pool_max: 10 ``` +### Using MongoDB Change Streams + +Spice supports real-time Change Data Capture (CDC) from MongoDB using native [MongoDB Change Streams](https://www.mongodb.com/docs/manual/changeStreams/). This enables streaming inserts, updates, and deletes from your MongoDB collections directly into Spice accelerators. + +To enable real-time CDC, set `refresh_mode: changes` in the dataset's configuration: + +```yaml +datasets: + - from: mongodb:my_collection + name: my_collection + params: + host: my-cluster.mongodb.net + db: mydb + acceleration: + enabled: true + engine: duckdb + refresh_mode: changes +``` + +#### Notes +- Requires MongoDB 4.0+ and a replica set or sharded cluster. +- Ensure your MongoDB user has `changeStream` privileges. + +--- + + ## Secrets Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](../secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](../secret-stores#using-secrets). diff --git a/website/docs/features/observability/index.md b/website/docs/features/observability/index.md index 0f579074d..01eeebdf4 100644 --- a/website/docs/features/observability/index.md +++ b/website/docs/features/observability/index.md @@ -22,7 +22,6 @@ Spice provides monitoring and observability through three mechanisms: - [New Relic](../monitoring/new-relic) - [Zipkin](../monitoring/zipkin) - ## Prometheus Metrics Endpoint Spice exposes a Prometheus-compatible metrics endpoint that monitoring systems can scrape. The endpoint serves metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/), which is supported by most enterprise monitoring platforms including Datadog, New Relic, Chronosphere, Grafana Cloud, and others. diff --git a/website/docs/reference/sql/dml.md b/website/docs/reference/sql/dml.md index f7daa0df6..3c1968cf3 100644 --- a/website/docs/reference/sql/dml.md +++ b/website/docs/reference/sql/dml.md @@ -9,6 +9,7 @@ Data Manipulation Language (DML) statements are used to insert, update, and dele :::warning[Supported Operations] Spice supports `INSERT` for write-capable connectors and `MERGE INTO` for [Spice Cayenne](../../components/data-accelerators/cayenne) catalog tables. `UPDATE` and `DELETE` statements are not yet supported as standalone operations. For data modifications, use `MERGE INTO` or the source database directly. +::: :::info Spice is built on [Apache DataFusion](https://datafusion.apache.org/) and uses the PostgreSQL dialect, even when querying datasources with different SQL dialects. From f01d762a1e8f061b173871b11bc1bb50961eae4f Mon Sep 17 00:00:00 2001 From: jeadie Date: Mon, 18 May 2026 14:14:07 +1000 Subject: [PATCH 5/5] more --- .../data-accelerators/arrow/index.md | 27 ------------------- 1 file changed, 27 deletions(-) diff --git a/website/docs/components/data-accelerators/arrow/index.md b/website/docs/components/data-accelerators/arrow/index.md index fedbc040d..39ee637c3 100644 --- a/website/docs/components/data-accelerators/arrow/index.md +++ b/website/docs/components/data-accelerators/arrow/index.md @@ -53,33 +53,6 @@ datasets: See [Hash Index](../../features/data-acceleration/hash-index) for configuration details, supported data types, and performance characteristics. -## Native Upserts with Primary Key Matching - -Spice supports efficient upsert (update-or-insert) operations on Arrow-accelerated tables using primary key matching. When a dataset is accelerated with Arrow and a `primary_key` is specified, incoming rows with matching primary key values will update existing records; otherwise, new records are inserted. - -### Example Upsert Configuration - -```yaml -datasets: - - from: s3://bucket/orders.parquet - name: orders - acceleration: - engine: arrow - primary_key: order_id -``` - -- When you insert or load data, if a row's `order_id` matches an existing record, the record is updated in-place. -- If the `order_id` is new, a new record is inserted. - -This enables efficient update-or-insert semantics for in-memory datasets, ideal for CDC, streaming, and real-time analytics workloads. - -### Notes -- Upsert support requires a defined `primary_key`. -- Upserts are performed in-memory and are not persisted after runtime shutdown. -- For persistent upserts, use a persistent accelerator (e.g., DuckDB, Cayenne). - ---- - ## Limitations - The In-Memory Arrow Data Accelerator does not support persistent storage. Data is stored in-memory and will be lost when the Spice runtime is stopped.