Commit 6612ff0

docs for rc5
1 parent f8f531f commit 6612ff0

5 files changed

Lines changed: 106 additions & 0 deletions

website/docs/components/data-accelerators/arrow/index.md

Lines changed: 27 additions & 0 deletions
@@ -53,6 +53,33 @@ datasets:

See [Hash Index](../../features/data-acceleration/hash-index) for configuration details, supported data types, and performance characteristics.

## Native Upserts with Primary Key Matching

Spice supports efficient upsert (update-or-insert) operations on Arrow-accelerated tables using primary key matching. When a dataset is accelerated with Arrow and a `primary_key` is specified, incoming rows with matching primary key values will update existing records; otherwise, new records are inserted.

### Example Upsert Configuration

```yaml
datasets:
  - from: s3://bucket/orders.parquet
    name: orders
    acceleration:
      engine: arrow
      primary_key: order_id
```

- When you insert or load data, if a row's `order_id` matches an existing record, the record is updated in-place.
- If the `order_id` is new, a new record is inserted.

This enables efficient update-or-insert semantics for in-memory datasets, ideal for CDC, streaming, and real-time analytics workloads.
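
The update-or-insert rule described above can be illustrated with a minimal Python sketch. It models the table as a dictionary keyed by the primary key; the `upsert` helper and the sample rows are hypothetical and are not Spice's implementation or API.

```python
from typing import Any

Row = dict[str, Any]


def upsert(table: dict[Any, Row], rows: list[Row], primary_key: str) -> tuple[int, int]:
    """Apply rows to table keyed by primary_key; return (inserted, updated)."""
    inserted = updated = 0
    for row in rows:
        key = row[primary_key]
        if key in table:
            table[key] = row  # matching key: the existing record is replaced
            updated += 1
        else:
            table[key] = row  # unseen key: a new record is inserted
            inserted += 1
    return inserted, updated


# Hypothetical sample data keyed by order_id, as in the configuration above.
orders: dict[Any, Row] = {}
first = upsert(orders, [{"order_id": 1, "status": "new"}, {"order_id": 2, "status": "new"}], "order_id")
second = upsert(orders, [{"order_id": 2, "status": "shipped"}, {"order_id": 3, "status": "new"}], "order_id")
```

The second batch updates order `2` and inserts order `3`, so the table never holds two rows with the same `order_id`.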

### Notes
- Upsert support requires a defined `primary_key`.
- Upserts are performed in-memory and are not persisted after runtime shutdown.
- For persistent upserts, use a persistent accelerator (e.g., DuckDB, Cayenne).

---

## Limitations

- The In-Memory Arrow Data Accelerator does not support persistent storage. Data is stored in-memory and will be lost when the Spice runtime is stopped.

website/docs/components/data-accelerators/cayenne/index.md

Lines changed: 16 additions & 0 deletions
@@ -46,6 +46,22 @@ For optimal performance, store Cayenne data files on NVMe storage. NVMe provides

Use [S3 Express One Zone](#aws-s3-express-one-zone-storage) when persistence of accelerations across restarts is required. S3 Express One Zone adds network latency compared to local NVMe but provides durability. Sharing accelerated data across multiple Spice instances is planned for a future release.

## Advanced Internals

### Sequence-based Upserts and Deletes
Cayenne uses Iceberg-style sequence numbers to enable upsert and delete semantics. Each row is tagged with a sequence number, allowing efficient handling of row-level changes without rewriting entire files. Deletes are tracked as tombstones, and upserts are resolved at query time.
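
As a rough illustration of resolving upserts and tombstones at query time, the sketch below keeps an append-only list of entries and, on read, lets the highest sequence number per key win. The `Entry` type and `resolve` function are hypothetical names for this example; Cayenne's actual on-disk layout and resolution logic are not shown in this document.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Entry:
    sequence: int                  # monotonically increasing write sequence
    key: Any                       # primary key of the affected row
    row: Optional[dict[str, Any]]  # None marks a tombstone (delete)


def resolve(entries: list[Entry]) -> dict[Any, dict[str, Any]]:
    """Return visible rows: the highest sequence per key wins, tombstones hide the key."""
    latest: dict[Any, Entry] = {}
    for entry in entries:
        current = latest.get(entry.key)
        if current is None or entry.sequence > current.sequence:
            latest[entry.key] = entry
    return {key: e.row for key, e in latest.items() if e.row is not None}


log = [
    Entry(1, "a", {"id": "a", "qty": 1}),
    Entry(2, "b", {"id": "b", "qty": 5}),
    Entry(3, "a", {"id": "a", "qty": 2}),  # upsert of "a"
    Entry(4, "b", None),                   # delete of "b"
]
visible = resolve(log)
```

Because earlier entries are never modified, a write only appends; the cost of reconciling versions is paid when the data is read.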

### Metadata Management
Cayenne maintains in-process metadata for fast query planning. Metadata includes file listings, statistics, and sequence maps. This enables fast discovery and pruning of data files during query execution.
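
One common way statistics enable pruning is a min/max range check per file. The sketch below assumes each file records the minimum and maximum of a single integer column; the `FileStats` type, the `prune` function, and the file names are hypothetical and do not describe Cayenne's actual metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    min_value: int  # smallest value of the filtered column in this file
    max_value: int  # largest value of the filtered column in this file


def prune(files: list[FileStats], lower: int, upper: int) -> list[str]:
    """Keep only files whose [min, max] range overlaps the query range [lower, upper]."""
    return [f.path for f in files if f.max_value >= lower and f.min_value <= upper]


catalog = [
    FileStats("part-0.vortex", 1, 100),
    FileStats("part-1.vortex", 101, 200),
    FileStats("part-2.vortex", 201, 300),
]
to_scan = prune(catalog, 150, 220)
```

A filter on the range 150 to 220 skips `part-0.vortex` without opening it, because its recorded maximum is below the lower bound.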

### Persistent Acceleration
Cayenne stores acceleration data on NVMe or S3 Express One Zone. Acceleration state is durable across restarts, and future releases will support sharing acceleration across multiple Spice instances.

### Vortex Format
Cayenne leverages the Vortex columnar format for zero-copy Arrow compatibility, fast random access, and extensible encoding.

---

## Configuration

To use Spice Cayenne as the data accelerator, specify `cayenne` as the `engine` for acceleration. Spice Cayenne supports `mode: file`, `mode: file_create`, and `mode: file_update` and stores data on disk.

website/docs/components/data-connectors/mongodb.md

Lines changed: 35 additions & 0 deletions
@@ -27,6 +27,41 @@ datasets:

## Configuration

### Real-Time Change Data Capture (CDC) with MongoDB Change Streams

Spice supports real-time Change Data Capture (CDC) from MongoDB using native [MongoDB Change Streams](https://www.mongodb.com/docs/manual/changeStreams/). This enables streaming inserts, updates, and deletes from your MongoDB collections directly into Spice accelerators, without requiring Debezium or Kafka.

#### Enabling CDC with `refresh_mode: changes`

To enable real-time CDC, set `refresh_mode: changes` in your dataset configuration:

```yaml
datasets:
  - from: mongodb:my_collection
    name: my_collection
    params:
      host: my-cluster.mongodb.net
      db: mydb
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: changes
```

- `refresh_mode: changes` tells Spice to use MongoDB Change Streams for this dataset.
- No Debezium or Kafka is required; Spice connects directly to MongoDB.
- Changes are streamed in real time into the configured accelerator (e.g., DuckDB, Arrow).
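
To make the flow of row-level changes concrete, the sketch below applies a few change events to an in-memory table keyed by `_id`. The event fields (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) follow the MongoDB change event format; the `apply_change` helper and the sample events are hypothetical, handle only top-level fields, and are not how Spice itself applies changes.

```python
from typing import Any


def apply_change(table: dict[Any, dict[str, Any]], event: dict[str, Any]) -> None:
    """Apply one change event to an in-memory table keyed by _id."""
    op = event["operationType"]
    key = event["documentKey"]["_id"]
    if op in ("insert", "replace"):
        table[key] = dict(event["fullDocument"])
    elif op == "update":
        doc = table.setdefault(key, {"_id": key})
        desc = event["updateDescription"]
        doc.update(desc.get("updatedFields", {}))  # top-level fields only in this sketch
        for field in desc.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        table.pop(key, None)
    # Other event types (for example drop or invalidate) are ignored here.


# Hypothetical events in the order a change stream might deliver them.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "status": "new", "note": "x"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"status": "paid"}, "removedFields": ["note"]}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "status": "new"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]

table: dict[Any, dict[str, Any]] = {}
for event in events:
    apply_change(table, event)
```

After the four events, only document `1` remains, with its `status` updated and its `note` field removed.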

#### Use Cases
- Real-time analytics on operational data
- Low-latency dashboards and event-driven pipelines

#### Notes
- Requires MongoDB 4.0+ and a replica set or sharded cluster.
- Ensure your MongoDB user has the `changeStream` and `find` privilege actions on the collection.

---

### `from`

The `from` field takes the form `mongodb:{table_name}` where `table_name` is the table identifier in the MongoDB server to read from.

website/docs/features/cdc/index.md

Lines changed: 14 additions & 0 deletions
@@ -29,6 +29,20 @@ It is recommended to use CDC-accelerated datasets with persistent data accelerat

:::

## Kafka CDC Offset Persistence

Spice now persists Kafka CDC offsets in sidecar tables, enabling durable and resumable CDC streams. When consuming from Kafka topics, Spice records the last committed offset for each partition in a dedicated sidecar table. On restart or failover, Spice resumes from the last committed offset, ensuring no data loss or duplicate processing.
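
The resume behavior can be sketched with a small offsets table. This example uses an in-memory SQLite database as a stand-in for the sidecar table and stores, per topic and partition, the next offset to read; the table name `cdc_offsets`, its columns, and the helper functions are assumptions made for illustration and do not reflect Spice's actual sidecar schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) offsets table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cdc_offsets ("
        "topic TEXT, partition_id INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, partition_id))"
    )
    return conn


def commit_offset(conn: sqlite3.Connection, topic: str, partition_id: int, next_offset: int) -> None:
    """Record the next offset to read for one topic partition."""
    conn.execute(
        "INSERT OR REPLACE INTO cdc_offsets (topic, partition_id, next_offset) VALUES (?, ?, ?)",
        (topic, partition_id, next_offset),
    )
    conn.commit()


def resume_offset(conn: sqlite3.Connection, topic: str, partition_id: int) -> int:
    """Return the stored offset, or 0 when nothing has been committed yet."""
    row = conn.execute(
        "SELECT next_offset FROM cdc_offsets WHERE topic = ? AND partition_id = ?",
        (topic, partition_id),
    ).fetchone()
    return row[0] if row else 0


def consume(conn: sqlite3.Connection, topic: str, partition_id: int,
            messages: list[str], limit: int) -> list[str]:
    """Process up to `limit` messages from the stored offset, then commit the new offset."""
    start = resume_offset(conn, topic, partition_id)
    batch = messages[start:start + limit]
    if batch:
        commit_offset(conn, topic, partition_id, start + len(batch))
    return batch


store = open_store()
partition_log = ["e0", "e1", "e2", "e3", "e4"]  # a list standing in for one Kafka partition
first_run = consume(store, "orders", 0, partition_log, limit=3)
second_run = consume(store, "orders", 0, partition_log, limit=10)  # simulates a restart
```

The second call starts at offset 3 rather than 0, so the first three events are not replayed.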

### Benefits
- Durable CDC: Survives restarts and failover without replaying the entire topic.
- Fast recovery: Resumes from the last processed event, not the earliest available.
- No external offset store required.

### Example
No special configuration is required; offset persistence is automatic for all Kafka CDC datasets.

---

## Supported Data Connectors

Enabling CDC by setting `refresh_mode: changes` in the acceleration settings requires support from the data connector to provide a stream of row-level changes.

website/docs/features/observability/index.md

Lines changed: 14 additions & 0 deletions
@@ -22,6 +22,20 @@ Spice provides monitoring and observability through three mechanisms:
- [New Relic](../monitoring/new-relic)
- [Zipkin](../monitoring/zipkin)

## HTTP Rate-Control Persistence

Spice persists HTTP rate-control (rate-limiting) state in object storage, ensuring that per-endpoint throttle counters survive restarts and are consistent across replicas. This enables reliable rate-limiting for all HTTP endpoints, including `/metrics`, even in distributed or containerized deployments.

### Key Features
- Persistent rate-limiting: Throttle state is saved to object storage and restored on restart.
- Consistent across replicas: All instances share the same rate-limit state.
- `/metrics` endpoint is independently rate-limited to prevent scraping from impacting query serving.
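
The idea of throttle counters surviving a restart can be sketched with a fixed-window limiter whose state serializes to JSON. The `RateLimiter` class, the limit values, and the `/v1/sql` endpoint name are assumptions for illustration; the JSON string stands in for whatever Spice writes to object storage, and the algorithm Spice actually uses is not described in this document.

```python
import json
from typing import Optional


class RateLimiter:
    """Fixed-window counter per endpoint; state can be exported and restored."""

    def __init__(self, limit: int, window_seconds: int, state: Optional[str] = None) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # endpoint -> [window_start, count]
        self.counters: dict[str, list[int]] = json.loads(state) if state else {}

    def allow(self, endpoint: str, now: int) -> bool:
        """Return True if a request to `endpoint` at time `now` (seconds) is allowed."""
        window_start = now - (now % self.window_seconds)
        start, count = self.counters.get(endpoint, [window_start, 0])
        if start != window_start:
            start, count = window_start, 0  # a new window resets the counter
        if count >= self.limit:
            self.counters[endpoint] = [start, count]
            return False
        self.counters[endpoint] = [start, count + 1]
        return True

    def dump(self) -> str:
        """Serialize counters; a real system would write this to shared storage."""
        return json.dumps(self.counters)


limiter = RateLimiter(limit=2, window_seconds=60)
before = [limiter.allow("/metrics", now=120), limiter.allow("/metrics", now=121)]
saved = limiter.dump()
restored = RateLimiter(limit=2, window_seconds=60, state=saved)  # simulates a restart
after_restart = restored.allow("/metrics", now=122)  # same window, limit already reached
other_endpoint = restored.allow("/v1/sql", now=122)  # counters are tracked per endpoint
next_window = restored.allow("/metrics", now=180)
```

Without the restored state, the restarted limiter would start from zero and admit requests that the previous process had already counted against the limit.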

### Usage
No special configuration is required; rate-control persistence is enabled by default when object storage is configured for the runtime.

---

## Prometheus Metrics Endpoint

Spice exposes a Prometheus-compatible metrics endpoint that monitoring systems can scrape. The endpoint serves metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/), which is supported by most enterprise monitoring platforms including Datadog, New Relic, Chronosphere, Grafana Cloud, and others.
