website/docs/components/data-connectors/postgres/index.md
+13 -1 (13 additions, 1 deletion)
@@ -108,11 +108,23 @@ The connection to PostgreSQL can be configured by providing the following `param
108
108
| `pg_db` | The name of the database to connect to. |
109
109
| `pg_user` | The username to connect with. |
110
110
| `pg_pass` | The password to connect with. Use the [secret replacement syntax](../../components/secret-stores) to load the password from a secret store, e.g. `${secrets:my_pg_pass}`. |
111
-
| `pg_sslmode` | Optional. Specifies the SSL/TLS behavior for the connection, supported values:<br /> <ul><li>`verify-full`: (default) This mode requires an SSL connection, a valid root certificate, and the server host name to match the one specified in the certificate.</li><li>`verify-ca`: This mode requires a TLS connection and a valid root certificate.</li><li>`require`: This mode requires a TLS connection.</li><li>`prefer`: This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.</li><li>`disable`: This mode will not attempt to use a TLS connection, even if the server supports it.</li></ul> |
111
+
| `pg_sslmode` | Optional. Specifies the SSL/TLS behavior for the connection, supported values:<br /> <ul><li>`verify-full`: This mode requires an SSL connection, a valid root certificate, and the server host name to match the one specified in the certificate.</li><li>`verify-ca`: This mode requires a TLS connection and a valid root certificate.</li><li>`require`: This mode requires a TLS connection.</li><li>`prefer`: (default) This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.</li><li>`disable`: This mode will not attempt to use a TLS connection, even if the server supports it.</li></ul> |
112
112
| `pg_sslrootcert` | Optional parameter specifying the path to a custom PEM certificate that the connector will trust. |
113
113
| `pg_connection_pool_min_idle` | Optional. The minimum number of idle connections to keep open in the pool. Default is `1`. |
114
114
| `connection_pool_size` | Optional. The maximum number of connections created in the connection pool. Default is `5`. |
115
115
116
+
#### Replication parameters
117
+
118
+
The following parameters configure PostgreSQL [logical replication](https://www.postgresql.org/docs/current/logical-replication.html) (WAL streaming) when using `refresh_mode: changes`:
| `pg_replication_slot` | Optional. Name of the replication slot to create/reuse. Defaults to `spice_<dataset>_<dataset-hash>_<instance-hash>`. Each Spice replica MUST have its own unique slot. |
123
+
| `pg_publication` | Optional. Name of the publication to create/reuse. Defaults to `spice_<dataset>_<dataset-hash>_pub`. Shared across replicas for the same dataset. |
124
+
| `pg_replication_initial_snapshot` | Optional. Whether to take an initial snapshot of existing rows before streaming WAL changes. Default: `true`. |
125
+
| `pg_replication_temporary_slot` | Optional. If `true`, create a temporary replication slot that is dropped when the Spice process disconnects. Default: `false` (durable slot). |
126
+
| `pg_replication_status_interval` | Optional. How often to send StandbyStatusUpdate to Postgres (e.g. `10s`). Default: `10s`. |
127
+
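As a rough illustration of how the replication parameters above might sit in a dataset definition, the sketch below sets them alongside `refresh_mode: changes`. The table path, dataset name, host, database, user, slot name, and publication name are placeholders invented for this example, and `pg_host` is assumed from the connection parameters listed earlier in this file. The remaining `pg_replication_*` values restate the documented defaults.

```yaml
datasets:
  - from: postgres:public.orders        # hypothetical source table
    name: orders                        # hypothetical dataset name
    params:
      pg_host: localhost                # placeholder connection details
      pg_db: shop
      pg_user: spice
      pg_pass: ${secrets:my_pg_pass}
      # Each Spice replica needs its own slot; this name is illustrative only.
      pg_replication_slot: spice_orders_replica_a
      # The publication can be shared across replicas of the same dataset.
      pg_publication: spice_orders_pub
      pg_replication_initial_snapshot: true   # documented default
      pg_replication_temporary_slot: false    # documented default (durable slot)
      pg_replication_status_interval: 10s     # documented default
    acceleration:
      enabled: true
      refresh_mode: changes
```

When running more than one replica, only `pg_replication_slot` would need to differ between them; omitting it falls back to the generated default name described in the table.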
116
128
## Types
117
129
118
130
The table below shows the PostgreSQL data types supported, along with the type mapping to Apache Arrow types in Spice.
website/docs/components/embeddings/index.md
+22 (22 additions, 0 deletions)
@@ -303,6 +303,28 @@ datasets:
303
303
row_id: id
304
304
```
305
305
306
+
### Multi-Vector Embeddings
307
+
308
+
When the source column is `List<Utf8>` (or `LargeList<Utf8>`), Spice embeds each list element independently and produces a `List<FixedSizeList<Float32, N>>` column. This is the multi-vector (column-of-vectors) mode, useful for rows that carry several independent pieces of text such as tags, section headings, or historical queries.
309
+
310
+
```yaml
311
+
datasets:
312
+
- from: file:products.parquet
313
+
name: products
314
+
acceleration:
315
+
enabled: true
316
+
columns:
317
+
- name: tags # List<Utf8>
318
+
embeddings:
319
+
- from: local_embedding_model
320
+
aggregation: max
321
+
max_elements_per_row: 64
322
+
```
323
+
324
+
The `aggregation` field controls how per-element similarities are combined into a per-row score during vector search. `max` (default) is ColBERT-style `MaxSim`; `mean` and `sum` are also supported. The `max_elements_per_row` field caps how many list elements are embedded per row (default `32`, hard limit `1024`). Multi-vector columns also support [ColBERT-style late-interaction search](../features/search/multi-vector#late-interaction-multi-query-search) via an array of query strings.
325
+
326
+
See [Multi-Vector Search](../features/search/multi-vector) for query usage and [`columns[*].embeddings[*]`](../reference/spicepod/datasets#columnsembeddings) for the full field reference.
website/docs/faq/index.md
+15 -6 (15 additions, 6 deletions)
@@ -98,25 +98,34 @@ Spice uses [Apache DataFusion](https://datafusion.apache.org/) as its primary qu
98
98
99
99
## 17. Does Spice support Change Data Capture (CDC)?
100
100
101
-
Yes. Spice supports CDC via Debezium, enabling real-time data ingestion and materialization from databases such as PostgreSQL and MySQL. [Learn More](../features/cdc).
101
+
Yes. Spice supports streaming ingestion from several sources:
102
102
103
-
## 18. Can Spice integrate with existing BI tools?
103
+
- **Native PostgreSQL logical replication** (recommended for Postgres sources). Spice connects directly to the source using Postgres' `wal_level=logical` + pgoutput and streams `INSERT`/`UPDATE`/`DELETE` events into the accelerator. [Learn more](../features/cdc/postgres-replication.md).
104
+
- **[DynamoDB Streams](../components/data-connectors/dynamodb)** for Amazon DynamoDB sources. Spice consumes the table's change stream and applies `INSERT`/`UPDATE`/`DELETE` events to the accelerator with `refresh_mode: changes`.
105
+
- **[Apache Kafka](../components/data-connectors/kafka)** for event-streaming topics. Spice consumes records directly with `refresh_mode: append` for real-time, append-only acceleration.
106
+
- **[Debezium](../components/data-connectors/debezium)** (over Kafka), for sources where Debezium is already deployed, or for databases without a native Spice CDC path (MySQL, SQL Server, etc.). [Learn more](../features/cdc).
107
+
108
+
## 18. How do I keep an accelerated dataset incrementally up-to-date?
109
+
110
+
For sources with a monotonically-increasing version column (e.g. `updated_at`), Spice incrementally ingests new and modified records using [`time_column`](../reference/spicepod/datasets#time_column) + [`refresh_mode: append`](../features/data-acceleration/data-refresh#append), with [`refresh_append_overlap`](../reference/spicepod/datasets#accelerationrefresh_append_overlap) to tolerate clock skew and [`retention_period`](../reference/spicepod/datasets#accelerationretention_period) to evict old or soft-deleted records. Pair with `primary_key` + `on_conflict: upsert` to deduplicate re-read rows within the overlap window. See [Data Refresh](../features/data-acceleration/data-refresh) for configuration details and examples.
111
+
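A minimal sketch of the configuration described above is shown here. The source table, dataset name, column names, engine, and all interval values are placeholders. The nesting of `primary_key` and the per-column form of `on_conflict` are assumptions about the spicepod layout, so check them against the dataset reference before use.

```yaml
datasets:
  - from: postgres:public.events      # hypothetical source table
    name: events                      # hypothetical dataset name
    time_column: updated_at           # monotonically increasing version column
    acceleration:
      enabled: true
      engine: duckdb                  # assumed engine; others may apply
      refresh_mode: append
      refresh_check_interval: 10s     # placeholder polling interval
      refresh_append_overlap: 5m      # re-read a trailing window to tolerate clock skew
      retention_period: 30d           # evict records older than this
      retention_check_enabled: true   # assumed to be needed for retention to run
      primary_key: id                 # assumed placement under acceleration
      on_conflict:
        id: upsert                    # deduplicate rows re-read within the overlap window
```

The overlap window trades a small amount of repeated reading for safety: rows whose `updated_at` falls inside the last five minutes are fetched again on each refresh, and the upsert rule keeps them from appearing twice.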
112
+
## 19. Can Spice integrate with existing BI tools?
104
113
105
114
Yes. Spice integrates with BI tools through standard SQL interfaces (ODBC, JDBC, Arrow Flight SQL), enabling accelerated, real-time analytics for dashboards and reporting. An official [Tableau Connector](../clients/tableau) is available, along with a [BI Acceleration](https://www.youtube.com/watch?v=blEtLgRKu0c) demo that uses Apache Superset.
106
115
107
-
## 19. How does Spice handle data privacy and compliance?
116
+
## 20. How does Spice handle data privacy and compliance?
108
117
109
118
Spice provides secure, auditable data access through sandboxed runtimes, secure endpoint checks, and detailed telemetry and tracing. The Spice Cloud Platform (SCP) is SOC 2 Type II compliant, meeting enterprise security and compliance requirements.
110
119
111
-
## 20. Can Spice be used for real-time analytics?
120
+
## 21. Can Spice be used for real-time analytics?
112
121
113
122
Yes. Spice accelerates data locally using Apache Arrow, Spice Cayenne (Vortex), DuckDB, SQLite, or PostgreSQL, enabling real-time analytics and sub-second query performance for data-intensive applications and dashboards.
114
123
115
-
## 21. How can developers contribute to Spice?
124
+
## 22. How can developers contribute to Spice?
116
125
117
126
Developers can contribute by submitting code, documentation, or raising issues on [GitHub](https://github.com/spiceai/spiceai). See [CONTRIBUTING.md](https://github.com/spiceai/spiceai/blob/trunk/CONTRIBUTING) for guidelines.
118
127
119
-
## 22. Does Spice support schema evolution?
128
+
## 23. Does Spice support schema evolution?
120
129
121
130
Spice infers the schema for datasets and views at startup and does not apply runtime schema changes by default. If the source schema changes while the runtime is running (for example, columns are added, removed, or their types change), data refreshes will fail with a schema mismatch error rather than silently applying the new schema. This behavior is intentional — it protects against unintentional or breaking schema changes propagating into accelerated tables.
website/docs/features/cdc/index.md
+6 -1 (6 additions, 1 deletion)
@@ -33,7 +33,12 @@ It is recommended to use CDC-accelerated datasets with persistent data accelerat
33
33
34
34
Enabling CDC by setting `refresh_mode: changes` in the acceleration settings requires support from the data connector to provide a stream of row-level changes.
35
35
36
-
Currently, the only supported data connector is [Debezium](../components/data-connectors/debezium).
36
+
Spice currently supports streaming ingestion via:
37
+
38
+
- **[PostgreSQL Logical Replication](./postgres-replication.md)**: **recommended** for PostgreSQL sources. Spice connects directly to the source using Postgres' native logical replication protocol (`wal_level=logical` + pgoutput) and streams `INSERT`/`UPDATE`/`DELETE` events into the accelerator. No Kafka, no Debezium, no external services.
39
+
- **[DynamoDB Streams](../../components/data-connectors/dynamodb#streams)**: for Amazon DynamoDB sources. Spice consumes the table's DynamoDB Streams directly and applies `INSERT`/`UPDATE`/`DELETE` events to the accelerator.
40
+
- **[Apache Kafka](../../components/data-connectors/kafka)**: for event-streaming topics. Spice consumes records directly with `refresh_mode: append` for real-time, append-only acceleration (no separate CDC connector required).
41
+
- **[Debezium](../../components/data-connectors/debezium)** (over Kafka): for sources where Debezium + Kafka is already deployed, or for databases without a native Spice CDC path (MySQL, SQL Server, etc.).
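To make the distinction between the two refresh modes in the list above concrete, the sketch below places a change-stream dataset next to an append-only one. It is illustrative only: the table, topic, and dataset names are invented, connection `params` are omitted, and the `engine` and `mode` values are assumptions chosen to match the recommendation to use persistent acceleration with CDC.

```yaml
datasets:
  # Row-level change stream: INSERT/UPDATE/DELETE events are applied to the accelerator.
  - from: postgres:public.customers   # hypothetical source table
    name: customers
    acceleration:
      enabled: true
      engine: duckdb                  # assumed engine
      mode: file                      # assumed persistent mode
      refresh_mode: changes

  # Append-only stream: new records are added and existing rows are not modified.
  - from: kafka:customer_events       # hypothetical topic
    name: customer_events
    acceleration:
      enabled: true
      refresh_mode: append
```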