Merged

@@ -50,14 +50,14 @@ Retry tuning is exposed only on the [Elasticsearch Vector Engine](../../vectors/
| Setting | Default | Behavior |
| --------------- | ------- | ---------------------------------------------------------------------------- |
| Connect timeout | `10s` | Maximum time to establish a TCP/TLS connection to the cluster. |
-| Request timeout | `30s` | Maximum time for the entire request/response cycle, including retries. |
+| Request timeout | `30s` | Maximum time for each individual HTTP request. |

Long-running search responses (very large `LIMIT`, deep pagination, or expensive aggregations) may exceed the default request timeout. Either narrow the query, accelerate the dataset, or use the [vector engine](../../vectors/elasticsearch) `client_timeout` parameter when running the workload through the embedding-write path.
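
The request timeout bounds each HTTP attempt rather than the whole cycle, so a retried search can take longer than `30s` end to end. The Rust sketch below estimates that upper bound. It assumes each attempt, including connection setup, fits within the request timeout, and that attempts are separated by a fixed backoff. The attempt count, the backoff, and the `worst_case_duration` helper are hypothetical illustrations, not documented connector settings.

```rust
use std::time::Duration;

/// Hypothetical upper bound on end-to-end time for one search when the
/// request timeout applies to each HTTP attempt separately.
/// Assumes a fixed `backoff` pause between attempts (illustrative only).
fn worst_case_duration(request_timeout: Duration, max_attempts: u32, backoff: Duration) -> Duration {
    // Treat zero as a single attempt: the request is always sent once.
    let attempts = max_attempts.max(1);
    // Each attempt may use the full per-request timeout.
    let request_time = request_timeout * attempts;
    // One pause separates each pair of consecutive attempts.
    let pause_time = backoff * (attempts - 1);
    request_time + pause_time
}
```

With the default `30s` timeout, three attempts, and a one-second pause, the bound is 92 seconds rather than 30.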

## Capacity & Sizing

- **Throughput**: Bounded by the Elasticsearch cluster's request handling and (for kNN) HNSW search cost. Plan refresh intervals and concurrent query load to stay within the cluster's tested capacity.
-- **Result size**: Each `_search` request returns up to `size` hits. The connector translates `LIMIT` to `size`; very large limits incur higher cluster memory and network cost.
+- **Result size**: Each `_search` request returns up to `size` hits, hard-capped at **10,000** (the Elasticsearch default `index.max_result_window`). The connector translates `LIMIT` to `size` but clamps the value to 10,000; queries without `LIMIT` also default to 10,000. For full-index access, accelerate the dataset into a local engine.
- **Mapping fetches**: At dataset registration the connector fetches the index mapping once via `GET /<index>/_mapping`. Mapping changes after registration are not picked up until the runtime restarts.
- **Pagination**: Spice does not currently use Elasticsearch's `search_after` or scroll APIs from the data connector. For full-table scans of very large indexes, prefer accelerating into a local engine.
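
The `LIMIT`-to-`size` rule above fits in a small function. This is a minimal sketch of the documented behavior, not the connector's source. `effective_size` and `search_body` are hypothetical names, and a real request body would also carry the translated query.

```rust
/// Elasticsearch's default `index.max_result_window`.
const MAX_RESULT_WINDOW: u64 = 10_000;

/// Maps an optional SQL `LIMIT` to the `_search` `size`:
/// explicit limits are clamped, and a missing limit uses the cap itself.
fn effective_size(limit: Option<u64>) -> u64 {
    match limit {
        Some(n) => n.min(MAX_RESULT_WINDOW),
        None => MAX_RESULT_WINDOW,
    }
}

/// Builds a minimal `_search` body containing only the clamped `size`.
fn search_body(limit: Option<u64>) -> String {
    format!("{{\"size\":{}}}", effective_size(limit))
}
```

Under this rule `LIMIT 25000` still yields at most 10,000 rows. This is why full-index workloads are pointed at acceleration rather than larger limits.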

@@ -71,7 +71,8 @@ The connector derives an Arrow schema from each index's mapping via `GET /<index
| `ip` | `Utf8` | |
| `dense_vector` (with `dims`) | `FixedSizeList<Float32, dims>` | Required `dims` field must fit in `i32`. |
| `dense_vector` (missing `dims`) | `Utf8` | Falls back to raw JSON when dims cannot be resolved. |
-| `object`, `nested` | `Utf8` | Serialized JSON. |
+| `object` (with sub-fields) | _(flattened)_ | Expanded into dot-separated columns (e.g. `address.city`). |
+| `object` (no sub-fields), `nested` | `Utf8` | Serialized JSON. |
| Any other mapping type | `Utf8` | Fallback — the raw JSON value is preserved as a string. |
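
The two `dense_vector` rows describe a small decision, sketched below in Rust. A simplified stand-in enum replaces Arrow's `DataType`, and `dense_vector_type` is a hypothetical helper. The table only states that `dims` must fit in `i32`, so treating an out-of-range value as an error is an assumption.

```rust
use std::convert::TryFrom;

/// Simplified stand-in for the Arrow types used in the table.
#[derive(Debug, PartialEq)]
enum ArrowType {
    /// Raw JSON preserved as a string.
    Utf8,
    /// `FixedSizeList<Float32, dims>`.
    FixedSizeListFloat32(i32),
}

fn dense_vector_type(dims: Option<u64>) -> Result<ArrowType, String> {
    match dims {
        // No resolvable `dims`: fall back to the raw JSON as `Utf8`.
        None => Ok(ArrowType::Utf8),
        // `dims` must fit in i32 to become the fixed list length
        // (rejecting larger values is an assumption of this sketch).
        Some(d) => i32::try_from(d)
            .map(ArrowType::FixedSizeListFloat32)
            .map_err(|_| format!("dense_vector dims {} does not fit in i32", d)),
    }
}
```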

Nested `object` fields are flattened by concatenating field names with dots (e.g. `address.city`). `nested` fields are preserved as JSON strings because per-document ordering must be retained.
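
A recursive walk over the mapping captures this flattening rule. `object` fields with sub-fields contribute dot-separated column names. `nested` fields and objects without sub-fields each stay a single JSON string column. The `Field` enum and `flatten` function are hypothetical simplifications of an index mapping. For brevity, leaf columns carry their Elasticsearch mapping type rather than the converted Arrow type.

```rust
/// Hypothetical, simplified view of one node in an index mapping.
enum Field {
    /// Leaf field with its mapping type name, e.g. "keyword".
    Leaf(&'static str),
    /// `object` field with its sub-fields (possibly none).
    Object(Vec<(&'static str, Field)>),
    /// `nested` field; its sub-fields are not expanded.
    Nested,
}

/// Collects (column name, kind) pairs for `field`, prefixing names
/// with their parent path joined by dots.
fn flatten(prefix: &str, name: &str, field: &Field, out: &mut Vec<(String, String)>) {
    let path = if prefix.is_empty() {
        name.to_string()
    } else {
        format!("{}.{}", prefix, name)
    };
    match field {
        // Leaves become one column each, keyed by their full dotted path.
        Field::Leaf(ty) => out.push((path, ty.to_string())),
        // Objects with sub-fields are expanded recursively.
        Field::Object(children) if !children.is_empty() => {
            for (child_name, child) in children {
                flatten(&path, child_name, child, out);
            }
        }
        // Empty objects and nested fields keep the raw JSON as one column.
        Field::Object(_) | Field::Nested => out.push((path, "json".to_string())),
    }
}
```

Consider an `address` object with a `city` keyword, an empty `extra` object, and a `tags` nested field. The walk yields `address.city`, `address.extra`, and `address.tags`, and the last two are JSON string columns.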
@@ -137,6 +138,7 @@ TLS is enabled automatically for `https://` endpoints.
- `nested` fields, and `object` fields without sub-fields, are exposed as JSON strings rather than structured columns.
- `date` and `date_nanos` fields are preserved as strings because Elasticsearch accepts heterogeneous date formats; cast to a timestamp in SQL when numeric comparison is required.
- `dense_vector` fields without a declared `dims` value fall back to `Utf8` and are not usable as a vector column.
+- Queries return at most **10,000 hits** per scan. The connector translates SQL `LIMIT` to the Elasticsearch `size` parameter, capped at 10,000 (the Elasticsearch default maximum). Queries without `LIMIT` also return at most 10,000 results. For full-index access, accelerate the dataset into a local engine.
- Pushdown of SQL predicates to Elasticsearch query DSL is limited; complex filter expressions are evaluated locally by DataFusion after fetching results.

Elasticsearch can also be configured as a [Vector Engine](../vectors/elasticsearch) for datasets sourced from other connectors (storing Spice-managed embeddings in Elasticsearch rather than querying an existing index).
@@ -87,8 +87,8 @@ The PostgreSQL connector exposes observable metrics for its replication pipeline
| `replication_updates_total` | ObservableCounter | Total update operations received. |
| `replication_deletes_total` | ObservableCounter | Total delete operations received. |
| `replication_truncates_total` | ObservableCounter | Total truncate operations received. |
-| `replication_bootstrap_rows_total` | ObservableGauge | Total rows fetched during initial bootstrap. |
-| `replication_bootstrap_complete` | ObservableCounter | Bootstrap completion status. |
+| `replication_bootstrap_rows_total` | ObservableCounter | Total rows fetched during initial bootstrap. |
+| `replication_bootstrap_complete` | ObservableGauge | Bootstrap completion status. |
| `replication_decode_errors_total` | ObservableCounter | Total WAL decode errors. |
| `replication_schema_mismatch_errors_total` | ObservableCounter | Total schema mismatch errors during replication. |
| `replication_recv_errors_total` | ObservableCounter | Total receive errors during replication. |
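
The corrected instrument types follow OpenTelemetry semantics. A counter reports a cumulative value that only grows, and a gauge reports a current value that may rise or fall. `replication_bootstrap_rows_total` accumulates rows, so it is a counter. `replication_bootstrap_complete` is a status flag, so it is a gauge. The sketch below models only those semantics with standard-library atomics. `BootstrapState` and its methods are hypothetical, and the sketch does not use the OpenTelemetry API.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical bootstrap state that metric callbacks would read.
struct BootstrapState {
    rows_fetched: AtomicU64,
    complete: AtomicBool,
}

impl BootstrapState {
    fn new() -> Self {
        BootstrapState {
            rows_fetched: AtomicU64::new(0),
            complete: AtomicBool::new(false),
        }
    }

    /// Called as bootstrap batches arrive; the total only ever grows.
    fn record_rows(&self, n: u64) {
        self.rows_fetched.fetch_add(n, Ordering::Relaxed);
    }

    /// Records the current completion status.
    fn set_complete(&self, done: bool) {
        self.complete.store(done, Ordering::Relaxed);
    }

    /// Counter-style observation: cumulative rows fetched so far.
    fn observe_rows_total(&self) -> u64 {
        self.rows_fetched.load(Ordering::Relaxed)
    }

    /// Gauge-style observation: 1 when bootstrap is complete, else 0.
    fn observe_complete(&self) -> u64 {
        self.complete.load(Ordering::Relaxed) as u64
    }
}
```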