spiceai
diff --git a/website/docs/components/catalogs/unity-catalog/deployment.md b/website/docs/components/catalogs/unity-catalog/deployment.md
Lines changed: 128 additions & 0 deletions
diff --git a/website/docs/components/catalogs/unity-catalog.md b/website/docs/components/catalogs/unity-catalog/index.md
website/docs/components/catalogs/unity-catalog.md renamed to website/docs/components/catalogs/unity-catalog/index.md
diff --git a/website/docs/components/data-accelerators/arrow/deployment.md b/website/docs/components/data-accelerators/arrow/deployment.md
Lines changed: 69 additions & 0 deletions
diff --git a/website/docs/components/data-accelerators/arrow.md b/website/docs/components/data-accelerators/arrow/index.md
website/docs/components/data-accelerators/arrow.md renamed to website/docs/components/data-accelerators/arrow/index.md
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,128 @@
+---
+title: 'Unity Catalog Catalog Connector Deployment Guide'
+sidebar_label: 'Deployment Guide'
+description: 'Operating guide for the Unity Catalog catalog connector in production: workspace authentication, table-type filtering, effective-permissions flow, and observability.'
+sidebar_position: 10
+pagination_prev: null
+pagination_next: null
+tags:
+  - catalogs
+  - unity-catalog
+  - observability
+---
+
+Production operating guide for the Unity Catalog catalog connector — discovering Databricks Unity Catalog tables and federating them through Spice.
+
+For Databricks-specific operational concerns (SQL Warehouse resilience, metrics, permissions flow as applied to Databricks workspaces), see the [Databricks Deployment Guide](../../data-connectors/databricks/deployment) — the Unity Catalog logic described there applies directly when the catalog connector targets a Databricks workspace.
+
+## Authentication & Secrets
+
+| Parameter              | Description                                                                          |
+| ---------------------- | ------------------------------------------------------------------------------------ |
+| `unity_catalog_token`  | Bearer token for the Unity Catalog API. Use `${secrets:...}` from a secret store.     |
+
+The catalog URL must match the pattern `https://<host>/api/2.1/unity-catalog/catalogs/<catalog_id>` and is parsed into the endpoint and catalog identifier at startup. Mismatched URLs are rejected as configuration errors.
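
The URL split described above can be sketched in a few lines of Python. This is an illustration only: the regular expression, `CatalogUrl`, and `parse_catalog_url` are hypothetical names written for this example, and the connector's actual parser may treat edge cases (trailing slashes, query strings) differently.

```python
import re
from typing import NamedTuple

# Hypothetical pattern mirroring the documented URL shape:
#   https://<host>/api/2.1/unity-catalog/catalogs/<catalog_id>
_UC_CATALOG_URL = re.compile(
    r"^(?P<endpoint>https://[^/]+)"
    r"/api/2\.1/unity-catalog/catalogs/"
    r"(?P<catalog_id>[^/?#]+)$"
)


class CatalogUrl(NamedTuple):
    endpoint: str
    catalog_id: str


def parse_catalog_url(url: str) -> CatalogUrl:
    """Split a catalog URL into endpoint and catalog identifier."""
    match = _UC_CATALOG_URL.match(url)
    if match is None:
        # Stands in for the configuration error raised at startup.
        raise ValueError(f"not a Unity Catalog catalog URL: {url!r}")
    return CatalogUrl(match["endpoint"], match["catalog_id"])
```

Running a candidate URL through a check like this before deployment can catch a mismatched path early, rather than at runtime startup.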
+
+The token is optional. When unset, the catalog connector issues unauthenticated requests, which suits locally hosted Unity Catalog deployments (OSS UC) with permissive access. For Databricks workspaces, the token is always required.
+
+Secrets must be sourced from a [secret store](../../secret-stores/) in production. Rotate tokens from the UC / Databricks console and update the secret store.
+
+## Resilience Controls
+
+### HTTP Retry Policy
+
+The Unity Catalog client uses the shared `resilient_http` helper with these defaults:
+
+- Maximum retries: **3**
+- Backoff: fibonacci
+- Retriable conditions: HTTP `408`, `429`, `5xx`, and transient network errors (connect, timeout)
+- Respects `Retry-After`, `retry-after-ms`, `x-retry-after-ms` headers
+- Maximum backoff: 300 seconds
+
+These are not exposed as user-tunable parameters on the Unity Catalog connector itself.
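
To make the retry defaults concrete, here is a rough Python sketch of a Fibonacci backoff schedule with a cap and server-provided hints. It is not the `resilient_http` implementation: the 1-second base delay, the header precedence, and the lowercase header keys are all assumptions made for illustration.

```python
from typing import Iterator, Mapping

MAX_RETRIES = 3            # documented default
MAX_BACKOFF_SECS = 300.0   # documented cap


def is_retriable(status: int) -> bool:
    """HTTP 408, 429, and any 5xx are treated as retriable."""
    return status in (408, 429) or 500 <= status <= 599


def fibonacci_delays(base_secs: float = 1.0) -> Iterator[float]:
    """Yield one delay per retry; base_secs is an assumed starting value."""
    a, b = base_secs, base_secs
    for _ in range(MAX_RETRIES):
        yield min(a, MAX_BACKOFF_SECS)
        a, b = b, a + b


def retry_delay(computed_secs: float, headers: Mapping[str, str]) -> float:
    """Prefer a server hint over the computed backoff (assumed precedence)."""
    hints = (
        ("retry-after-ms", 1000.0),
        ("x-retry-after-ms", 1000.0),
        ("retry-after", 1.0),
    )
    for name, divisor in hints:
        value = headers.get(name)
        if value is None:
            continue
        try:
            return min(float(value) / divisor, MAX_BACKOFF_SECS)
        except ValueError:
            continue  # e.g. the HTTP-date form of Retry-After; not handled here
    return computed_secs
```

With these assumptions, three retries wait 1, 1, and 2 seconds unless the server asks for a different delay.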
+
+### Discovery Concurrency
+
+The connector fans out schema and table enumeration with bounded concurrency to avoid a thundering herd against the UC API:
+
+- Schema refresh: up to **5** concurrent requests (`buffer_unordered(5)`)
+- Permission checks: up to **5** concurrent requests (`buffer_unordered(5)`)
+
+For catalogs with thousands of tables, initial discovery can take minutes while the connector respects these limits.
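
The effect of a concurrency limit of 5 can be illustrated with a small Python analogue. The connector uses Rust's `buffer_unordered(5)`; the sketch below swaps in an `asyncio.Semaphore`, and `map_bounded` is a hypothetical helper name. One difference to note: `asyncio.gather` returns results in input order, whereas `buffer_unordered` yields them as they complete.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_bounded(
    items: Iterable[T],
    fn: Callable[[T], Awaitable[R]],
    limit: int = 5,
) -> List[R]:
    """Apply fn to every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with semaphore:  # blocks once `limit` calls are running
            return await fn(item)

    return list(await asyncio.gather(*(run(item) for item in items)))
```

Because each request waits for a free slot, total discovery time grows roughly with the number of requests divided by the limit, which is why very large catalogs take minutes on a cold start.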
+
+## Table Type and Permission Handling
+
+### Table Type Filtering
+
+| Table Type          | Supported | Notes                                  |
+| ------------------- | --------- | -------------------------------------- |
+| `MANAGED`           | Yes       | Standard Delta tables                  |
+| `EXTERNAL`          | Yes       | Tables with external storage locations |
+| `FOREIGN`           | Yes       | Lakehouse Federation foreign tables    |
+| `MATERIALIZED_VIEW` | Yes       | Materialized views                     |
+| `VIEW`              | No        | Skipped during discovery               |
+| `STREAMING_TABLE`   | No        | Skipped during discovery               |
+
+Unsupported table types are skipped during catalog discovery. Referencing one directly returns an error.
+
+### Effective Permissions
+
+Before creating a table provider, the connector checks permissions via `GET /api/2.1/unity-catalog/effective-permissions/table/{catalog.schema.table}`. The following privileges grant read access:
+
+- `SELECT`
+- `ALL_PRIVILEGES` / `ALL PRIVILEGES`
+- `OWNER` / `OWNERSHIP`
+
+**Behavior**:
+
+- **Discovery**: Tables without read permission are skipped.
+- **Direct reference**: An `InsufficientPermissions` error is returned.
+- **Foreign tables**: The precheck is skipped (`requires_read_permission_validation = false`) because Lakehouse Federation access can be valid when the UC effective-permissions endpoint does not report a table-level privilege. Access is still enforced by Databricks at query time.
+- **Graceful degradation**: If the UC API is unreachable or returns an error for the permissions endpoint, discovery proceeds with a warning — table providers are still created, and any per-query authorization failures surface at query time.
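
Taken together, the table-type filter and the permission rules above amount to a small decision function. The Python sketch below is one reading of that behavior for the discovery path, not the connector's code: `include_in_discovery` is a hypothetical name, exact-match comparison of privilege strings is an assumption, and `None` is used here to represent a failed or unreachable permissions call.

```python
from typing import Iterable, Optional

SUPPORTED_TABLE_TYPES = frozenset(
    {"MANAGED", "EXTERNAL", "FOREIGN", "MATERIALIZED_VIEW"}
)
READ_PRIVILEGES = frozenset(
    {"SELECT", "ALL_PRIVILEGES", "ALL PRIVILEGES", "OWNER", "OWNERSHIP"}
)


def include_in_discovery(
    table_type: str, privileges: Optional[Iterable[str]]
) -> bool:
    """Decide whether a table is exposed during catalog discovery."""
    if table_type not in SUPPORTED_TABLE_TYPES:
        return False  # VIEW and STREAMING_TABLE are skipped
    if table_type == "FOREIGN":
        return True   # precheck skipped; Databricks enforces access at query time
    if privileges is None:
        return True   # permissions endpoint failed: proceed with a warning
    return any(p in READ_PRIVILEGES for p in privileges)
```

For a direct table reference, the same checks would surface as errors (for example `InsufficientPermissions`) instead of a silent skip.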
+
+## Capacity & Sizing
+
+- **Initial discovery**: Scales with the number of schemas × tables. Bounded concurrency caps throughput; plan 5–30 minutes for catalogs with thousands of tables on a cold start.
+- **Refresh**: Catalog refresh re-enumerates schemas and tables at the configured interval. For very large catalogs, refresh less frequently (every few hours) unless schemas change rapidly.
+- **Permission-check cost**: One API call per table, with at most 5 calls in flight at a time.
+
+## Metrics
+
+The Unity Catalog connector does not currently register UC-specific OpenTelemetry metric instruments. When used via the Databricks connector, the shared SQL Warehouse and UC spans produce task-history records that can be aggregated for operational insight.
+
+Monitor via:
+
+- Spice query execution metrics (`query_duration_ms`, `query_processed_rows`) from `runtime.metrics`.
+- Task-history spans listed below.
+- Databricks / UC workspace audit logs for API-level visibility.
+
+See [Component Metrics](../../../features/observability/component_metrics) for general configuration.
+
+## Task History
+
+Unity Catalog operations emit the following [task history](../../../reference/task_history) spans:
+
+| Span                           | Input                         | Description                              |
+| ------------------------------ | ----------------------------- | ---------------------------------------- |
+| `uc_get_table`                 | Fully-qualified table name    | Fetch table metadata from Unity Catalog. |
+| `uc_get_catalog`               | Catalog ID                    | Fetch catalog metadata.                  |
+| `uc_list_schemas`              | Catalog ID                    | List schemas in a catalog.               |
+| `uc_list_tables`               | `catalog_id.schema_name`      | List tables in a schema.                 |
+| `uc_get_effective_permissions` | Fully-qualified table name    | Check effective permissions for a table. |
+
+## Known Limitations
+
+- **VIEW and STREAMING_TABLE are skipped**: Only queryable table types are exposed.
+- **No UC write-back**: The connector is read-only; writes to UC are not supported through Spice.
+- **HTTP retry/concurrency parameters not exposed**: The resilient-HTTP defaults (3 retries, fibonacci backoff, concurrency 5) are not currently user-tunable on the UC connector.
+- **Graceful degradation on permission-endpoint failures**: If UC effective-permissions is unreachable, Spice proceeds; authorization errors surface at query time rather than discovery time.
+
+## Troubleshooting
+
+| Symptom                                                                 | Likely cause                                                       | Resolution                                                                                                                  |
+| ----------------------------------------------------------------------- | ------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------- |
+| `401 Unauthorized` on catalog list                                      | Missing, expired, or wrong-workspace token.                        | Regenerate token in UC / Databricks; update secret store.                                                                   |
+| Table visible in UC but missing from the Spice catalog                  | Table type is VIEW / STREAMING_TABLE or permissions were denied.   | Confirm table type is supported and that the principal has `SELECT` (or equivalent).                                        |
+| `InsufficientPermissions` on direct table reference                     | Role lacks read privilege on the table.                            | Grant `SELECT` on the table in UC.                                                                                          |
+| Slow catalog discovery on thousands of tables                           | Bounded concurrency + permission checks per table.                 | Expected behavior; schedule discovery during low-traffic windows and cache via accelerated datasets.                         |
+| Queries against Lakehouse Federation (`FOREIGN`) tables fail          | FOREIGN precheck is skipped; Databricks denied at query time.      | Verify the Databricks workspace has federation privileges granted to the principal.                                          |
@@ -0,0 +1,69 @@
+---
+title: 'Arrow Data Accelerator Deployment Guide'
+sidebar_label: 'Deployment Guide'
+description: 'Operating guide for the Arrow (in-memory) data accelerator in production: memory sizing, indexes, and observability.'
+sidebar_position: 10
+pagination_prev: null
+pagination_next: null
+tags:
+  - data-accelerators
+  - arrow
+  - observability
+---
+
+Production operating guide for the Arrow in-memory data accelerator covering memory sizing, optional hash indexes, and observability.
+
+## Authentication & Secrets
+
+The Arrow accelerator is an in-process, in-memory engine. There is no external storage and no authentication or secret management required.
+
+## Resilience & Durability
+
+The Arrow accelerator is **not durable**. Data is held in RAM and is lost on process restart; every restart re-materializes the dataset from the source connector.
+
+- **Crash recovery**: None — on restart, the dataset is refreshed from scratch.
+- **File modes**: File-mode acceleration is rejected at startup; Arrow is memory-only. Use [DuckDB](../duckdb/deployment), [SQLite](../sqlite/deployment), [PostgreSQL](../postgres/deployment), or [Cayenne](../cayenne/deployment) when durability or spill is required.
+- **Concurrency**: Arrow reads are lock-free. Refresh cadence is controlled by the runtime refresh semaphore, not by the accelerator itself.
+
+## Capacity & Sizing
+
+- **Memory**: Plan for 1.0–1.5× the raw row-oriented size of the source data, plus overhead for string dictionaries. Use the source connector's schema and row count to estimate.
+- **Hash index**: Optional, disabled by default. When enabled via `hash_index: enabled`, a hash map is built over the primary-key columns. Build time scales linearly with rows; memory overhead is approximately 24–48 bytes per row plus the key size.
+- **Startup cost**: Full-dataset materialization happens on startup. For tables larger than ~1 GB, consider a durable accelerator to avoid repeated full refresh on every restart.
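
The sizing guidance above reduces to simple arithmetic. The following Python sketch turns it into a low/high estimate. `estimate_arrow_memory_bytes` is a hypothetical helper, the average row and key sizes are inputs you must supply, and string-dictionary overhead is deliberately left out because the guide gives no figure for it.

```python
from typing import Tuple


def estimate_arrow_memory_bytes(
    rows: int,
    raw_row_bytes: int,
    hash_index: bool = False,
    key_bytes: int = 0,
) -> Tuple[float, float]:
    """Return a (low, high) memory estimate in bytes for an Arrow-accelerated dataset."""
    raw = rows * raw_row_bytes
    low, high = raw * 1.0, raw * 1.5       # 1.0-1.5x the raw row-oriented size
    if hash_index:
        low += rows * (24 + key_bytes)     # 24-48 bytes per row plus the key size
        high += rows * (48 + key_bytes)
    return low, high


# Example: 1M rows of ~100 bytes each, hash index over an 8-byte key.
low, high = estimate_arrow_memory_bytes(1_000_000, 100, hash_index=True, key_bytes=8)
```

Treat the result as a floor for planning: query working memory and runtime overhead come on top of the dataset itself.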
+
+## Metrics
+
+Generic acceleration metrics are available with the `dataset_acceleration_` prefix. Hash-index operations emit dedicated metrics when the index is enabled:
+
+| Metric                             | Type      | Description                                               |
+| ---------------------------------- | --------- | --------------------------------------------------------- |
+| `hash_index_builds`                | Counter   | Total hash-index builds (one per refresh).                |
+| `hash_index_build_duration_ms`     | Histogram | Time to build the hash index.                             |
+| `hash_index_entries`               | Gauge     | Number of entries in the index.                           |
+| `hash_index_memory_bytes`          | Gauge     | Approximate memory footprint of the index.                |
+| `hash_index_lookups`               | Counter   | Total hash-index lookups performed by queries.            |
+| `hash_index_lookup_rows`           | Counter   | Total rows returned via hash-index lookups.               |
+
+See [Component Metrics](../../../features/observability/component_metrics) for enabling and exporting metrics. Refresh metrics are described in [Acceleration](../../../features/data-acceleration/).
+
+## Task History
+
+Arrow acceleration operations (refresh, query) participate in [task history](../../../reference/task_history) through the shared acceleration spans (`accelerated_table_refresh`, `sql_query`). No Arrow-specific spans are emitted — the accelerator is a thin wrapper over Arrow memory.
+
+## Known Limitations
+
+- **No persistence**: Every restart refreshes from the source.
+- **No traditional indexes**: Arrow does not support B-tree indexes. Hash index provides point-lookup acceleration but not range or sort-order optimization.
+- **Only primary-key hash index**: The hash index requires a `primary_key` constraint; `unique` constraints alone do not enable the index.
+- **Memory pressure**: If the dataset exceeds available RAM, the runtime will OOM; no spill-to-disk mechanism exists in the Arrow accelerator itself.
+- **`partition_by`**: Not applicable; the Arrow accelerator holds a single in-memory representation.
+
+## Troubleshooting
+
+| Symptom                                          | Likely cause                                            | Resolution                                                                                     |
+| ------------------------------------------------ | ------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
+| OOM on refresh                                   | Source dataset larger than RAM.                         | Switch to a durable accelerator (DuckDB / SQLite / Cayenne) that supports spill to disk.       |
+| Long startup time                                | Full-dataset refresh runs on boot.                      | Switch to a durable accelerator so refresh is incremental, not full, on restart.               |
+| `hash_index` ignored                             | No primary-key constraint on the dataset.               | Add `primary_key:` to the dataset definition; the configured hash index then takes effect.     |
+| Query slow for point lookups                     | Hash index disabled or wrong key column.                | Enable `hash_index: enabled`; ensure the query filter matches the primary-key columns.          |
+| Accelerator refuses to start with file mode      | Arrow rejects file-mode acceleration.                   | Switch `engine:` to `duckdb`, `sqlite`, `postgres`, or `cayenne`.                              |
@@ -65,7 +65,7 @@ See [Hash Index](../../features/data-acceleration/hash-index) for configuration
 
 When accelerating a dataset using the In-Memory Arrow Data Accelerator, some or all of the dataset is loaded into memory. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.
 
-In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by [`duckdb`](./duckdb) and [`sqlite`](./sqlite) accelerators by specifying `mode: file`.
+In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by [`duckdb`](duckdb) and [`sqlite`](sqlite) accelerators by specifying `mode: file`.
 
 :::