spiceai
diff --git a/website/docs/components/data-connectors/imap.md b/website/docs/components/data-connectors/imap.md
Lines changed: 1 addition & 1 deletion
diff --git a/website/docs/components/data-connectors/mongodb.md b/website/docs/components/data-connectors/mongodb.md
Lines changed: 3 additions & 3 deletions
diff --git a/website/docs/features/data-acceleration/snapshots.md b/website/docs/features/data-acceleration/snapshots.md
Lines changed: 12 additions & 1 deletion
diff --git a/website/docs/features/observability/index.md b/website/docs/features/observability/index.md
Lines changed: 4 additions & 0 deletions
diff --git a/website/docs/features/search/vector-search.md b/website/docs/features/search/vector-search.md
Lines changed: 2 additions & 0 deletions
diff --git a/website/docs/monitoring/datadog/index.md b/website/docs/monitoring/datadog/index.md
Lines changed: 4 additions & 0 deletions
diff --git a/website/docs/reference/spicepod/index.md b/website/docs/reference/spicepod/index.md
Lines changed: 2 additions & 2 deletions
diff --git a/website/docs/reference/spicepod/runtime.md b/website/docs/reference/spicepod/runtime.md
Lines changed: 4 additions & 0 deletions
diff --git a/website/docs/reference/sql/json.md b/website/docs/reference/sql/json.md
Lines changed: 118 additions & 0 deletions
@@ -100,7 +100,7 @@ The IMAP connector supports the following connection and authentication paramete
 | `imap_username` | Optional. The username to use for the IMAP connection. Defaults to the value of the `from:` mailbox field.                   |
 | `imap_password` | Optional. The password to use for the IMAP connection, in plaintext authentication mode.                                     |
 | `imap_host`     | Optional. The host or IP address of the IMAP server to connect to. Not required for known connections like Outlook or Gmail. |
-| `imap_port`     | Optional. The port of the IMAP server to connect to.                                                                         |
+| `imap_port`     | Optional. The port of the IMAP server to connect to. Defaults to `993`.                                                                         |
 | `imap_mailbox`  | Optional. The mailbox to read mail from. Defaults to `INBOX`, the standard email inbox.                                      |
 | `imap_ssl_mode` | Optional. The IMAP SSL mode to use. Defaults to `auto`, permitted values of `tls`, `starttls`, `disabled` or `auto`.          |
 
 
@@ -80,9 +80,9 @@ The MongoDB data connector can be configured by providing the following `params`
 | `mongodb_connection_string`        | The connection string to use to connect to the MongoDB server. This can be used instead of providing individual connection parameters.                                                                                                                                                                                                                                                                                                                                                                       |
 | `mongodb_user`                     | The MongoDB username.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
 | `mongodb_pass`                     | The password to connect with.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| `mongodb_host`                     | The hostname of the MongoDB server.                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
-| `mongodb_port`                     | The port of the MongoDB server.                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
-| `mongodb_db`                       | The name of the database to connect to.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+| `mongodb_host`                     | The hostname of the MongoDB server. Defaults to `localhost`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+| `mongodb_port`                     | The port of the MongoDB server. Defaults to `27017`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| `mongodb_db`                       | The name of the database to connect to. Defaults to `default`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
 | `mongodb_sslmode`                  | Optional. Specifies the SSL/TLS behavior for the connection, supported values:<br /> <ul><li>`required`: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect.</li><li>`preferred`: This mode will try to establish a secure SSL connection if possible, but will connect insecurely if the server does not support SSL.</li><li>`disabled`: This mode will not attempt to use an SSL connection, even if the server supports it.</li></ul> |
 | `mongodb_sslrootcert`              | Optional parameter specifying the path to a custom PEM certificate that the connector will trust.                                                                                                                                                                                                                                                                                                                                                                                                            |
 | `mongodb_time_zone`                | Optional. Specifies connection time zone. Default is `UTC`. Accepts: <br /><ul><li>Fixed offsets (e.g., `+02:00`).</li><li>IANA time zone names (e.g., `America/Los_Angeles`)</li></ul>                                                                                                                                                                                                                                                                                                                      |
 
@@ -51,7 +51,7 @@ Every accelerated dataset must write to its own file (for example, `/nvme/my_dat
 
 ## Configure snapshot storage
 
-Snapshots are controlled with a top-level `snapshots` block in the Spicepod. The location must point to a folder on S3 or the local filesystem. When the location is an S3 bucket, the configuration accepts any S3 dataset parameters under `params`.
+Snapshots are controlled with a top-level `snapshots` block in the Spicepod. The location can point to S3, Azure ADLS Gen2, Google Cloud Storage, or the local filesystem.
 
 ```yaml
 snapshots:
@@ -62,6 +62,17 @@ snapshots:
     s3_auth: iam_role                     # Defaults to iam_role for snapshots
 ```
 
+### Supported storage backends
+
+| Backend | URL scheme | Environment variables |
+| --- | --- | --- |
+| Amazon S3 | `s3://` | Standard AWS credentials (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc.) |
+| Azure ADLS Gen2 | `abfss://`, `abfs://` | `AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_CLIENT_ID`/`AZURE_TENANT_ID`/`AZURE_FEDERATED_TOKEN_FILE` |
+| Google Cloud Storage | `gs://` | `GOOGLE_APPLICATION_CREDENTIALS`, Workload Identity |
+| Local filesystem | Absolute or relative path | N/A |
+
+When the location is an S3 bucket, the configuration accepts any [S3 dataset parameters](../../components/data-connectors/s3) under `params`. Azure and GCS locations also accept their respective connector parameters under `params` for explicit credential overrides. When no explicit credentials are supplied, Spice reads standard environment variables for each cloud provider.
+
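The scheme-to-backend mapping in the table above can be sketched as a small lookup. This is an illustrative model only, not how Spice resolves locations internally: the function name `snapshot_backend` and the choice to reject unknown schemes are assumptions made for the example, and Windows drive-letter paths are not handled.

```python
from urllib.parse import urlparse

# URL schemes taken from the "Supported storage backends" table.
BACKENDS = {
    "s3": "Amazon S3",
    "abfss": "Azure ADLS Gen2",
    "abfs": "Azure ADLS Gen2",
    "gs": "Google Cloud Storage",
}


def snapshot_backend(location: str) -> str:
    """Classify a snapshot location by its URL scheme (hypothetical helper)."""
    scheme = urlparse(location).scheme
    if scheme == "":
        # No scheme: treat as an absolute or relative filesystem path.
        return "Local filesystem"
    if scheme in BACKENDS:
        return BACKENDS[scheme]
    # Assumption for this sketch: anything else is rejected.
    raise ValueError(f"unsupported snapshot location scheme: {scheme!r}")


for location in ("s3://bucket/prefix/", "gs://bucket/prefix/", "./snapshots"):
    print(location, "->", snapshot_backend(location))
```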
 ### Failure behavior
 
 `bootstrap_on_failure_behavior` controls what Spice does when it cannot load the most recent snapshot.
 
@@ -239,6 +239,10 @@ runtime:
 
 When `metrics` is empty or omitted, all available metrics are exported.
 
+:::caution Filtering happens after `metric_prefix` is applied
+The whitelist is matched against the **final** metric name, after `runtime.telemetry.metric_prefix` has been prepended. If you set `metric_prefix: 'spiceai.'`, the entries under `metrics:` must include the prefix (e.g. `spiceai.query_duration_ms`); otherwise, nothing matches and no metrics are exported.
+:::
+
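The ordering described in this caution (prefix first, filter second) can be modeled in a few lines. This is a sketch of the documented behavior, not the runtime's implementation: the function name `exported_metrics` is hypothetical, and exact string matching against the final name is an assumption.

```python
def exported_metrics(runtime_metrics, metric_prefix="", allowlist=None):
    """Model of the documented order: apply the prefix, then filter on the final name."""
    final_names = [f"{metric_prefix}{name}" for name in runtime_metrics]
    if not allowlist:
        # An empty or omitted `metrics:` list exports every metric.
        return final_names
    allowed = set(allowlist)
    # Assumption: entries are compared to the final name by exact match.
    return [name for name in final_names if name in allowed]


runtime_metrics = ["query_duration_ms", "dataset_load_state"]

# Unprefixed entry: nothing matches once the prefix has been applied.
no_match = exported_metrics(runtime_metrics, "spiceai.", ["query_duration_ms"])

# Prefixed entry: matches the final metric name.
match = exported_metrics(runtime_metrics, "spiceai.", ["spiceai.query_duration_ms"])

print(no_match)
print(match)
```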
 For full configuration details, see the [runtime.telemetry reference](../reference/spicepod/runtime#runtimetelemetry).
 
 ## Available Metrics
 
@@ -157,6 +157,8 @@ FROM vector_search('sales', 'cutting edge AI', 1500)
 ORDER BY score DESC;
 ```
 
+`WHERE` predicates on base table columns are pushed down as pre-filters, so only matching rows are scored and ranked. See [Search in SQL](../../reference/sql/search#vector-search-vector_search) for details.
+
 :::warning[Limitations]
 
 - `vector_search` UDTF does not yet support chunked embedding columns. Chunking support is on the roadmap.
 
@@ -75,6 +75,10 @@ runtime:
 
 The runtime metric `query_duration_ms` is then exported as `spiceai.query_duration_ms`.
 
+:::caution Combining `metric_prefix` with metric filtering
+If you also set [`runtime.telemetry.otel_exporter.metrics`](/docs/next/reference/spicepod/runtime#runtimetelemetryotel_exporter) to whitelist specific metrics, the entries must include the prefix. The filter runs after the prefix is applied, so `query_duration_ms`, for example, will not match when `metric_prefix: 'spiceai.'` is set; use `spiceai.query_duration_ms` instead.
+:::
+
 ### Add Custom Tags via Resource Attributes
 
 Attach custom key/value pairs to every metric using [`runtime.telemetry.properties`](/docs/next/reference/spicepod/runtime#runtimetelemetryproperties). Spice sends these as OpenTelemetry resource attributes:
 
@@ -135,7 +135,7 @@ Enable or disable snapshot management globally. Defaults to `true`.
 
 ### `snapshots.location`
 
-The folder where snapshots are stored. Supports S3 bucket URIs (`s3://bucket/prefix/`) and absolute or relative filesystem paths. The path must resolve to a single folder; Spice creates per-dataset folders underneath using Hive-style partitions (`month=YYYY-MM/day=YYYY-MM-DD/dataset=<name>`).
+The folder where snapshots are stored. Supports S3 bucket URIs (`s3://bucket/prefix/`), Azure ADLS Gen2 URIs (`abfss://container@account.dfs.core.windows.net/path/`), Google Cloud Storage URIs (`gs://bucket/prefix/`), and absolute or relative filesystem paths. The path must resolve to a single folder; Spice creates per-dataset folders underneath using Hive-style partitions (`month=YYYY-MM/day=YYYY-MM-DD/dataset=<name>`).
 
 ### `snapshots.bootstrap_on_failure_behavior`
 
@@ -147,7 +147,7 @@ Controls what happens when Spice cannot load the most recent snapshot on startup
 
 ### `snapshots.params`
 
-Optional key-value map passed to the snapshot storage layer. When `location` points to S3, the configuration accepts any of the [S3 dataset parameters](../components/data-connectors/s3). Snapshots default to `s3_auth: iam_role`, which differs from the S3 dataset default of `public`.
+Optional key-value map passed to the snapshot storage layer. When `location` points to S3, the configuration accepts any of the [S3 dataset parameters](../components/data-connectors/s3). Snapshots default to `s3_auth: iam_role`, which differs from the S3 dataset default of `public`. Azure ADLS and GCS locations also accept their respective connector parameters for explicit credential overrides; when no overrides are supplied, Spice reads standard environment variables for each cloud provider.
 
 ## `models`
 
 
@@ -435,6 +435,10 @@ runtime:
         - dataset_load_state
 ```
 
+:::caution Filtering happens after `metric_prefix` is applied
+The whitelist is matched against the **final** metric name, after [`runtime.telemetry.metric_prefix`](#runtimetelemetrymetric_prefix) has been prepended. If you set `metric_prefix: 'spiceai.'`, the entries under `metrics:` must include the prefix (e.g. `spiceai.query_duration_ms`); otherwise, nothing matches and no metrics are exported.
+:::
+
 **Authenticated exporters:**
 
 For collectors that require authentication, set the `headers` map. Load credentials from a [secret store](../../components/secret-stores) via `${secrets:...}` rather than committing them to source.
 
@@ -477,6 +477,124 @@ SELECT
 FROM products;
 ```
 
+## JSON Table Functions (UDTFs)
+
+Spice includes table-valued functions for decomposing JSON structures into relational rows. Each function is available as both a UDTF (in the `FROM` clause with literal input) and a scalar UDF returning a list of structs (for per-row use with `UNNEST`).
+
+### `flatten_json`
+
+Walks an arbitrary JSON value and emits one row per reachable leaf.
+
+```sql
+flatten_json(input Utf8 [, options...]) -> TABLE(
+    path         Utf8,
+    parent_path  Utf8,
+    key          Utf8,
+    value        Utf8,
+    type         Utf8     -- "object"|"array"|"string"|"number"|"integer"|"boolean"|"null"
+)
+```
+
+**Options (named arguments):**
+
+| Option | Type | Default | Description |
+| --- | --- | --- | --- |
+| `max_depth` | UInt | `64` | Maximum recursion depth. |
+| `max_rows` | UInt | `1000000` | Per-document row cap. |
+| `max_bytes` | UInt | `8388608` | Input size limit (bytes). |
+| `path_style` | Utf8 | `"dot"` | `"dot"` or `"json-pointer"`. |
+| `include_internal` | Bool | `false` | Also emit interior object/array rows. |
+| `array_wildcard` | Bool | `false` | Collapse array indices to `[*]` instead of `[0]`, `[1]`, etc. |
+
+**UDTF example:**
+
+```sql
+SELECT path, value, type
+FROM flatten_json('{"user": {"name": "Alice", "scores": [95, 87]}}');
+```
+
+| path | value | type |
+| --- | --- | --- |
+| `user.name` | `Alice` | `string` |
+| `user.scores[0]` | `95` | `integer` |
+| `user.scores[1]` | `87` | `integer` |
+
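To make the `path`, `value`, and `type` columns concrete, the leaf walk for the default `"dot"` path style can be approximated outside Spice. This is a rough model for illustration, not the function's actual implementation: the helper names are hypothetical, options such as `max_depth` and `include_internal` are ignored, and the handling of empty containers and `null` values is an assumption.

```python
import json


def leaf_type(value):
    """Map a Python JSON scalar to the type names listed in the signature."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    return "string"


def leaf_text(value):
    """Render a scalar as text: strings as-is, other scalars as JSON text."""
    return value if isinstance(value, str) else json.dumps(value)


def flatten_leaves(value, path=""):
    """Yield (path, value, type) for each leaf, using dot-style paths."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_leaves(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_leaves(child, f"{path}[{index}]")
    else:
        yield path, leaf_text(value), leaf_type(value)


document = json.loads('{"user": {"name": "Alice", "scores": [95, 87]}}')
leaf_rows = list(flatten_leaves(document))
for row in leaf_rows:
    print(row)
```

For the sample document, the sketch yields the same three paths, values, and types as the table above.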
+**Scalar UDF example (per-row with `UNNEST`):**
+
+```sql
+SELECT rows.path, rows.value, rows.type
+FROM (SELECT UNNEST(flatten_json(body)) AS rows FROM documents);
+```
+
+### `flatten_json_properties`
+
+Decomposes a JSON Schema document into one row per field, extracting metadata such as types, descriptions, required status, enums, and format.
+
+```sql
+flatten_json_properties(input Utf8 [, options...]) -> TABLE(
+    path         Utf8,
+    parent_path  Utf8,
+    name         Utf8,
+    description  Utf8,
+    type         Utf8,
+    required     Boolean,
+    format       Utf8,
+    enum_values  List<Utf8>,
+    metadata     Utf8
+)
+```
+
+Handles `properties` recursion, `items.properties` (arrays of objects), `additionalProperties` maps, `allOf`/`oneOf`/`anyOf` merging, and local `$ref` pointers with cycle detection.
+
+**Options (named arguments):**
+
+| Option | Type | Default | Description |
+| --- | --- | --- | --- |
+| `max_depth` | UInt | `32` | Maximum recursion depth. |
+| `max_rows` | UInt | `100000` | Per-document row cap. |
+| `max_bytes` | UInt | `8388608` | Input size limit (bytes). |
+| `path_style` | Utf8 | `"dot"` | `"dot"` or `"json-pointer"`. |
+| `dialect` | Utf8 | `"json-schema"` | `"json-schema"` or `"openapi"` (metrics tagging). |
+| `include_internal` | Bool | `false` | Also emit container rows (objects, arrays). |
+| `expand_maps` | Bool | `false` | Walk into `additionalProperties` and emit child paths with a wildcard segment (e.g., `parent.[*].child`). |
+| `map_wildcard` | Utf8 | `"[*]"` | Wildcard segment for map values when `expand_maps` is `true`. |
+
+**Example:**
+
+```sql
+SELECT path, type, required, description
+FROM flatten_json_properties('{
+  "type": "object",
+  "properties": {
+    "name": {"type": "string", "description": "User name"},
+    "age": {"type": "integer"}
+  },
+  "required": ["name"]
+}');
+```
+
+| path | type | required | description |
+| --- | --- | --- | --- |
+| `name` | `string` | `true` | `User name` |
+| `age` | `integer` | `false` | |
+
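The `properties` recursion and the `required` lookup behind this result can be sketched as follows. This is a deliberately simplified model, not Spice's implementation: the function name `schema_fields` is hypothetical, and `$ref`, `allOf`/`oneOf`/`anyOf`, `items`, and `additionalProperties` are all left out.

```python
import json


def schema_fields(schema, parent_path=""):
    """Yield one dict per leaf field of a JSON Schema object (simplified model)."""
    required = set(schema.get("required", []))
    for name, prop in schema.get("properties", {}).items():
        path = f"{parent_path}.{name}" if parent_path else name
        if "properties" in prop:
            # Container: recurse without emitting a row, mirroring
            # the default `include_internal = false`.
            yield from schema_fields(prop, path)
        else:
            yield {
                "path": path,
                "type": prop.get("type"),
                # `required` is read from the parent object's "required" list.
                "required": name in required,
                "description": prop.get("description"),
            }


schema = json.loads("""
{
  "type": "object",
  "properties": {
    "name": {"type": "string", "description": "User name"},
    "age": {"type": "integer"}
  },
  "required": ["name"]
}
""")
field_rows = list(schema_fields(schema))
for row in field_rows:
    print(row)
```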
+**Expanding maps:**
+
+When a JSON Schema uses `additionalProperties` to describe map values, enable `expand_maps` to produce JSONPath-style paths:
+
+```sql
+SELECT path, type
+FROM flatten_json_properties(
+  '{"type": "object", "additionalProperties": {"type": "object", "properties": {"id": {"type": "string"}, "primary": {"type": "boolean"}}}}',
+  expand_maps => true
+);
+```
+
+| path | type |
+| --- | --- |
+| `[*].id` | `string` |
+| `[*].primary` | `boolean` |
+
 ## Further Reading
 
 - [datafusion-functions-json](https://github.com/datafusion-contrib/datafusion-functions-json) - The underlying JSON manipulation library