Commit bf3e4c4

Merge branch 'trunk' into jeadie/26-05-07/xai-grok-model-retirement
2 parents 9f63b77 + 21b1f40 commit bf3e4c4

160 files changed

Lines changed: 3970 additions & 460 deletions

website/docs/components/catalogs/cayenne.md

Lines changed: 8 additions & 5 deletions
@@ -38,11 +38,14 @@ Use the `include` field to specify which tables to include from the catalog. The
3838

3939
## `params`
4040

41-
| Parameter Name | Description | Default |
42-
| ----------------------------- | ----------------------------------------------------- | -------------------- |
43-
| `cayenne_data_dir` | Local directory for table data files (Vortex format). | Spice data directory |
44-
| `cayenne_metadata_dir` | Local directory for Cayenne SQLite metadata. | Spice data directory |
45-
| `cayenne_target_file_size_mb` | Target Vortex file size in MB. | `128` |
41+
| Parameter Name | Description | Default |
42+
| -------------------------------- | -------------------------------------------------------------------------------------- | ------------ |
43+
| `cayenne_data_dir` | Local directory for table data files (Vortex format). | Spice data directory |
44+
| `cayenne_metadata_dir` | Local directory for Cayenne SQLite metadata. | Spice data directory |
45+
| `cayenne_target_file_size_mb` | Target Vortex file size in MB. | `128` |
46+
| `cayenne_footer_cache_mb` | Size of the in-memory Vortex footer cache in MB for query performance. | `128` |
47+
| `cayenne_segment_cache_mb` | Size of the in-memory Vortex segment cache in MB for caching decompressed data. | `256` |
48+
| `cayenne_compression_strategy` | Compression algorithm for Vortex files. Options: `btrblocks`, `zstd`. | `btrblocks` |
4649

4750
## Examples
4851

website/docs/components/catalogs/ducklake.md

Lines changed: 36 additions & 8 deletions
@@ -70,24 +70,52 @@ The `access` field controls what operations are allowed on the catalog:
7070

7171
## `params`
7272

73-
| Parameter Name | Description |
74-
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
75-
| `ducklake_connection_string` | The DuckLake metadata location (e.g., `s3://bucket/path/metadata.ducklake`). If omitted, the value from `from: ducklake:<connection_string>` is used. |
76-
| `ducklake_name` | The name to attach the DuckLake catalog as in DuckDB. Default: `ducklake`. |
77-
| `ducklake_open` | Path to an existing DuckDB file for persistent storage. If not provided, an in-memory DuckDB instance is used. |
73+
| Parameter Name | Description |
74+
| ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
75+
| `ducklake_connection_string` | The DuckLake metadata location (e.g., `s3://bucket/path/metadata.ducklake`). If omitted, the value from `from: ducklake:<connection_string>` is used. |
76+
| `ducklake_name` | The name to attach the DuckLake catalog as in DuckDB. Default: `ducklake`. |
77+
| `ducklake_open` | Path to an existing DuckDB file for persistent storage. If not provided, an in-memory DuckDB instance is used. |
78+
| `ducklake_aws_region` | Optional. The AWS region for S3 storage. Default: `us-east-1` when explicit credentials are provided. |
79+
| `ducklake_aws_access_key_id` | Optional. The AWS access key ID for S3 storage. Must be set together with `ducklake_aws_secret_access_key`. |
80+
| `ducklake_aws_secret_access_key` | Optional. The AWS secret access key for S3 storage. Must be set together with `ducklake_aws_access_key_id`. |
81+
| `ducklake_aws_endpoint` | Optional. Custom S3-compatible endpoint URL (e.g., for MinIO). |
82+
| `ducklake_aws_allow_http` | Optional. Set to `true` to allow HTTP (non-TLS) connections to S3. Default: `false`. |
7883

7984
## Authentication
8085

81-
DuckLake relies on DuckDB's credential resolution for cloud storage access. No Spice-specific authentication parameters are needed.
82-
8386
### AWS S3
8487

85-
Uses the standard AWS credential chain:
88+
When no explicit S3 credentials are configured, DuckDB falls back to its built-in credential chain provider:
8689

8790
1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`)
8891
2. Shared credentials file (`~/.aws/credentials`)
8992
3. IAM instance profiles (on EC2/ECS)
9093

94+
To provide explicit S3 credentials, use the `ducklake_aws_*` parameters:
95+
96+
```yaml
97+
catalogs:
98+
- from: ducklake:s3://my-bucket/metadata.ducklake
99+
name: my_lakehouse
100+
params:
101+
ducklake_aws_region: us-west-2
102+
ducklake_aws_access_key_id: ${secrets:AWS_ACCESS_KEY_ID}
103+
ducklake_aws_secret_access_key: ${secrets:AWS_SECRET_ACCESS_KEY}
104+
```
105+
106+
For S3-compatible storage (e.g., MinIO), use `ducklake_aws_endpoint`:
107+
108+
```yaml
109+
catalogs:
110+
- from: ducklake:s3://my-bucket/metadata.ducklake
111+
name: my_lakehouse
112+
params:
113+
ducklake_aws_endpoint: http://minio:9000
114+
ducklake_aws_access_key_id: ${secrets:MINIO_ACCESS_KEY}
115+
ducklake_aws_secret_access_key: ${secrets:MINIO_SECRET_KEY}
116+
ducklake_aws_allow_http: true
117+
```
118+
91119
## Examples
92120

93121
### Local DuckLake catalog

website/docs/components/catalogs/postgres.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ Connection can be configured using a connection string or individual parameters.
5959
| `pg_user` | The PostgreSQL username for authentication. |
6060
| `pg_pass` | The PostgreSQL password for authentication. |
6161
| `pg_sslmode` | The SSL mode for the connection (e.g. `require`, `prefer`, `disable`). |
62-
| `pg_sslrootcert` | Path to the SSL root certificate file. |
62+
| `pg_sslrootcert` | Path to the SSL root certificate file, or inline PEM content. |
6363

6464
## Authentication
6565

website/docs/components/data-accelerators/arrow/index.md

Lines changed: 2 additions & 2 deletions
@@ -47,8 +47,8 @@ datasets:
4747
acceleration:
4848
engine: arrow
4949
primary_key: order_id
50-
params:
51-
hash_index: enabled
50+
params:
51+
hash_index: enabled
5252
```
5353

5454
See [Hash Index](../../features/data-acceleration/hash-index) for configuration details, supported data types, and performance characteristics.

website/docs/components/data-accelerators/cayenne/index.md

Lines changed: 1 addition & 1 deletion
@@ -207,7 +207,7 @@ When a query includes a `WHERE` clause, Spice Cayenne evaluates whether each seg
207207
For a table with segments containing timestamp ranges `[2024-01-01, 2024-01-15]`, `[2024-01-16, 2024-01-31]`, `[2024-02-01, 2024-02-15]`, a query:
208208

209209
```sql
210-
SELECT * FROM events WHERE timestamp > '2024-01-20'
210+
SELECT * FROM events WHERE timestamp > '2024-01-20';
211211
```
212212

213213
Prunes the first segment (max < 2024-01-20) and reads only the second and third segments.

website/docs/components/data-accelerators/duckdb/deployment.md

Lines changed: 3 additions & 3 deletions
@@ -59,8 +59,8 @@ DuckDB self-tunes its memory limit based on system memory. For containers, set a
5959

6060
| Parameter | Description |
6161
| ---------------------------- | ------------------------------------------------------------------------------------------------------- |
62-
| `index_scan_percentage` | Optimizer hint: fraction of rows below which index scan is preferred over table scan. |
63-
| `index_scan_max_count` | Optimizer hint: maximum rows for which index scan is preferred. |
62+
| `duckdb_index_scan_percentage` | Optimizer hint: fraction of rows below which index scan is preferred over table scan. |
63+
| `duckdb_index_scan_max_count` | Optimizer hint: maximum rows for which index scan is preferred. |
6464
| `on_refresh_sort_columns` | Columns to sort by during refresh. **Caution**: current implementation uses `CREATE OR REPLACE`, which drops constraints and indexes. |
6565

6666
DuckDB supports traditional B-tree / ART indexes via SQL `CREATE INDEX` against the accelerated table. Define them once the dataset schema is stable.
@@ -94,6 +94,6 @@ DuckDB acceleration operations participate in [task history](../../../reference/
9494
| Slow first startup after restart | WAL replay due to ungraceful shutdown. | Use graceful shutdown (`SIGTERM`). Subsequent starts will be fast once the checkpoint is clean. |
9595
| OOM on refresh | DuckDB memory limit too high for container cgroup. | Set a `memory_limit` pragma via the connection string. |
9696
| Disk fills during large queries | Spill directory on undersized volume. | Point `runtime.query.temp_directory` at a larger volume; monitor free space. |
97-
| Query uses table scan when an index exists | `index_scan_percentage` / `index_scan_max_count` too low. | Tune thresholds; `EXPLAIN` to confirm. |
97+
| Query uses table scan when an index exists | `duckdb_index_scan_percentage` / `duckdb_index_scan_max_count` too low. | Tune thresholds; `EXPLAIN` to confirm. |
9898
| Indexes disappear after refresh | `on_refresh_sort_columns` triggers `CREATE OR REPLACE`. | Re-create indexes post-refresh, or avoid sort-column refreshes until the underlying behavior is updated. |
9999
| `IO Error: Could not set lock on file` | Another process holds a write lock. | Ensure single-writer semantics; verify no other Spice instance is using the same file. |
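
As a rough illustration of the renamed optimizer hints in this file, the following sketch sets both thresholds on a DuckDB-accelerated dataset. The dataset source and name are hypothetical, the values are illustrative rather than recommended defaults, and placing the hints under `acceleration.params` is an assumption modeled on the Arrow accelerator example elsewhere in this commit.

```yaml
datasets:
  - from: postgres:public.orders # hypothetical source, for illustration only
    name: orders # hypothetical dataset name
    acceleration:
      engine: duckdb
      params:
        # Assumed placement: prefer an index scan when the expected match
        # is below this fraction of rows (illustrative value).
        duckdb_index_scan_percentage: 0.01
        # Assumed placement: ...or below this absolute row count (illustrative value).
        duckdb_index_scan_max_count: 5000
```

After changing either value, running `EXPLAIN` on the affected query, as the troubleshooting table suggests, should show whether the planner now chooses the index.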

website/docs/components/data-accelerators/postgres/deployment.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@ The accelerator uses the same Postgres wire-protocol authentication as the [Post
2929
| `pg_user` | Postgres user. Must have `CREATE`, `INSERT`, `UPDATE`, `DELETE`, `SELECT` on the target schema. |
3030
| `pg_pass` | Password. Use `${secrets:...}` to resolve from a configured secret store. |
3131
| `pg_sslmode` | TLS mode: `disable` / `prefer` / `require` / `verify-ca` / `verify-full`. |
32-
| `pg_sslrootcert` | CA bundle path for `verify-ca` / `verify-full`. |
32+
| `pg_sslrootcert` | CA bundle file path for `verify-ca` / `verify-full`. |
3333

3434
For production, use `pg_sslmode: verify-full` and source passwords from a [secret store](../../secret-stores/). The accelerator sets `application_name` on each connection to the Spice.ai version, which surfaces in `pg_stat_activity` for attribution.
3535

@@ -43,7 +43,7 @@ The accelerator creates and writes tables in the configured database. Grant the
4343

4444
| Parameter | Default | Description |
4545
| ------------------------- | ------- | ----------------------------------------------------------------------- |
46-
| `connection_pool_min` | `5` | Minimum idle connections held by the pool. |
46+
| `pg_connection_pool_min` | `5` | Minimum idle connections held by the pool. |
4747
| `connection_pool_size` | `10` | Maximum connections the pool will open. |
4848

4949
`connection_pool_min <= connection_pool_size` is enforced at startup; mismatched values are rejected as configuration errors.

website/docs/components/data-accelerators/postgres/index.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -32,13 +32,13 @@ The connection to PostgreSQL can be configured by providing the following `param
3232
- `pg_user`: The username to connect with.
3333
- `pg_pass`: The password to connect with. Use the [secret replacement syntax](../../components/secret-stores) to load the password from a secret store, e.g. `${secrets:my_pg_pass}`.
3434
- `pg_sslmode`: Optional. Specifies the SSL/TLS behavior for the connection, supported values:
35-
- `verify-full`: (default) This mode requires an SSL connection, a valid root certificate, and the server host name to match the one specified in the certificate.
35+
- `verify-full`: This mode requires an SSL connection, a valid root certificate, and the server host name to match the one specified in the certificate.
3636
- `verify-ca`: This mode requires a TLS connection and a valid root certificate.
3737
- `require`: This mode requires a TLS connection.
38-
- `prefer`: This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.
38+
- `prefer`: (default) This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.
3939
- `disable`: This mode will not attempt to use a TLS connection, even if the server supports it.
4040
- `allow`: This mode will try a non-TLS connection first, then retry with TLS if the server requires it.
41-
- `pg_sslrootcert`: Optional parameter specifying the path to a custom PEM certificate that the connector will trust.
41+
- `pg_sslrootcert`: Optional. Path to a custom PEM certificate file that the connector will trust.
4242
- `pg_connection_pool_min`: Optional. The minimum number of connections to keep open in the pool, lazily created when requested. Default is `5`.
4343
- `connection_pool_size`: Optional. The maximum number of connections created in the connection pool. Default is `10`.
4444
