
Commit baacee0

DuckDB acceleration: document on_refresh_sort_columns
1 parent ca252e6 commit baacee0

2 files changed

Lines changed: 4 additions & 0 deletions

File tree

  • website
    • docs/components/data-accelerators
    • versioned_docs/version-1.11.x/components/data-accelerators

website/docs/components/data-accelerators/duckdb.md

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,7 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `on_refresh_recompute_statistics` (string, default: `enabled`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
 - `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, improving resource usage through single shared connection pool for all partitions. Default `files` mode creates separate database files per partition with individual connection pools and generally faster query performance.
 - `duckdb_partitioned_write_flush_threshold` (integer, default: `122880`): The number of rows buffered per partition before flushing data to acceleration storage. Only applicable when using `partition_mode: tables`. Using a larger value can improve write performance but requires more memory.
+- `on_refresh_sort_columns` (string, default: none): Sorts data after each refresh by the specified columns, improving DuckDB [zone map](https://duckdb.org/2025/05/14/sorting-for-fast-selective-queries) (min/max) statistics for query pruning and significantly faster lookup queries. Format: `column1 ASC, column2 DESC` or `column1, column2` (defaults to ASC). Specified columns must exist in the dataset schema, and sort direction must be `ASC` or `DESC`.
 - `optimizer_duckdb_aggregate_pushdown` (string, default: `disabled`): Enables aggregate pushdown optimization to execute supported aggregate queries directly in DuckDB. Set to `enabled` to push down aggregations for improved query performance on supported functions like `count`, `sum`, `avg`, `min`, and `max`. Requires `query_federation` to be `disabled`.
 
 Refer to the [datasets configuration reference](../../reference/spicepod/datasets#acceleration) for additional supported fields.
@@ -69,6 +70,7 @@ Consider the following limitations when using DuckDB acceleration:
 - Queries using `on_zero_results: use_source` cannot filter binary columns directly (e.g., `WHERE col_blob <> ''`). Instead, cast binary columns to another type (e.g., `WHERE CAST(col_blob AS TEXT) <> ''`).
 - DuckDB indexes currently do not support spilling to disk.
 - Hot-reloading dataset configurations while the Spice Runtime is active disables DuckDB query federation until the runtime restarts.
+- `on_refresh_sort_columns` is not currently supported with primary keys or indexes.
 
 ## Resource Considerations

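The format rules in the added `on_refresh_sort_columns` entry, together with the new limitation, can be expressed as a small validator. This is a minimal sketch, not Spice's actual parser: `SortColumn`, `parse_sort_columns`, `validate_sort_config`, and `to_order_by` are hypothetical names, column names are assumed to contain no spaces or commas, and the direction keyword is assumed to be matched exactly as `ASC` or `DESC`. The example column names (`event_time`, `user_id`, `amount`) are also hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SortColumn:
    """One entry of a sort spec (hypothetical type, for illustration only)."""

    name: str
    direction: str  # "ASC" or "DESC"


def parse_sort_columns(spec: str, schema_columns: set[str]) -> list[SortColumn]:
    """Parse "col1 ASC, col2 DESC" or "col1, col2" (direction defaults to ASC).

    Assumption: column names contain no spaces or commas, and the direction
    keyword is written exactly as ASC or DESC.
    """
    columns: list[SortColumn] = []
    for entry in spec.split(","):
        tokens = entry.split()
        if not tokens:
            raise ValueError(f"empty entry in sort spec {spec!r}")
        if len(tokens) > 2:
            raise ValueError(f"expected '<column> [ASC|DESC]', got {entry.strip()!r}")
        name = tokens[0]
        direction = tokens[1] if len(tokens) == 2 else "ASC"
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"sort direction must be ASC or DESC, got {direction!r}")
        if name not in schema_columns:
            raise ValueError(f"column {name!r} does not exist in the dataset schema")
        columns.append(SortColumn(name, direction))
    return columns


def validate_sort_config(
    spec: str,
    schema_columns: set[str],
    has_primary_key: bool = False,
    has_indexes: bool = False,
) -> list[SortColumn]:
    """Reject primary keys or indexes (documented limitation), then parse the spec."""
    if has_primary_key or has_indexes:
        raise ValueError("on_refresh_sort_columns is not supported with primary keys or indexes")
    return parse_sort_columns(spec, schema_columns)


def to_order_by(columns: list[SortColumn]) -> str:
    """Render parsed columns as an ORDER BY list, e.g. "col1 ASC, col2 DESC"."""
    return ", ".join(f"{c.name} {c.direction}" for c in columns)


if __name__ == "__main__":
    schema = {"event_time", "user_id", "amount"}
    cols = validate_sort_config("event_time DESC, user_id", schema)
    print(to_order_by(cols))  # event_time DESC, user_id ASC
```

Normalizing every entry to an explicit direction makes the documented default (ASC) visible in the rendered output.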
website/versioned_docs/version-1.11.x/components/data-accelerators/duckdb.md

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,7 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `on_refresh_recompute_statistics` (string, default: `enabled`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
 - `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, improving resource usage through single shared connection pool for all partitions. Default `files` mode creates separate database files per partition with individual connection pools and generally faster query performance.
 - `duckdb_partitioned_write_flush_threshold` (integer, default: `122880`): The number of rows buffered per partition before flushing data to acceleration storage. Only applicable when using `partition_mode: tables`. Using a larger value can improve write performance but requires more memory.
+- `on_refresh_sort_columns` (string, default: none): Sorts data after each refresh by the specified columns, improving DuckDB [zone map](https://duckdb.org/2025/05/14/sorting-for-fast-selective-queries) (min/max) statistics for query pruning and significantly faster lookup queries. Format: `column1 ASC, column2 DESC` or `column1, column2` (defaults to ASC). Specified columns must exist in the dataset schema, and sort direction must be `ASC` or `DESC`.
 - `optimizer_duckdb_aggregate_pushdown` (string, default: `disabled`): Enables aggregate pushdown optimization to execute supported aggregate queries directly in DuckDB. Set to `enabled` to push down aggregations for improved query performance on supported functions like `count`, `sum`, `avg`, `min`, and `max`. Requires `query_federation` to be `disabled`.
 
 Refer to the [datasets configuration reference](../../reference/spicepod/datasets#acceleration) for additional supported fields.
@@ -69,6 +70,7 @@ Consider the following limitations when using DuckDB acceleration:
 - Queries using `on_zero_results: use_source` cannot filter binary columns directly (e.g., `WHERE col_blob <> ''`). Instead, cast binary columns to another type (e.g., `WHERE CAST(col_blob AS TEXT) <> ''`).
 - DuckDB indexes currently do not support spilling to disk.
 - Hot-reloading dataset configurations while the Spice Runtime is active disables DuckDB query federation until the runtime restarts.
+- `on_refresh_sort_columns` is not currently supported with primary keys or indexes.
 
 ## Resource Considerations

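The zone-map rationale behind `on_refresh_sort_columns` can be illustrated with a toy model: split a column into fixed-size zones, keep only each zone's min and max, and count how many zones a point lookup cannot skip. This is a simplified sketch of min/max pruning in general, not DuckDB's storage code. The zone size of 1,000 rows, the helper names `zone_maps` and `zones_to_scan`, and the synthetic data are all assumptions chosen for illustration.

```python
from __future__ import annotations


def zone_maps(values: list[int], zone_size: int) -> list[tuple[int, int]]:
    """Return (min, max) for each consecutive chunk of `zone_size` values."""
    chunks = (values[i:i + zone_size] for i in range(0, len(values), zone_size))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def zones_to_scan(zones: list[tuple[int, int]], target: int) -> int:
    """Count zones whose [min, max] range could contain `target`; the rest are pruned."""
    return sum(1 for lo, hi in zones if lo <= target <= hi)


# Synthetic "unsorted" column: a permutation of 0..9999 (7919 is coprime with 10000).
unsorted_values = [(i * 7919) % 10_000 for i in range(10_000)]
sorted_values = sorted(unsorted_values)

ZONE_SIZE = 1_000  # illustrative only; not DuckDB's real row-group size

unsorted_scan = zones_to_scan(zone_maps(unsorted_values, ZONE_SIZE), 4242)
sorted_scan = zones_to_scan(zone_maps(sorted_values, ZONE_SIZE), 4242)
print(f"zones scanned for 4242: unsorted={unsorted_scan}, sorted={sorted_scan}")
# zones scanned for 4242: unsorted=10, sorted=1
```

With unsorted data, every zone's [min, max] range spans nearly the whole value domain, so no zone can be skipped. After sorting, each zone covers a narrow, non-overlapping range, and the lookup touches a single zone.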