
Commit cee152e

fix: Document conditional default for on_refresh_recompute_statistics in DuckDB accelerator
When using `refresh_mode: changes`, the runtime silently overrides the default for `on_refresh_recompute_statistics` from `enabled` to `disabled`. Update the parameter description across all versioned docs (vNext, 1.9.x, 1.10.x, 1.11.x) to reflect this conditional default so users are aware statistics recomputation is off by default in CDC mode.
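
For readers applying this change, a minimal sketch of a dataset that opts back in to statistics recomputation under `refresh_mode: changes` might look like the following. The dataset name and the `from` source are hypothetical placeholders, the nesting under `acceleration.params` is assumed from the parameter list's introduction, and source-specific connection parameters are omitted.

```yaml
# Hypothetical spicepod fragment; the dataset name and source are placeholders.
datasets:
  - from: debezium:example.public.orders # placeholder CDC source (assumption)
    name: orders
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: changes
      params:
        # Set explicitly: with refresh_mode: changes, the default is `disabled`.
        on_refresh_recompute_statistics: enabled
```

Leaving the parameter out of a `refresh_mode: changes` dataset means `ANALYZE` does not run automatically after refreshes, per the conditional default described above.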
1 parent 8a10283 commit cee152e

4 files changed

Lines changed: 4 additions & 4 deletions


website/docs/components/data-accelerators/duckdb/index.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `duckdb_memory_limit` (string, default: none): Limits DuckDB's memory usage for instance. Acceptable units are KB, MB, GB, TB (decimal: 1000^i) or KiB, MiB, GiB, TiB (binary: 1024^i). See [DuckDB memory limit documentation](https://duckdb.org/docs/stable/configuration/overview).
 - `duckdb_preserve_insertion_order` (boolean, default: `true`): Controls whether DuckDB preserves the insertion order of rows in tables. When set to `true`, rows are returned in the order they were inserted. See [DuckDB preserve insertion order documentation](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads#the-preserve_insertion_order-option) and [order preservation documentation](https://duckdb.org/docs/stable/sql/dialect/order_preservation).
 - `connection_pool_size` (integer, default: `10` or the number of datasets sharing the same DuckDB file, whichever is larger): Controls the maximum number of connections to keep open in the connection pool for concurrent query execution.
-- `on_refresh_recompute_statistics` (string, default: `enabled`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
+- `on_refresh_recompute_statistics` (string, default: `enabled`, `disabled` when `refresh_mode` is `changes`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
 - `duckdb_index_scan_percentage` (float, default: `0.001`): Sets the threshold percentage for performing an index scan instead of a table scan. An index scan is used when the number of matching rows is less than the maximum of `duckdb_index_scan_max_count` and `duckdb_index_scan_percentage` multiplied by total row count. Must be between `0.0` and `1.0`.
 - `duckdb_index_scan_max_count` (integer, default: `2048`): Sets the maximum row count threshold for performing an index scan instead of a table scan. An index scan is used when the number of matching rows is less than the maximum of `duckdb_index_scan_max_count` and `duckdb_index_scan_percentage` multiplied by total row count. Must be a non-negative integer.
 - `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, improving resource usage through single shared connection pool for all partitions. Default `files` mode creates separate database files per partition with individual connection pools and generally faster query performance.

website/versioned_docs/version-1.10.x/components/data-accelerators/duckdb.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `duckdb_memory_limit` (string, default: none): Limits DuckDB's memory usage for instance. Acceptable units are KB, MB, GB, TB (decimal: 1000^i) or KiB, MiB, GiB, TiB (binary: 1024^i). See [DuckDB memory limit documentation](https://duckdb.org/docs/stable/configuration/overview).
 - `duckdb_preserve_insertion_order` (boolean, default: `true`): Controls whether DuckDB preserves the insertion order of rows in tables. When set to `true`, rows are returned in the order they were inserted. See [DuckDB preserve insertion order documentation](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads#the-preserve_insertion_order-option) and [order preservation documentation](https://duckdb.org/docs/stable/sql/dialect/order_preservation).
 - `connection_pool_size` (integer, default: `10` or the number of datasets sharing the same DuckDB file, whichever is larger): Controls the maximum number of connections to keep open in the connection pool for concurrent query execution.
-- `on_refresh_recompute_statistics` (string, default: `enabled`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
+- `on_refresh_recompute_statistics` (string, default: `enabled`, `disabled` when `refresh_mode` is `changes`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
 - `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, improving resource usage through single shared connection pool for all partitions. Default `files` mode creates separate database files per partition with individual connection pools and generally faster query performance.
 - `duckdb_partitioned_write_flush_threshold_rows` (integer, default: `122880`): The number of rows buffered per partition before flushing data to acceleration storage. Only applicable when using `partition_mode: tables`. Using a larger value can improve write performance but requires more memory.
 - `optimizer_duckdb_aggregate_pushdown` (string, default: `disabled`): Enables aggregate pushdown optimization to execute supported aggregate queries directly in DuckDB. Set to `enabled` to push down aggregations for improved query performance on supported functions like `count`, `sum`, `avg`, `min`, and `max`. Requires `query_federation` to be `disabled`.

website/versioned_docs/version-1.11.x/components/data-accelerators/duckdb.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `duckdb_memory_limit` (string, default: none): Limits DuckDB's memory usage for instance. Acceptable units are KB, MB, GB, TB (decimal: 1000^i) or KiB, MiB, GiB, TiB (binary: 1024^i). See [DuckDB memory limit documentation](https://duckdb.org/docs/stable/configuration/overview).
 - `duckdb_preserve_insertion_order` (boolean, default: `true`): Controls whether DuckDB preserves the insertion order of rows in tables. When set to `true`, rows are returned in the order they were inserted. See [DuckDB preserve insertion order documentation](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads#the-preserve_insertion_order-option) and [order preservation documentation](https://duckdb.org/docs/stable/sql/dialect/order_preservation).
 - `connection_pool_size` (integer, default: `10` or the number of datasets sharing the same DuckDB file, whichever is larger): Controls the maximum number of connections to keep open in the connection pool for concurrent query execution.
-- `on_refresh_recompute_statistics` (string, default: `enabled`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
+- `on_refresh_recompute_statistics` (string, default: `enabled`, `disabled` when `refresh_mode` is `changes`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
 - `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, improving resource usage through single shared connection pool for all partitions. Default `files` mode creates separate database files per partition with individual connection pools and generally faster query performance.
 - `duckdb_partitioned_write_flush_threshold_rows` (integer, default: `122880`): The number of rows buffered per partition before flushing data to acceleration storage. Only applicable when using `partition_mode: tables`. Using a larger value can improve write performance but requires more memory.
 - `on_refresh_sort_columns` (string, default: none): Sorts data after each refresh by the specified columns, improving DuckDB [zone map](https://duckdb.org/2025/05/14/sorting-for-fast-selective-queries) (min/max) statistics for query pruning and significantly faster lookup queries. Format: `column1 ASC, column2 DESC` or `column1, column2` (defaults to ASC). Specified columns must exist in the dataset schema, and sort direction must be `ASC` or `DESC`.

website/versioned_docs/version-1.9.x/components/data-accelerators/duckdb.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `duckdb_memory_limit` (string, default: none): Limits DuckDB's memory usage for instance. Acceptable units are KB, MB, GB, TB (decimal: 1000^i) or KiB, MiB, GiB, TiB (binary: 1024^i). See [DuckDB memory limit documentation](https://duckdb.org/docs/stable/configuration/overview).
 - `duckdb_preserve_insertion_order` (boolean, default: `true`): Controls whether DuckDB preserves the insertion order of rows in tables. When set to `true`, rows are returned in the order they were inserted. See [DuckDB preserve insertion order documentation](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads#the-preserve_insertion_order-option) and [order preservation documentation](https://duckdb.org/docs/stable/sql/dialect/order_preservation).
 - `connection_pool_size` (integer, default: `10` or the number of datasets sharing the same DuckDB file, whichever is larger): Controls the maximum number of connections to keep open in the connection pool for concurrent query execution.
-- `on_refresh_recompute_statistics` (string, default: `enabled`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
+- `on_refresh_recompute_statistics` (string, default: `enabled`, `disabled` when `refresh_mode` is `changes`): Triggers automatic `ANALYZE` execution after data refreshes. This keeps DuckDB optimizer statistics up-to-date for efficient query plans and performance. Set to `disabled` to turn automatic statistics recomputation off. See [DuckDB ANALYZE statement documentation](https://duckdb.org/docs/stable/sql/statements/analyze).
 - `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, improving resource usage through single shared connection pool for all partitions. Default `files` mode creates separate database files per partition with individual connection pools and generally faster query performance.
 - `duckdb_partitioned_write_flush_threshold_rows` (integer, default: `122880`): The number of rows buffered per partition before flushing data to acceleration storage. Only applicable when using `partition_mode: tables`. Using a larger value can improve write performance but requires more memory.
