When are ducklake_file_column_statistics.min_value/max_value used for file pruning?
#82
-
I set up a catalog using Postgres and ran SELECT name FROM my_table WHERE id = 1995; I expected to see a query of some sort against [...], but instead I saw:
COPY (SELECT "table_id", "column_id", "contains_null", "contains_nan", "min_value", "max_value" FROM "public"."ducklake_table_column_stats" WHERE ctid BETWEEN '(0,0)'::tid AND '(4294967295,0)'::tid) TO STDOUT (FORMAT "binary");
COPY (SELECT "table_id", "record_count", "file_size_bytes", "next_row_id" FROM "public"."ducklake_table_stats" ) TO STDOUT (FORMAT "binary");
COPY (SELECT "table_id", "column_id", "max_value", "min_value", "data_file_id" FROM "public"."ducklake_file_column_statistics" WHERE ctid BETWEEN '(0,0)'::tid AND '(4294967295,0)'::tid AND "table_id" = '1' AND "column_id" = '1') TO STDOUT (FORMAT "binary");
I'm curious why; is it because ...
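For readers unfamiliar with the term, min/max file pruning can be sketched roughly as follows. This is a minimal Python illustration of the general technique, not DuckLake's implementation: the `FileStats` type, the `prune_files` helper, and the value ranges are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one column (not DuckLake's schema)."""
    data_file_id: int
    min_value: int
    max_value: int


def prune_files(stats: list[FileStats], target: int) -> list[int]:
    """Return ids of files whose [min_value, max_value] range could contain target."""
    return [s.data_file_id for s in stats if s.min_value <= target <= s.max_value]


# Made-up ranges for the "id" column of three data files.
stats = [
    FileStats(data_file_id=0, min_value=1, max_value=1000),
    FileStats(data_file_id=1, min_value=1001, max_value=2000),
    FileStats(data_file_id=2, min_value=2001, max_value=3000),
]

# For a predicate like WHERE id = 1995, only file 1 can hold matching rows.
candidates = prune_files(stats, 1995)
print(candidates)
```

The comparison itself is the same whether it runs inside the metadata database or in the engine that fetched the statistics rows; only the number of rows transferred differs.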
Replies: 2 comments
-
There's no such cut-off. DuckLake runs queries using DuckDB itself and doesn't directly run queries in the metadata server, i.e. it relies on duckdb-postgres to talk to Postgres. Most likely the min_value/max_value filter is not pushed down into the Postgres scan (yet), which means it is currently executed only on the DuckDB side.
If you want to view the queries that DuckLake executes itself, you can use the following logging:
PRAGMA enable_logging('QueryLog');
SET logging_storage='stdout';
-
Thanks, the ...