This modifies the `GetFilesForTable` and `DuckLakeMultiFileList` to first generate CTEs for each of the specific column stats.

This does not change the actual filters that are being run, it only changes the shape of the query. It includes logic to explicitly `materialize` the CTE if there are multiple references, and explicitly says `not materialized` when there is only 1, rather than leaving it up to the optimizer. Since it is currently run on a single TableFilter, the reference count is always 1 and the CTEs are never marked as materialized.

Part of the continuation of breaking up #477 into more reviewable chunks, as suggested.
This makes a material difference to query time only when the same column is referenced multiple times, as can happen with complex filters (which are not yet processed). In DuckLakes with lots of tables and lots of columns, a very significant portion of the total time goes to looking up the column stats rather than comparing them, so doing that lookup multiple times really hurts query execution time.
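To illustrate the materialization rule described above, here is a minimal sketch in Python. It is not the code in this PR: the function `build_stats_ctes`, the table name `file_column_stats`, and its columns are hypothetical names chosen for illustration. It only shows the idea of emitting one CTE per column and choosing the hint from the reference count.

```python
from collections import Counter


def build_stats_ctes(referenced_column_ids):
    """Build a WITH clause containing one stats CTE per distinct column id.

    `referenced_column_ids` lists the column id once per filter reference,
    so a column used by two filters appears twice. The table and column
    names in the generated SQL are hypothetical placeholders.
    """
    ref_counts = Counter(referenced_column_ids)
    ctes = []
    for column_id, count in ref_counts.items():
        # Materialize only when the CTE is referenced more than once;
        # otherwise state NOT MATERIALIZED explicitly instead of leaving
        # the decision to the optimizer.
        hint = "MATERIALIZED" if count > 1 else "NOT MATERIALIZED"
        ctes.append(
            f"col_{int(column_id)}_stats AS {hint} ("
            "SELECT data_file_id, min_value, max_value "
            f"FROM file_column_stats WHERE column_id = {int(column_id)})"
        )
    if not ctes:
        return ""
    return "WITH " + ",\n     ".join(ctes)


if __name__ == "__main__":
    # Column 1 is referenced twice, column 2 once.
    print(build_stats_ctes([1, 2, 1]))
```

With a single TableFilter every column is referenced once, so in this sketch every CTE would carry `NOT MATERIALIZED`, matching the behaviour described above.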
Sample user query:
Sample generated query: