perf: support pushing physical filters down through DeltaScan #3859
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3859 +/- ##
==========================================
- Coverage 73.77% 73.76% -0.02%
==========================================
Files 151 151
Lines 39176 39165 -11
Branches 39176 39165 -11
==========================================
- Hits 28903 28890 -13
- Misses 9001 9004 +3
+ Partials 1272 1271 -1
Not blocking b/c I doubt the PR, just for egoistic reasons to learn 😆.
IIUC, this sees filters that come from the parent node and decides whether it can push them down. We also have supports_filters... on the TableProvider, among others. So my guess is that the planner will call this on the provider, which will ultimately determine what gets passed into the function we have here, right?
I am currently rewriting the table provider, since the current one has some organic growth artifacts and a refactor to handle column mapping etc. seems like much more work. So I'm just trying to understand how these relate and where we want to end up.
My understanding is the current These new functions, Implementing |
Signed-off-by: Alex Wilcoxson <[email protected]>
Thanks @roeap, I rebased again, in case you need to enable the merge again.
Description
This change enables physical expr filter pushdown through the DeltaScan ExecutionPlan impl. By default, gather_filters_for_pushdown assumes that no filters can be pushed down, but since DeltaScan is a wrapper around the Parquet data source exec, we can push the filters through to it.
This is important for leveraging dynamic filter pushdown, for example for hash joins.
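To illustrate the idea described above, here is a minimal toy sketch of a default "push nothing down" behaviour and a wrapper node that overrides it to forward every parent filter to its child. All names here (`PlanNode`, `Pushdown`, `OpaqueNode`, `WrapperScan`, `gather_filters`) are hypothetical stand-ins for illustration only; this is not the DataFusion or delta-rs API, and it is not the code in this PR.

```rust
// Toy model only: these traits and types are hypothetical stand-ins,
// not the DataFusion or delta-rs API.

/// A filter predicate, represented here as a plain string.
pub type Filter = String;

/// What a plan node tells the optimizer about a filter offered by its parent.
#[derive(Debug, PartialEq)]
pub enum Pushdown {
    /// The node does not forward the filter; the parent must keep applying it.
    Unsupported(Filter),
    /// The node forwards the filter to its child.
    ToChild(Filter),
}

pub trait PlanNode {
    /// Default behaviour: assume no filter can be pushed down.
    fn gather_filters(&self, parent_filters: Vec<Filter>) -> Vec<Pushdown> {
        parent_filters.into_iter().map(Pushdown::Unsupported).collect()
    }
}

/// Stand-in for a node that keeps the default behaviour.
pub struct OpaqueNode;
impl PlanNode for OpaqueNode {}

/// Stand-in for a thin wrapper around a scan that can evaluate filters itself.
pub struct WrapperScan;
impl PlanNode for WrapperScan {
    /// Override: the wrapper adds no semantics of its own, so every
    /// parent filter is forwarded to the wrapped child.
    fn gather_filters(&self, parent_filters: Vec<Filter>) -> Vec<Pushdown> {
        parent_filters.into_iter().map(Pushdown::ToChild).collect()
    }
}
```

In this sketch, a filter offered to `OpaqueNode` comes back as `Unsupported`, while the same filter offered to `WrapperScan` comes back as `ToChild`, which is the behavioural difference the override is meant to capture.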
To verify these changes I did the following:
Results (these are the metrics from the parquet scan of the big table):
Without pushdown
With pushdown
Notice that the output rows were reduced from 100 million to ~6 million, and the cumulative time spent scanning dropped from 101s to 5s. The various statistics and pruning metrics are also nonzero, indicating that the scan was able to leverage the dynamic filter built from the left-hand side of the hash join.
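As a rough intuition for why the row counts and pruning metrics change, the following toy sketch models a dynamic filter as min/max bounds taken from the build side's join keys, which the probe-side scan then uses to skip whole chunks by their statistics and to drop non-matching rows. The types and functions (`Chunk`, `build_side_bounds`, `scan_with_bounds`) are hypothetical and greatly simplified; the real mechanism in DataFusion is more involved and is not reproduced here.

```rust
// Toy model only: hypothetical types, not the DataFusion API.

/// One chunk of the probe-side table (think: a row group) with min/max statistics.
pub struct Chunk {
    pub min: i64,
    pub max: i64,
    pub keys: Vec<i64>,
}

/// Bounds derived from the join keys on the build side of a hash join.
/// Returns None when the build side has no keys.
pub fn build_side_bounds(build_keys: &[i64]) -> Option<(i64, i64)> {
    let min = *build_keys.iter().min()?;
    let max = *build_keys.iter().max()?;
    Some((min, max))
}

/// Scan the probe side. When bounds are given, chunks whose statistics cannot
/// overlap them are skipped, and rows outside the bounds are dropped.
/// When bounds is None, nothing is filtered.
/// Returns (chunks_read, rows_emitted).
pub fn scan_with_bounds(chunks: &[Chunk], bounds: Option<(i64, i64)>) -> (usize, Vec<i64>) {
    let mut chunks_read = 0;
    let mut rows = Vec::new();
    for chunk in chunks {
        if let Some((lo, hi)) = bounds {
            // Statistics-based pruning: skip the chunk without reading it.
            if chunk.max < lo || chunk.min > hi {
                continue;
            }
        }
        chunks_read += 1;
        rows.extend(chunk.keys.iter().copied().filter(|k| match bounds {
            Some((lo, hi)) => *k >= lo && *k <= hi,
            None => true,
        }));
    }
    (chunks_read, rows)
}
```

Without bounds, every chunk is read and every row is emitted; with bounds from a small build side, most chunks are skipped up front, which is the effect the scan metrics above reflect.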
Related Issue(s)
n/a
Documentation
Docs for the method implemented, gather_filters_for_pushdown
Also, the associated method, handle_child_pushdown_result
I am leaving that as the default impl