fix(datafusion): optimize partition pruning and predicate pushdown #3377
base: main
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Please note this only deals with [...]
Branch force-pushed from f9c3035 to d58632b.
Added some tests and fixes for the decision around Exact/Inexact predicate pushdown. I was being overly optimistic initially. I took some inspiration from the DataFusion ListingTable provider, but simplified the logic to align better with the PruningPredicate. I am still planning to add a few tests similar to [...]. I would appreciate any feedback on the couple of TODOs I left as comments.
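The Exact/Inexact decision can be illustrated with a minimal sketch, assuming a ListingTable-style rule: a filter that references only partition columns is reported as exact (partition values are constant per file, so pruning whole files evaluates it completely), while anything else is inexact (statistics can skip files, but DataFusion must still re-apply the filter to the surviving rows). `FilterPushDown`, `Expr`, `collect_columns` and `classify` are hypothetical, simplified stand-ins, not delta-rs or DataFusion code; DataFusion's real enum is `TableProviderFilterPushDown`, which also has an `Unsupported` variant. The sketch further assumes the expression itself is fully handled by the pruning logic.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for DataFusion's `TableProviderFilterPushDown`.
#[derive(Debug, PartialEq, Eq)]
enum FilterPushDown {
    /// The scan may use the filter, but DataFusion re-applies it to rows.
    Inexact,
    /// The scan guarantees the filter is fully applied.
    Exact,
}

/// Hypothetical, heavily simplified filter expression.
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Collects every column name referenced by `expr`.
fn collect_columns<'a>(expr: &'a Expr, out: &mut Vec<&'a str>) {
    match expr {
        Expr::Column(name) => out.push(name),
        Expr::Literal(_) => {}
        Expr::Eq(l, r) | Expr::And(l, r) => {
            collect_columns(l, out);
            collect_columns(r, out);
        }
    }
}

/// Exact only if the filter references at least one column and every
/// referenced column is a partition column; otherwise Inexact.
fn classify(filter: &Expr, partition_cols: &HashSet<&str>) -> FilterPushDown {
    let mut cols = Vec::new();
    collect_columns(filter, &mut cols);
    if !cols.is_empty() && cols.iter().all(|c| partition_cols.contains(*c)) {
        FilterPushDown::Exact
    } else {
        FilterPushDown::Inexact
    }
}
```

Falling back to Inexact is always safe, since DataFusion then keeps its own filter above the scan; reporting Exact wrongly would silently return extra rows.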
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3377      +/-   ##
==========================================
+ Coverage   71.97%   72.01%   +0.04%
==========================================
  Files         145      145
  Lines       45774    45864      +90
  Branches    45774    45864      +90
==========================================
+ Hits        32944    33028      +84
- Misses      10746    10752       +6
  Partials     2084     2084
Branch force-pushed from ceed00f to 7ea7975.
Rebased and squashed, since nobody has started a review yet.
Realized over the weekend that I was breaking a DF optimization that can resolve queries such as [...]. Added another commit with a temporary fix for pruning stats in addition to files; waiting for feedback on the implementation.
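Pruning of this kind goes through DataFusion's `PruningStatistics` trait, and the PR description mentions its `contained` method, which used to be a no-op. For a partition column the per-file answer can be exact, because every row in a file shares one partition value. The sketch models those semantics with std types only; `FileMeta` and `contained` here are hypothetical, and the real trait method takes a `Column` and a set of `ScalarValue`s and returns an optional `BooleanArray`. Treating a null partition value as "unknown" is an assumption chosen to stay conservative.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-file metadata: the partition values recorded in the
/// Delta log for one data file (`None` = null partition value).
struct FileMeta {
    partition_values: HashMap<String, Option<String>>,
}

/// Sketch of `contained` semantics for partition columns.
/// Returns `None` when `column` is not a partition column (no information);
/// otherwise one entry per file:
///   Some(true)  -> every row in the file has a value in `values`
///   Some(false) -> no row in the file has a value in `values`
///   None        -> unknown (here: null or missing partition value), keep the file
fn contained(
    files: &[FileMeta],
    partition_cols: &HashSet<String>,
    column: &str,
    values: &HashSet<String>,
) -> Option<Vec<Option<bool>>> {
    if !partition_cols.contains(column) {
        return None;
    }
    Some(
        files
            .iter()
            .map(|f| match f.partition_values.get(column) {
                // The partition value is constant for the whole file, so
                // membership of that single value answers the question exactly.
                Some(Some(v)) => Some(values.contains(v)),
                _ => None,
            })
            .collect(),
    )
}
```

With `values = {"2024"}`, a file in partition `year=2023` yields `Some(false)` and can be skipped for a predicate like `year IN ('2024')`, while a file in `year=2024` yields `Some(true)`.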
@rtyler thanks for taking a look 🙌 I just added a commit: only push the [...]. I still left a handful of FIXMEs and TODOs around the code; I'm waiting for your initial reaction before polishing the rest, since I wanted to minimize the diff/rebase effort for now.
Branch force-pushed from 47cc99e to 1eca9af.
The code generally looks good to me. I would recommend getting this into whatever state you feel comfortable merging in.
…cols filters Signed-off-by: Adrian Tanase <[email protected]>
… for DF integration Signed-off-by: Adrian Tanase <[email protected]>
Branch force-pushed from 1eca9af to 9cfd71b.
@rtyler rebased and squashed. LMK what other areas I should look at.
Description
This PR adds 3 optimizations for the DataFusion integration that go well together.
I am open to splitting this into 2 PRs if you think it's necessary, but the 2nd one would depend on the first one anyway.
In the first commit:
[...] contained method, which used to be a no-op
In the 2nd commit:
Related Issue(s)
LIMIT push down during query planning to reduce number of files scanned delta#1495
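The related issue asks for LIMIT push-down at planning time. One common approach, sketched here under assumptions, keeps files in log order until the per-file `numRecords` statistics guarantee enough rows. It is only safe when no inexact filters remain above the scan (for example, no filters at all, or only exact partition-column filters), because otherwise rows could still be dropped after the files are read. `AddFile` and `files_for_limit` are hypothetical names; this is not the implementation from this PR or from delta-rs.

```rust
/// Hypothetical file entry: data file path plus the optional `numRecords`
/// statistic from the Delta log.
struct AddFile {
    path: String,
    num_records: Option<u64>,
}

/// Keeps files in order until the known row counts reach `limit`.
/// Files without statistics are still scanned, but they do not count toward
/// the limit, so the result never has fewer rows than requested.
fn files_for_limit(files: &[AddFile], limit: u64) -> Vec<&str> {
    let mut selected = Vec::new();
    let mut rows_known: u64 = 0;
    for f in files {
        if rows_known >= limit {
            break;
        }
        selected.push(f.path.as_str());
        if let Some(n) = f.num_records {
            rows_known = rows_known.saturating_add(n);
        }
    }
    selected
}
```

For example, with files of 10, unknown, 5 and 100 rows and `LIMIT 12`, only the first three files are scanned.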