-
Notifications
You must be signed in to change notification settings - Fork 3.9k
feat(engine): ahead of time catalog lookups #20189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ivkalita
wants to merge
1
commit into
main
Choose a base branch
from
ivkalita/aot-catalog-lookups
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feel a bit strange to me that we do physical planning twice, once to gather fake requests and once with the real data. Did you do it this way to maintain a separation of concerns or some other reason?
Physical planning is reasonably cheap without the metastore calls now but it might change in the future if we build different physical plans depending on the output of the catalog requests, or even issue new requests based off the old ones. Does it make sense to instead implement a catalog which will resolve requests via distributing metaqueries directly instead of doing this 2 step process?
That said, if this is just a stepping stone to the full implementation I'm happy to approve it to keep things moving!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to give the engine full control of the (meta)query execution
At the moment I think there is no difference in how we collect / execute metaqueries (given there's at most 1 catalog request per physical plan, right?), I can simplify the approach to delegate the work to a new catalog.
Yes! Current approach doesn't work in this case, so I was thinking about a loop like
I'm still slightly hesitant about executing metaqueries one by one in the order that depends on the physical planner walk just because we won't have enough context to optimize these queries (parallelize / decouple).
Wdyt about all above?
I'm considering simplifying this part for now (applying "implement a catalog which will resolve requests via distributing metaqueries directly" suggestion) and just keeping a comment that we need to revisit this part when we have multiple metaqueries per physical plan.