-
Notifications
You must be signed in to change notification settings - Fork 57
feat(parser): Add input/output table extraction to PrestoParser #804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…bookincubator#804) Summary: There are usecases for which callers may want to extract some information from a query without needing to resolve all the metadata details required to build a full logical plan. An example could be a client-side check decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining accessed tables immediately See facebookincubator#789 for the related issue To enable this, I am adding two APIs to the PrestoParser, one which extracts accessed input tables, and one which extracts output tables, if any exist There are two parts to this changeset: 1. on recommendation of Masha, defined a DefaultTraversalVisitor, which performs a DFS traversal over all nodes in the AST. I used this baseclass for the existing ExprAnalyzer and the new TableVisitor. I can pull this into a separate PR if desired 2. add the TableVisitor, which extracts input tables and the output table for the query, and link it into two new PrestoParser APIs for input and output tables respectively Some things I was unsure about and would like feedback: 1. I exposed two APIs, but I could easily have exposed one (getInputAndOutputTables) and return a struct containing the output of both APIs 2. I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp I am also looking for comments on structuring: PrestoParser.cpp is getting big, I can cut it up into a few source/header files in this diff or a follow-up if others agree (but did not want to do so without discussion) Differential Revision: D91525572
031664a to
befd458
Compare
mbasmanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hdikeman Looks great % a few nits.
I think this is fine. Thank you for calling this out. Makes review much easier. BTW, wondering if we can mention somewhere that inputs extracted from the parser are a superset of the tables that will be accessed when the query runs. Some table accesses may be eliminated by the optimizer. E.g.: SELECT count(1) FROM t WHERE false. |
Thank you for clarifying
That is a good point. Let me add a comment to that effect in the function annotation. Thanks |
…alVisitor Summary: This refactor was originally done as part of facebookincubator#804, but I am pulling out into a separate PR here Extracting common AST traversal logic into a common parent class which can be overridden by implementations which want to traverse the entire AST but only handle a specific subset of nodes. Differential Revision: D91607843
…bookincubator#804) Summary: There are usecases for which callers may want to extract some information from a query without needing to resolve all the metadata details required to build a full logical plan. An example could be a client-side check decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining accessed tables immediately See facebookincubator#789 for the related issue To enable this, I am adding two APIs to the PrestoParser, one which extracts accessed input tables, and one which extracts output tables, if any exist There are two parts to this changeset: 1. on recommendation of Masha, defined a DefaultTraversalVisitor, which performs a DFS traversal over all nodes in the AST. I used this baseclass for the existing ExprAnalyzer and the new TableVisitor. I can pull this into a separate PR if desired 2. add the TableVisitor, which extracts input tables and the output table for the query, and link it into two new PrestoParser APIs for input and output tables respectively Some things I was unsure about and would like feedback: 1. I exposed two APIs, but I could easily have exposed one (getInputAndOutputTables) and return a struct containing the output of both APIs 2. I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp I am also looking for comments on structuring: PrestoParser.cpp is getting big, I can cut it up into a few source/header files in this diff or a follow-up if others agree (but did not want to do so without discussion) Reviewed By: mbasmanova Differential Revision: D91525572
befd458 to
1df78e5
Compare
…alVisitor (facebookincubator#807) Summary: This refactor was originally done as part of facebookincubator#804, but I am pulling out into a separate PR here Extracting common AST traversal logic into a common parent class which can be overridden by implementations which want to traverse the entire AST but only handle a specific subset of nodes. Reviewed By: mbasmanova Differential Revision: D91607843
…bookincubator#804) Summary: There are usecases for which callers may want to extract some information from a query without needing to resolve all the metadata details required to build a full logical plan. An example could be a client-side check decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining accessed tables immediately See facebookincubator#789 for the related issue To enable this, I am adding two APIs to the PrestoParser, one which extracts accessed input tables, and one which extracts output tables, if any exist There are two parts to this changeset: 1. on recommendation of Masha, defined a DefaultTraversalVisitor, which performs a DFS traversal over all nodes in the AST. I used this baseclass for the existing ExprAnalyzer and the new TableVisitor. I can pull this into a separate PR if desired 2. add the TableVisitor, which extracts input tables and the output table for the query, and link it into two new PrestoParser APIs for input and output tables respectively Some things I was unsure about and would like feedback: 1. I exposed two APIs, but I could easily have exposed one (getInputAndOutputTables) and return a struct containing the output of both APIs 2. I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp I am also looking for comments on structuring: PrestoParser.cpp is getting big, I can cut it up into a few source/header files in this diff or a follow-up if others agree (but did not want to do so without discussion) Reviewed By: mbasmanova Differential Revision: D91525572
1df78e5 to
638e742
Compare
…alVisitor (facebookincubator#807) Summary: This refactor was originally done as part of facebookincubator#804, but I am pulling out into a separate PR here Extracting common AST traversal logic into a common parent class which can be overridden by implementations which want to traverse the entire AST but only handle a specific subset of nodes. Reviewed By: mbasmanova Differential Revision: D91607843
…bookincubator#804) Summary: There are usecases for which callers may want to extract some information from a query without needing to resolve all the metadata details required to build a full logical plan. An example could be a client-side check decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining accessed tables immediately See facebookincubator#789 for the related issue To enable this, I am adding two APIs to the PrestoParser, one which extracts accessed input tables, and one which extracts output tables, if any exist There are two parts to this changeset: 1. on recommendation of Masha, defined a DefaultTraversalVisitor, which performs a DFS traversal over all nodes in the AST. I used this baseclass for the existing ExprAnalyzer and the new TableVisitor. I can pull this into a separate PR if desired 2. add the TableVisitor, which extracts input tables and the output table for the query, and link it into two new PrestoParser APIs for input and output tables respectively Some things I was unsure about and would like feedback: 1. I exposed two APIs, but I could easily have exposed one (getInputAndOutputTables) and return a struct containing the output of both APIs 2. I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp I am also looking for comments on structuring: PrestoParser.cpp is getting big, I can cut it up into a few source/header files in this diff or a follow-up if others agree (but did not want to do so without discussion) Reviewed By: mbasmanova Differential Revision: D91525572
638e742 to
2a8276a
Compare
…alVisitor (facebookincubator#807) Summary: This refactor was originally done as part of facebookincubator#804, but I am pulling out into a separate PR here Extracting common AST traversal logic into a common parent class which can be overridden by implementations which want to traverse the entire AST but only handle a specific subset of nodes. Reviewed By: mbasmanova Differential Revision: D91607843
Summary:
There are usecases for which callers may want to extract some information from a query without needing to resolve all the metadata details required to build a full logical plan. An example could be a client-side check decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining accessed tables immediately
See #789 for the related issue
To enable this, I am adding two APIs to the PrestoParser, one which extracts accessed input tables, and one which extracts output tables, if any exist
There are two parts to this changeset:
Looking for feedback: I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp
I am also looking for comments on structuring: PrestoParser.cpp is getting big, I can cut it up into a few source/header files in this diff or a follow-up if others agree (but did not want to do so without discussion)
Differential Revision: D91525572