@hdikeman hdikeman commented Jan 27, 2026

Summary:
There are use cases in which callers want to extract some information from a query without resolving all the metadata needed to build a full logical plan. Examples include a client-side check that decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining the accessed tables immediately.

See #789 for the related issue

To enable this, I am adding two APIs to PrestoParser: one that extracts the accessed input tables, and one that extracts the output tables, if any exist.

There are two parts to this changeset:

  1. On Masha's recommendation, I defined a DefaultTraversalVisitor, which performs a DFS traversal over all nodes in the AST. I used this base class for the existing ExprAnalyzer and the new TableVisitor. I can pull this into a separate PR if desired.
  2. I added the TableVisitor, which extracts the input tables and the output table for the query, and linked it into a new PrestoParser API to extract referenced tables.
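
For reviewers, here is a minimal, self-contained sketch of how these two pieces could fit together. It is not the code in this PR: the `Node`/`NodeKind` toy AST, `makeNode`, `visitInsertSelect`, and the accessor names are hypothetical stand-ins, and only the class names DefaultTraversalVisitor and TableVisitor come from the description above.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST; the real parser's node types are richer.
enum class NodeKind { kSelect, kJoin, kTable, kInsert };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  NodeKind kind;
  std::string name;  // Table name for kTable, target table for kInsert.
  std::vector<NodePtr> children;
};

NodePtr makeNode(
    NodeKind kind, std::string name, std::vector<NodePtr> children = {}) {
  return std::make_shared<Node>(
      Node{kind, std::move(name), std::move(children)});
}

// Walks every node depth-first. Subclasses override only the hooks they need.
class DefaultTraversalVisitor {
 public:
  virtual ~DefaultTraversalVisitor() = default;

  void visit(const Node& node) {
    switch (node.kind) {
      case NodeKind::kTable:
        visitTable(node);
        break;
      case NodeKind::kInsert:
        visitInsert(node);
        break;
      default:
        visitChildren(node);
        break;
    }
  }

 protected:
  virtual void visitTable(const Node& node) { visitChildren(node); }
  virtual void visitInsert(const Node& node) { visitChildren(node); }

  void visitChildren(const Node& node) {
    for (const auto& child : node.children) {
      visit(*child);
    }
  }
};

// Records tables that are read and the table that is written, if any.
class TableVisitor : public DefaultTraversalVisitor {
 public:
  const std::set<std::string>& inputTables() const { return inputs_; }
  const std::optional<std::string>& outputTable() const { return output_; }

 protected:
  void visitTable(const Node& node) override { inputs_.insert(node.name); }

  void visitInsert(const Node& node) override {
    output_ = node.name;
    visitChildren(node);  // The source query still contributes inputs.
  }

 private:
  std::set<std::string> inputs_;
  std::optional<std::string> output_;
};

// Hand-built tree for: INSERT INTO target SELECT ... FROM a JOIN b
TableVisitor visitInsertSelect() {
  auto join = makeNode(
      NodeKind::kJoin,
      "",
      {makeNode(NodeKind::kTable, "a"), makeNode(NodeKind::kTable, "b")});
  auto select = makeNode(NodeKind::kSelect, "", {join});
  auto insert = makeNode(NodeKind::kInsert, "target", {select});

  TableVisitor visitor;
  visitor.visit(*insert);
  return visitor;
}
```

In this sketch the visitor only needs the syntax tree, so no catalog or metadata lookup is involved, which is the point of the new APIs.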

Looking for feedback: I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp.

I am also looking for comments on structure: PrestoParser.cpp is getting big. I can split it into a few source/header files in this diff or in a follow-up if others agree, but I did not want to do so without discussion.

Differential Revision: D91525572

@meta-cla meta-cla bot added the CLA Signed label Jan 27, 2026

meta-codesync bot commented Jan 27, 2026

@hdikeman has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91525572.

hdikeman added a commit to hdikeman/verax that referenced this pull request Jan 27, 2026
…bookincubator#804)

Summary:

There are use cases in which callers want to extract some information from a query without resolving all the metadata needed to build a full logical plan. Examples include a client-side check that decides where to send a query based on the tables it accesses, or moving ACL checks earlier in query execution by determining the accessed tables immediately.

See facebookincubator#789 for the related issue

To enable this, I am adding two APIs to PrestoParser: one that extracts the accessed input tables, and one that extracts the output tables, if any exist.

There are two parts to this changeset:

1. On Masha's recommendation, I defined a DefaultTraversalVisitor, which performs a DFS traversal over all nodes in the AST. I used this base class for the existing ExprAnalyzer and the new TableVisitor. I can pull this into a separate PR if desired.
2. I added the TableVisitor, which extracts the input tables and the output table for the query, and linked it into two new PrestoParser APIs, one for input tables and one for output tables.

Some things I was unsure about and would like feedback on:

1. I exposed two APIs, but I could just as easily have exposed one (getInputAndOutputTables) that returns a struct containing the output of both.
2. I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet. I can also remove them or leave more comments in PrestoParser.cpp.
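
To make the first question concrete, here is a rough sketch of the single-call shape next to the two-call shape. Only the name getInputAndOutputTables appears above; `InputAndOutputTables`, `ParsedStatement`, and every signature below are illustrative assumptions, not the signatures in this diff.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed statement. In the real code this would
// be the AST, and the fields below would be filled in by the TableVisitor.
struct ParsedStatement {
  std::vector<std::string> scannedTables;
  std::optional<std::string> targetTable;
};

// Hypothetical combined result type.
struct InputAndOutputTables {
  std::vector<std::string> inputs;
  std::optional<std::string> output;  // Empty for read-only queries.
};

// Single-call shape: one traversal yields both answers.
InputAndOutputTables getInputAndOutputTables(const ParsedStatement& statement) {
  return InputAndOutputTables{statement.scannedTables, statement.targetTable};
}

// Two-call shape, written here as thin wrappers over the combined call.
std::vector<std::string> getInputTables(const ParsedStatement& statement) {
  return getInputAndOutputTables(statement).inputs;
}

std::optional<std::string> getOutputTable(const ParsedStatement& statement) {
  return getInputAndOutputTables(statement).output;
}
```

The trade-off as I see it: the struct avoids a second traversal for callers that need both answers, while the two narrow functions keep call sites that need only one of them simpler.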

I am also looking for comments on structure: PrestoParser.cpp is getting big. I can split it into a few source/header files in this diff or in a follow-up if others agree, but I did not want to do so without discussion.

Differential Revision: D91525572
@hdikeman hdikeman requested a review from mbasmanova January 27, 2026 04:23

@mbasmanova mbasmanova left a comment

@hdikeman Looks great % a few nits.

@mbasmanova

> I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet.

I think this is fine. Thank you for calling this out. Makes review much easier.

BTW, wondering if we can mention somewhere that the inputs extracted from the parser are a superset of the tables that will be accessed when the query runs. Some table accesses may be eliminated by the optimizer, e.g. SELECT count(1) FROM t WHERE false.

@hdikeman

> > I implemented the handlers for query types not currently covered by the parser (materialized view statements, some view statements, pure CREATE TABLE), but these cannot be run yet.
>
> I think this is fine. Thank you for calling this out. Makes review much easier.

Thank you for clarifying

> BTW, wondering if we can mention somewhere that inputs extracted from the parser are a superset of the tables that will be accessed when the query runs. Some table accesses may be eliminated by the optimizer. E.g.: SELECT count(1) FROM t WHERE false.

That is a good point. Let me add a comment to that effect in the function annotation. Thanks
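
One consequence of the superset property for the early ACL use case, sketched under assumed names: a check against parser-extracted inputs is conservative. It can reject a query whose only forbidden table would have been optimized away, but it should not admit a query that goes on to read a forbidden table. `findDeniedTables` and both of its parameters are hypothetical and are not part of this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the referenced tables that the caller is not allowed to read.
// 'referencedTables' is assumed to come from the parser-level extraction, so
// it may include tables that the optimizer later eliminates.
std::vector<std::string> findDeniedTables(
    const std::set<std::string>& referencedTables,
    const std::set<std::string>& readableTables) {
  std::vector<std::string> denied;
  // std::set iterates in sorted order, which std::set_difference requires.
  std::set_difference(
      referencedTables.begin(),
      referencedTables.end(),
      readableTables.begin(),
      readableTables.end(),
      std::back_inserter(denied));
  return denied;
}
```

For SELECT count(1) FROM t WHERE false, the referenced set is {t}, so a caller without access to t is rejected here even though t may never be scanned.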

hdikeman added a commit to hdikeman/verax that referenced this pull request Jan 27, 2026
…alVisitor

Summary:
This refactor was originally done as part of facebookincubator#804, but I am pulling it out into a separate PR here.

This extracts the shared AST traversal logic into a common parent class. Implementations that want to traverse the entire AST but handle only a specific subset of nodes can override just the relevant methods.
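
As an illustration of the pattern, here is a minimal double-dispatch sketch. It is an assumption-laden toy rather than the code in this refactor: `Literal`, `FunctionCall`, `AstVisitor`, `FunctionNameCollector`, and `collectExample` are hypothetical, and only the name DefaultTraversalVisitor is taken from the PR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class AstVisitor;

// Hypothetical node hierarchy with classic accept/visit double dispatch.
struct Node {
  virtual ~Node() = default;
  virtual void accept(AstVisitor& visitor) const = 0;
};

struct Literal : Node {
  explicit Literal(int v) : value(v) {}
  void accept(AstVisitor& visitor) const override;
  int value;
};

struct FunctionCall : Node {
  explicit FunctionCall(std::string n) : name(std::move(n)) {}
  void accept(AstVisitor& visitor) const override;
  std::string name;
  std::vector<std::unique_ptr<Node>> args;
};

class AstVisitor {
 public:
  virtual ~AstVisitor() = default;
  virtual void visitLiteral(const Literal& node) = 0;
  virtual void visitFunctionCall(const FunctionCall& node) = 0;
};

void Literal::accept(AstVisitor& visitor) const {
  visitor.visitLiteral(*this);
}

void FunctionCall::accept(AstVisitor& visitor) const {
  visitor.visitFunctionCall(*this);
}

// Common parent: does nothing at leaves and descends into all children.
class DefaultTraversalVisitor : public AstVisitor {
 public:
  void visitLiteral(const Literal&) override {}

  void visitFunctionCall(const FunctionCall& node) override {
    for (const auto& arg : node.args) {
      arg->accept(*this);
    }
  }
};

// Handles one node type and inherits the traversal for everything else.
class FunctionNameCollector : public DefaultTraversalVisitor {
 public:
  void visitFunctionCall(const FunctionCall& node) override {
    names.push_back(node.name);
    DefaultTraversalVisitor::visitFunctionCall(node);  // Keep descending.
  }

  std::vector<std::string> names;
};

// Hand-built tree for the expression plus(abs(1), 2).
std::vector<std::string> collectExample() {
  auto inner = std::make_unique<FunctionCall>("abs");
  inner->args.push_back(std::make_unique<Literal>(1));
  auto outer = std::make_unique<FunctionCall>("plus");
  outer->args.push_back(std::move(inner));
  outer->args.push_back(std::make_unique<Literal>(2));

  FunctionNameCollector collector;
  outer->accept(collector);
  return collector.names;
}
```

The subclass calls the parent's method explicitly after recording the name. Omitting that call would stop the traversal at the first function call, which is the main pitfall of this design.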

Differential Revision: D91607843
hdikeman added a commit to hdikeman/verax that referenced this pull request Jan 27, 2026
…bookincubator#804)

Reviewed By: mbasmanova

Differential Revision: D91525572
hdikeman added a commit to hdikeman/verax that referenced this pull request Jan 27, 2026
…alVisitor (facebookincubator#807)

Reviewed By: mbasmanova

Differential Revision: D91607843